Statistics

After rerunning the “missing_tree” links, I tallied the links up:

mysql> SELECT stanford, COUNT(stanford) FROM `hyperlinks_links` GROUP BY stanford;
+----------------------------------+-----------------+
| stanford                         | COUNT(stanford) |
+----------------------------------+-----------------+
| almost_constituent:-LRB-         |               1 |
| almost_constituent:ADJP          |              17 |
| almost_constituent:ADVP          |              14 |
| almost_constituent:CC            |               1 |
| almost_constituent:CD            |              41 |
| almost_constituent:FRAG          |               5 |
| almost_constituent:JJ            |              57 |
| almost_constituent:NN            |              58 |
| almost_constituent:NNP           |              41 |
| almost_constituent:NNPS          |               6 |
| almost_constituent:NNS           |              43 |
| almost_constituent:NP            |             283 |
| almost_constituent:PP            |               8 |
| almost_constituent:PRT           |               1 |
| almost_constituent:QP            |               3 |
| almost_constituent:RB            |               3 |
| almost_constituent:ROOT          |              70 |
| almost_constituent:S             |              15 |
| almost_constituent:SBAR          |              16 |
| almost_constituent:SBARQ         |               1 |
| almost_constituent:VBN           |               1 |
| almost_constituent:VP            |              31 |
| almost_constituent:X             |               4 |
| almost_constituent_2ndpass_:NN   |               1 |
| almost_constituent_2ndpass_:NNP  |               1 |
| almost_constituent_2ndpass_:NP   |               5 |
| almost_constituent_2ndpass_:PRP$ |               1 |
| constituent                      |           17435 |
| missing_tree                     |            1338 |
| multiple_constituents            |            5014 |
| not_constituent                  |            5056 |
| unknown_error                    |            1309 |
| xclausal                         |             301 |
+----------------------------------+-----------------+

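Collapsing the almost-constituent rows into one figure gives this summary:

+---------------------------+-------+
| Category                  | Count |
+---------------------------+-------+
| Almost constituents       |   728 |
| Constituents              | 17435 |
| Multiple constituents     |  5014 |
| Non-constituents          |  5056 |
| Cross-clausal links       |   301 |
| Links with missing trees  |  1338 |
| Links with unknown errors |  1309 |
+---------------------------+-------+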
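The 728 figure for almost constituents is the sum of every row whose label starts with almost_constituent, including the 2ndpass rows. Here is a minimal Python sketch of that collapse. The counts are copied by hand from the query output above, and the names raw_counts and collapse are my own, purely for illustration:

```python
from collections import defaultdict

# Per-label counts copied from the MySQL output above.
raw_counts = {
    "almost_constituent:-LRB-": 1, "almost_constituent:ADJP": 17,
    "almost_constituent:ADVP": 14, "almost_constituent:CC": 1,
    "almost_constituent:CD": 41, "almost_constituent:FRAG": 5,
    "almost_constituent:JJ": 57, "almost_constituent:NN": 58,
    "almost_constituent:NNP": 41, "almost_constituent:NNPS": 6,
    "almost_constituent:NNS": 43, "almost_constituent:NP": 283,
    "almost_constituent:PP": 8, "almost_constituent:PRT": 1,
    "almost_constituent:QP": 3, "almost_constituent:RB": 3,
    "almost_constituent:ROOT": 70, "almost_constituent:S": 15,
    "almost_constituent:SBAR": 16, "almost_constituent:SBARQ": 1,
    "almost_constituent:VBN": 1, "almost_constituent:VP": 31,
    "almost_constituent:X": 4,
    "almost_constituent_2ndpass_:NN": 1, "almost_constituent_2ndpass_:NNP": 1,
    "almost_constituent_2ndpass_:NP": 5, "almost_constituent_2ndpass_:PRP$": 1,
    "constituent": 17435, "missing_tree": 1338,
    "multiple_constituents": 5014, "not_constituent": 5056,
    "unknown_error": 1309, "xclausal": 301,
}


def collapse(counts):
    """Merge every almost_constituent* label into one bucket; keep the rest."""
    totals = defaultdict(int)
    for label, n in counts.items():
        if label.startswith("almost_constituent"):
            totals["almost_constituent"] += n
        else:
            totals[label] += n
    return dict(totals)


summary = collapse(raw_counts)
print(summary["almost_constituent"])  # 728
```

The same grouping could probably be done in SQL with a CASE expression inside the GROUP BY, but I have not run that against the table.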

Tallying further, I count as intended or actual constituents the almost-constituent, constituent, and multiple-constituent links: 728 + 17435 + 5014 = 23177.

The ultimately non-constituent links are the non-constituent and cross-clausal links together: 5056 + 301 = 5357.

That gives a grand total of 23177 + 5357 = 28534 correctly parsed links. The 1338 missing-tree links and 1309 unknown-error links are left out of this total.

So intended/actual constituents accounted for 81.23% of the correctly parsed links, while non-constituents accounted for 18.77%.
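For anyone who wants to check the arithmetic, here is a small Python sketch that recomputes the totals and percentages from the summary counts. The variable names are mine, not anything from the database:

```python
# Summary counts taken from the tallies above.
almost, constituent, multiple = 728, 17435, 5014
non_constituent, cross_clausal = 5056, 301

# Intended/actual constituents vs. ultimately non-constituent links.
intended = almost + constituent + multiple
non = non_constituent + cross_clausal

# Correctly parsed links exclude missing-tree and unknown-error links.
parsed_total = intended + non

intended_pct = f"{intended / parsed_total:.2%}"
non_pct = f"{non / parsed_total:.2%}"

print(intended, non, parsed_total)  # 23177 5357 28534
print(intended_pct, non_pct)        # 81.23% 18.77%
```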
