expressions | <a>constituency</a>

functions.php is mostly complete. The constituency testing, with its node labeling, missing-node determination, and punctuation-secondary passes, is working quite well.

I copied the database back to my own so I could run tests and make sure the code worked. The results so far are quite satisfactory. Looking through the combined stderr and stdout logs, I noticed that some of the unknown errors were in fact due to inline HTML in the hyperlink text, including tags such as “<i>” (and of course, “</i>”). I now invoke stripTags() on the link text before generating the regexp pattern, though my stripTags() is just a preg_replace() call with a very simple regexp.

In hindsight, it’ll miss self-closing tags like “<br />”, but I doubt people will be using much HTML in Mefi entries. The line break element is especially unlikely, since a simple keyboard return has the same effect and links tend to span only one line anyway. It’ll also produce some false positives, though I doubt anyone would ever type in a string like “< and >”.
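For illustration, here is a minimal sketch of a slightly stricter tag stripper. It is not the project's actual stripTags(), whose regexp isn't shown in this post; the name stripTagsSketch() and the pattern are assumptions made up for this example. The idea is to require a letter right after “<” or “</”, which covers self-closing tags and leaves a string like “< and >” alone.

```php
<?php
// Hypothetical stand-in, NOT the project's real stripTags().
function stripTagsSketch($text)
{
    // Match "<" or "</", then a letter (the start of a tag name),
    // then anything up to the next ">". Because a letter must follow
    // the "<" directly, text such as "< and >" is left untouched.
    return preg_replace('/<\/?[a-zA-Z][^<>]*>/', '', $text);
}

echo stripTagsSketch('the <i>New York Times</i> article'), "\n"; // the New York Times article
echo stripTagsSketch('line one<br />line two'), "\n";            // line oneline two
echo stripTagsSketch('x < and > y'), "\n";                       // x < and > y
```

This still isn't a real HTML parser: something like “a <b and c> d” would lose the middle part. PHP's built-in strip_tags() is another option if its behavior suits the link text.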

Here are some preliminary stats from my test run on my own database compared to the unaltered constituency database:

anguyen+linguistics
+-----------------------+---------------------+
| constituency          | COUNT(constituency) |
+-----------------------+---------------------+
| constituent           |               18219 |
| error                 |                2644 |
| multiple_constituents |                5023 |
| not_constituent       |                5295 |
+-----------------------+---------------------+

constituency+hyperlinks
+-----------------------+---------------------+
| constituency          | count(constituency) |
+-----------------------+---------------------+
| constituent           |               18163 |
| error                 |                2647 |
| multiple_constituents |                5014 |
| not_constituent       |                5357 |
+-----------------------+---------------------+

Differences (anguyen+linguistics minus constituency+hyperlinks): constituent +56, error -3, multiple_constituents +9, not_constituent -62
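The differences can be double-checked with a few lines of PHP. This is only a sketch: the counts are copied by hand from the two tables above, the helper name countDeltas() is made up for this example, and it assumes that anguyen+linguistics is my test copy and constituency+hyperlinks is the unaltered database.

```php
<?php
// Counts copied by hand from the two tables above.
$mine = array(            // anguyen+linguistics (assumed: my test copy)
    'constituent'           => 18219,
    'error'                 => 2644,
    'multiple_constituents' => 5023,
    'not_constituent'       => 5295,
);
$unaltered = array(       // constituency+hyperlinks (assumed: unaltered)
    'constituent'           => 18163,
    'error'                 => 2647,
    'multiple_constituents' => 5014,
    'not_constituent'       => 5357,
);

// Hypothetical helper: per-label difference, first array minus second.
function countDeltas($a, $b)
{
    $deltas = array();
    foreach ($a as $label => $count) {
        $deltas[$label] = $count - $b[$label];
    }
    return $deltas;
}

foreach (countDeltas($mine, $unaltered) as $label => $delta) {
    printf("%s %+d\n", $label, $delta); // e.g. "constituent +56"
}
```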

Note that this run was without the HTML stripping, so we can expect even fewer errors in subsequent runs. Other errors showed “)” and “/” in the PHP warnings raised when the link patterns were parsed, but I have no clue where they came from. I’ll check later.
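One thing worth ruling out when I do check (a guess, not a confirmed cause) is link text that contains regexp metacharacters or the pattern delimiter itself. The sketch below shows how preg_quote() escapes both before the text goes into a pattern. The helper name buildLinkPattern() and the choice of “/” as delimiter are assumptions made up for this example, not what functions.php actually does.

```php
<?php
// Hypothetical helper: build a pattern that matches link text literally.
function buildLinkPattern($linkText)
{
    // preg_quote() escapes regexp metacharacters such as "(" and ")".
    // Its second argument names the delimiter ("/" here) so that it
    // gets escaped too.
    return '/' . preg_quote($linkText, '/') . '/';
}

$pattern = buildLinkPattern('and/or (see here)');
echo $pattern, "\n"; // /and\/or \(see here\)/

// preg_match() returns 1 when the literal link text is found.
var_dump(preg_match($pattern, 'Read this and/or (see here) for details.'));
```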


A research project at MIT Linguistics


Polishing Here and There