functions.php is mostly complete. The constituency testing, with its node labeling, missing node determination, and punctuation-secondary passes, is working quite well.
I copied the database back into my own so I could run tests and confirm the code works. The results so far are quite satisfactory. Looking through the combined stderr and stdout logs, I noticed that some of the unknown errors were in fact due to inline HTML in the hyperlink text, including tags such as “<i>” (and of course, “</i>”). I now invoke stripTags() on the link text before generating the regexp pattern, though my stripTags() is just a preg_replace() with a very simple regexp.
In hindsight, it’ll miss self-closing tags like “<br />”, but I doubt people use much HTML in Mefi entries, least of all the line break element: a simple keyboard return has the same effect, and links tend to span only one line anyway. It will also produce some false positives, though I doubt anyone would ever type in a string like “< and >”.
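The actual stripTags() is not shown here, so the following is only a sketch of what a slightly less naive version could look like. The function name stripTagsSketch() and the pattern are hypothetical, not the code in functions.php. The pattern requires a letter right after the "<" (or "</"), which lets it accept a trailing " /" while leaving a bare "< and >" alone.

```php
<?php
// Hypothetical helper for illustration; not the stripTags() from functions.php.
function stripTagsSketch(string $text): string
{
    // "<", an optional "/", a tag name that starts with a letter,
    // then anything up to the next ">" (attributes or a trailing " /").
    return preg_replace('/<\/?[a-z][a-z0-9]*\b[^>]*>/i', '', $text);
}

echo stripTagsSketch('an <i>italic</i> link'), "\n"; // an italic link
echo stripTagsSketch('line<br />break'), "\n";       // linebreak
echo stripTagsSketch('a < and > b'), "\n";           // a < and > b
```

A regexp like this is still not an HTML parser; it is only meant to cover the short inline tags that turn up in link text.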
Here are some preliminary stats from my test run on my own database compared to the unaltered constituency database:
anguyen+linguistics
+-----------------------+---------------------+
| constituency          | COUNT(constituency) |
+-----------------------+---------------------+
| constituent           |               18219 |
| error                 |                2644 |
| multiple_constituents |                5023 |
| not_constituent       |                5295 |
+-----------------------+---------------------+
constituency+hyperlinks
+-----------------------+---------------------+
| constituency          | COUNT(constituency) |
+-----------------------+---------------------+
| constituent           |               18163 |
| error                 |                2647 |
| multiple_constituents |                5014 |
| not_constituent       |                5357 |
+-----------------------+---------------------+
Difference (anguyen+linguistics minus constituency+hyperlinks):
constituent: +56
error: -3
multiple_constituents: +9
not_constituent: -62
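The differences above can be reproduced from the two result tables with a few lines of PHP. This is just a sketch: the counts are copied from the tables, and the function name countDeltas() and the variable names are made up for illustration. The deltas sum to zero because both databases hold the same total number of rows (31181).

```php
<?php
// Counts copied from the two tables above.
$mine = [
    'constituent'           => 18219,
    'error'                 => 2644,
    'multiple_constituents' => 5023,
    'not_constituent'       => 5295,
];
$unaltered = [
    'constituent'           => 18163,
    'error'                 => 2647,
    'multiple_constituents' => 5014,
    'not_constituent'       => 5357,
];

// Hypothetical helper: per-label difference, new count minus old count.
function countDeltas(array $new, array $old): array
{
    $deltas = [];
    foreach ($new as $label => $count) {
        $deltas[$label] = $count - ($old[$label] ?? 0);
    }
    return $deltas;
}

$deltas = countDeltas($mine, $unaltered);
foreach ($deltas as $label => $delta) {
    printf("%-22s %+d\n", $label, $delta);
}
```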
Note that this run was without the HTML stripping, so we can expect even fewer errors in subsequent runs. Other errors included a stray “)” and “/” reported in the PHP warnings when the link patterns were parsed, but I have no clue where they came from. I’ll check it later.
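The cause of the “)” and “/” warnings has not been tracked down. One possibility, and it is only a guess, is that link text containing regexp metacharacters or the pattern delimiter ends up in the pattern unescaped. If that turns out to be the case, escaping the text with preg_quote() before building the pattern would be one way to handle it. The function name linkPattern() and the sample strings below are hypothetical.

```php
<?php
// Hypothetical helper: build a literal-match pattern from raw link text.
// preg_quote() escapes regexp metacharacters such as "(" and ")";
// passing "/" as the second argument also escapes the delimiter.
function linkPattern(string $linkText): string
{
    return '/' . preg_quote($linkText, '/') . '/';
}

$sentence = 'see this (great) site/page for details';
$pattern  = linkPattern('(great) site/page');

if (preg_match($pattern, $sentence, $matches) === 1) {
    echo 'matched: ', $matches[0], "\n"; // matched: (great) site/page
}
```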