I just pushed sentence-splitting code to the repository. parse_entries.php now splits text into sentences before feeding them to the parser. This makes a lot more sense, since the parser did not handle multi-sentence paragraphs well. We’re using Adwait Ratnaparkhi’s MXTERMINATOR sentence splitter.
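To illustrate the split-then-parse idea, here is a minimal sketch. It is not the code in parse_entries.php. It swaps in a naive regex splitter in place of MXTERMINATOR, which instead uses a trained maximum-entropy model to decide which punctuation marks end sentences. The names `split_sentences`, `parse_paragraph` and the `parse` callable (a stand-in for the parser) are hypothetical.

```python
import re
from typing import Callable, List

# Hypothetical, naive stand-in for a sentence splitter: break after ., ! or ?
# when followed by whitespace and an uppercase letter. This is NOT MXTERMINATOR.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_sentences(paragraph: str) -> List[str]:
    """Split a paragraph into sentences using the naive rule above."""
    return [s for s in _BOUNDARY.split(paragraph.strip()) if s]


def parse_paragraph(paragraph: str, parse: Callable[[str], str]) -> List[str]:
    """Feed each sentence to the (hypothetical) parser separately,
    instead of handing it the whole multi-sentence paragraph."""
    return [parse(sentence) for sentence in split_sentences(paragraph)]
```

A rule like this wrongly splits "Dr. Smith arrived." after "Dr.", which is exactly the kind of abbreviation case a trained splitter such as MXTERMINATOR is meant to handle.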
We’ve also retrained the parser on the Penn Treebank with David Vadas’ NP structure additions, and reparsed the whole database. Together, the two changes increased the constituency percentage by about 6%, which is slightly less than I expected.