The Hyperlink Constituency project at MIT Linguistics investigates the syntactic notion of “constituency” through a new methodology: studying inline hyperlinks in a hypertext corpus, under the hypothesis that inline links are constituents of their host sentences.
The project aims to verify (or refine) this hypothesis by compiling a link-rich hypertext corpus from link blogs and other sources, then using a stochastic parser and other tools to test whether the links are in fact constituents. A hypertext corpus annotated with links and parses can also be used to probe which kinds of syntactic and semantic structures make good or bad candidates for links, and to study interesting cases of non-constituent hyperlinks.
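As a rough illustration of the constituency test described above, here is a minimal sketch in Python. It is not the project's actual code: it assumes the parser emits a Penn-Treebank-style bracketed parse in which every node has a label, and that each link is given as a half-open range of token indices. The function names `constituent_spans` and `is_constituent` are hypothetical.

```python
def constituent_spans(bracketed):
    """Collect the token span of every node in a bracketed parse.

    Assumes every "(" is immediately followed by a node label, as in
    "(S (NP (PRP We)) ...)". Returns (words, spans), where spans is a
    set of half-open (start, end) token index pairs.
    """
    tokens = bracketed.replace("(", " ( ").replace(")", " ) ").split()
    words = []   # leaf words seen so far
    stack = []   # start index of each currently open node
    spans = set()
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            stack.append(len(words))
            i += 2               # skip the node label after "("
        elif tok == ")":
            spans.add((stack.pop(), len(words)))
            i += 1
        else:
            words.append(tok)
            i += 1
    return words, spans


def is_constituent(bracketed, link_start, link_end):
    """True if tokens [link_start, link_end) are exactly one parse node."""
    _, spans = constituent_spans(bracketed)
    return (link_start, link_end) in spans


# Hypothetical example sentence and parse, for illustration only.
PARSE = "(S (NP (PRP We)) (VP (VBD read) (NP (DT the) (JJ recent) (NNS notes))))"

print(is_constituent(PARSE, 2, 5))  # link text "the recent notes"
print(is_constituent(PARSE, 1, 3))  # link text "read the"
```

In this toy parse, a link over "the recent notes" matches an NP node and counts as a constituent, while a link over "read the" matches no node and would be flagged as a non-constituent hyperlink. A real pipeline would also need to align the link's character offsets with the parser's tokenization, which this sketch leaves out.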
Feel free to read our recent development notes or check out our code.
This project accepts UROP students, and non-MIT contributors are also welcome. Feel free to get in touch.