The Hyperlink Constituency project at MIT Linguistics investigates the syntactic notion of “constituency” with a new methodology: studying inline hyperlinks in a hypertext corpus. The working hypothesis is that inline links are constituents of their host sentences.

The project aims to verify (or refine) this hypothesis by compiling a link-rich hypertext corpus from link blogs and other sources, then using a stochastic parser and other tools to test whether the links are in fact constituents. A hypertext corpus with link and parse annotations can also be used to probe further questions, such as which kinds of syntactic and semantic structures make good or bad candidates for links, and to study interesting cases of non-constituent hyperlinks.
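To make the test concrete, here is a minimal sketch in Python of checking whether a link's token span matches a constituent in a parse tree. It is an illustration under stated assumptions, not the project's actual code: the bracketed parse is written by hand rather than produced by a stochastic parser, and the function names (parse_bracketed, constituent_spans, is_constituent) are hypothetical. It also treats every tree node, including single-word nodes, as a constituent.

```python
def parse_bracketed(s):
    """Parse a Penn-Treebank-style bracketed string into (label, children) tuples."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())       # nested constituent
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return read()


def constituent_spans(tree):
    """Collect the (start, end) token span of every node; end is exclusive."""
    spans = set()

    def walk(node, start):
        if isinstance(node, str):  # a word occupies exactly one token
            return start + 1
        end = start
        for child in node[1]:
            end = walk(child, end)
        spans.add((start, end))
        return end

    walk(tree, 0)
    return spans


def is_constituent(tree, link_start, link_end):
    """True if the link's token span coincides exactly with some node's span."""
    return (link_start, link_end) in constituent_spans(tree)


# Hand-written parse (an assumption standing in for real parser output).
# Token indices: The=0 project=1 studies=2 inline=3 hyperlinks=4
PARSE = ("(S (NP (DT The) (NN project)) "
         "(VP (VBZ studies) (NP (JJ inline) (NNS hyperlinks))))")
tree = parse_bracketed(PARSE)

print(is_constituent(tree, 3, 5))  # link text "inline hyperlinks"
print(is_constituent(tree, 1, 3))  # link text "project studies"
```

In this example the link "inline hyperlinks" coincides with an NP node, while "project studies" straddles the subject and the verb phrase and matches no node. With real parser output, a mismatch could reflect either a genuine non-constituent link or a parser error, so such cases would need manual inspection.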

Feel free to read our recent development notes or check out our code.

This project accepts UROPs. Non-MIT contributors are also welcome. Feel free to get in touch.