The Codex is a digital humanities project that deeply integrates text and data. It includes a new kind of text editor that supports freely overlapping annotations (and more), which are converted into entities in the Neo4j meta-model database.
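To illustrate what "freely overlapping annotations" can mean in practice, here is a minimal Python sketch of standoff annotations: character ranges stored separately from the text, so they may overlap without nesting. It is not the Codex's actual implementation (which is written in C#); the `Annotation` class, the `to_graph_rows` helper, and the entity ids such as `agent:michelangelo` are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Annotation:
    """A standoff annotation: a character range, a kind, and an optional entity."""
    start: int                       # inclusive character offset
    end: int                         # exclusive character offset
    kind: str                        # e.g. "agent", "place", "emphasis"
    entity_id: Optional[str] = None  # hypothetical id of a graph entity


def overlaps(a: Annotation, b: Annotation) -> bool:
    """Return True when the two ranges share at least one character."""
    return a.start < b.end and b.start < a.end


def to_graph_rows(text_id: str, text: str, annotations: List[Annotation]) -> List[Dict]:
    """Turn entity-bearing annotations into plain rows (one per text-to-entity link).

    Assumption: each row could later be loaded into a graph database as an
    edge between a text node and an entity node. No database is contacted here.
    """
    rows = []
    for ann in annotations:
        if ann.entity_id is None:
            continue  # purely stylistic annotations refer to no entity
        rows.append({
            "text_id": text_id,
            "entity_id": ann.entity_id,
            "kind": ann.kind,
            "start": ann.start,
            "end": ann.end,
            "surface": text[ann.start:ann.end],
        })
    return rows


text = "Michelangelo wrote to his father from Rome."
annotations = [
    Annotation(0, 12, "agent", "agent:michelangelo"),  # "Michelangelo"
    Annotation(22, 32, "agent", "agent:father"),       # "his father"
    Annotation(26, 42, "emphasis"),                    # "father from Rome"
    Annotation(38, 42, "place", "place:rome"),         # "Rome"
]

rows = to_graph_rows("letter:example", text, annotations)
```

Here the "emphasis" range starts inside "his father" and ends after it, which an inline markup tree could not represent without splitting one of the two spans. Keeping the ranges outside the text avoids that problem.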
Google's NLP API is integrated to generate part-of-speech token annotations along with sentiment analysis. The same API is used to recognise named entities and pronouns in the text editor.
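As a rough illustration of how part-of-speech tokens could become annotations, the sketch below maps a response shaped like the Natural Language API's syntax analysis result (`tokens[].text.content`, `tokens[].text.beginOffset`, `tokens[].partOfSpeech.tag`) onto character ranges. The sample response is hand-written for this example, not real API output. The `tokens_to_annotations` helper is hypothetical, and the sketch assumes the offsets were requested in an encoding where they correspond to Python string indices. No network call is made.

```python
from typing import Dict, List

text = "Michelangelo wrote home."

# Hand-written sample shaped like a syntax analysis response (assumption:
# offsets are character offsets into `text`). This is not real API output.
sample_response = {
    "tokens": [
        {"text": {"content": "Michelangelo", "beginOffset": 0},
         "partOfSpeech": {"tag": "NOUN"}},
        {"text": {"content": "wrote", "beginOffset": 13},
         "partOfSpeech": {"tag": "VERB"}},
        {"text": {"content": "home", "beginOffset": 19},
         "partOfSpeech": {"tag": "ADV"}},
        {"text": {"content": ".", "beginOffset": 23},
         "partOfSpeech": {"tag": "PUNCT"}},
    ]
}


def tokens_to_annotations(response: Dict) -> List[Dict]:
    """Convert each token into a range annotation carrying its POS tag."""
    annotations = []
    for token in response.get("tokens", []):
        content = token["text"]["content"]
        start = token["text"]["beginOffset"]
        annotations.append({
            "start": start,                # inclusive
            "end": start + len(content),   # exclusive
            "kind": "pos",
            "value": token["partOfSpeech"]["tag"],
        })
    return annotations


pos_annotations = tokens_to_annotations(sample_response)
```

Because each annotation is only a range plus a label, machine-generated tags like these can sit alongside manual annotations on the same text without either set needing to know about the other.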
As a proof of concept, I have entered and annotated 209 of Michelangelo's letters and 799 diary entries of his contemporary, Luca Landucci.
I have published a paper on it in the Zeitschrift für digitale Geisteswissenschaften, titled "The Codex - An Atlas of Relations": http://www.zfdg.de/sb004_008
I have also recorded some videos about it on YouTube.
It will soon be hosted online through the Digital Academy of Mainz for those who would like to try it out.
If you have questions or would like to collaborate, I am usually on Twitter: https://twitter.com/codexeditor
The project in numbers:
- 427,708 nodes + 660,859 edges using @neo4j
- 378,275 #NLP annotations
- 34,817 manual annotations
- 21,240 lines of C#
- 4,342 agents
- 799 #Landucci diary entries + 265 footnotes
- 209 #Michelangelo letters + 67 footnotes