I am looking to model a hierarchical document and the sentences it contains, and I need to be able to idempotently upsert both the document and its contents.
Consider the document:

- My Document
  - Sentence 1
  - Section 1
    - Sentence 2
    - Sentence 3
  - Section 2
    - Sentence 4
For any given document I know its unique id, and I want to store the sentence hierarchy in the graph.
I am considering a schema like the following (using Neo4j GraphQL Library type-definition syntax):

```graphql
type Document {
  id: ID!
  title: String
  content: [Sentence] @relationship(type: "HAS_SENTENCE", direction: OUT)
}

type Sentence {
  text: String!
  document: Document! @relationship(type: "SENTENCE_IN", direction: OUT)
  parent: Sentence @relationship(type: "PARENT_OF", direction: IN)
  previous: Sentence @relationship(type: "FOLLOWS", direction: OUT)
  next: Sentence @relationship(type: "FOLLOWS", direction: IN)
  children: [Sentence] @relationship(type: "PARENT_OF", direction: OUT)
}
```
In this case I'd expect the document to contain three root sentences (Sentence 1, Section 1, and Section 2), with Section 1 being the parent of Sentence 2 and Sentence 3, and Section 2 being the parent of Sentence 4.
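To make that expectation concrete, here is a small sketch that derives the `HAS_SENTENCE` and `PARENT_OF` edges from the example tree. The names `TREE` and `hierarchy_edges` are hypothetical, and sentence text stands in for node identity only because every text in the example is unique.

```python
# Example tree: each node is (text, children). Text doubles as an identifier
# here only because every sentence text in the example is unique.
TREE = [
    ("Sentence 1", []),
    ("Section 1", [("Sentence 2", []), ("Sentence 3", [])]),
    ("Section 2", [("Sentence 4", [])]),
]


def hierarchy_edges(doc_title, tree):
    """Return (relationship type, from, to) triples for the expected hierarchy."""
    # Only the top-level entries hang directly off the document.
    edges = [("HAS_SENTENCE", doc_title, text) for text, _ in tree]

    def visit(nodes):
        for text, children in nodes:
            # Every child of a node gets a PARENT_OF edge from that node.
            for child_text, _ in children:
                edges.append(("PARENT_OF", text, child_text))
            visit(children)

    visit(tree)
    return edges


for edge in hierarchy_edges("My Document", TREE):
    print(edge)
```

This yields three `HAS_SENTENCE` edges (one per root) and three `PARENT_OF` edges, matching the description above.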
My motivation is to maintain the hierarchical structure of the document when it is retrieved, and to be able to identify proximal co-occurrences of entities within the document's content.
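As a rough illustration of "proximal co-occurrence", the sketch below treats two entities as co-occurring when their sentences are at most `window` positions apart in reading (pre-order) order. That distance measure is an assumption: it crosses section boundaries (Sentence 3 and Sentence 4 sit in different sections), and in the proposed graph `FOLLOWS` links only siblings, so reading-order distance is not a single `FOLLOWS` path. The entity annotations and the name `cooccurring_pairs` are hypothetical.

```python
from itertools import product


def cooccurring_pairs(entities_in_order, window):
    """Unordered entity pairs whose sentences are at most `window` apart.

    entities_in_order: one set of entity names per sentence, in reading order.
    """
    pairs = set()
    for i, here in enumerate(entities_in_order):
        # Compare sentence i with itself and the next `window` sentences.
        for nearby in entities_in_order[i : i + window + 1]:
            for a, b in product(here, nearby):
                if a != b:
                    pairs.add(tuple(sorted((a, b))))
    return pairs


# Hypothetical entity annotations for the six sentences of the example document.
READING_ORDER = [
    {"Alice"},        # Sentence 1
    set(),            # Section 1
    {"Acme"},         # Sentence 2
    {"Bob", "Acme"},  # Sentence 3
    set(),            # Section 2
    {"Alice"},        # Sentence 4
]

print(sorted(cooccurring_pairs(READING_ORDER, window=1)))
```

With `window=1` only `("Acme", "Bob")` qualifies; widening the window to 2 also pairs Alice with Acme and Bob.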
Are there alternative approaches to modeling the structure and content of a document that would be worth considering?
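One commonly used alternative (or complement) to a `FOLLOWS` linked list is a materialized path: each sentence stores a sortable position key such as `0002.0001`, so reading order is recovered by sorting on a single property and the parent is the key minus its last segment. The sketch below is only an illustration of that idea; `path_keys`, `parent_path`, and the four-digit zero padding (which assumes fewer than 10,000 siblings per parent) are hypothetical choices.

```python
# Example tree: each node is (text, children).
TREE = [
    ("Sentence 1", []),
    ("Section 1", [("Sentence 2", []), ("Sentence 3", [])]),
    ("Section 2", [("Sentence 4", [])]),
]


def path_keys(nodes, prefix=""):
    """Yield (path, text) pairs; path is a zero-padded, dot-separated position string."""
    for position, (text, children) in enumerate(nodes, start=1):
        # Fixed-width segments make lexicographic order equal document order.
        path = f"{prefix}.{position:04d}" if prefix else f"{position:04d}"
        yield path, text
        yield from path_keys(children, path)


def parent_path(path):
    """The parent's key is the path minus its last segment (None for roots)."""
    return path.rsplit(".", 1)[0] if "." in path else None


rows = list(path_keys(TREE))
# Sorting by the key alone recovers reading order, whatever order rows arrive in.
shuffled = list(reversed(rows))
for path, text in sorted(shuffled):
    print(path, parent_path(path), text)
```

The trade-off is that inserting a sentence in the middle renumbers the keys of every later sibling and its descendants, whereas a `FOLLOWS` list only needs local relinking.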
Assuming I already have this structure modeled as a JSON object, with previous/next semantics captured by the ordering of the `children` array, what would be the best way to insert such a document in Cypher?
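For reference, a sketch of the assumed JSON shape and of how sibling order maps onto `FOLLOWS`. Following the schema's `previous: ... direction: OUT`, each pair is `(sentence, previous sibling)`, i.e. `(s)-[:FOLLOWS]->(previous)`. Note that only siblings are linked: the first child of a section has no previous sentence. The field names `text`/`children` and the function `follows_pairs` are assumptions for illustration.

```python
import json

# Assumed JSON shape: every node has "text" and an ordered "children" array.
DOC_JSON = """
{"title": "My Document",
 "children": [
   {"text": "Sentence 1", "children": []},
   {"text": "Section 1", "children": [
     {"text": "Sentence 2", "children": []},
     {"text": "Sentence 3", "children": []}]},
   {"text": "Section 2", "children": [
     {"text": "Sentence 4", "children": []}]}]}
"""


def follows_pairs(nodes):
    """Return (sentence, previous sibling) pairs, i.e. (s)-[:FOLLOWS]->(previous)."""
    # Adjacent entries of one children array are consecutive siblings.
    pairs = [(cur["text"], prev["text"]) for prev, cur in zip(nodes, nodes[1:])]
    for node in nodes:
        pairs.extend(follows_pairs(node.get("children", [])))
    return pairs


doc = json.loads(DOC_JSON)
pairs = follows_pairs(doc["children"])
print(pairs)
```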
The best approach I have come up with so far is to flatten the tree with a pre-order traversal, producing a list of sentences, each with a generated id plus the ids of the sentences it relates to, and then to write a Cypher query that unwinds the list and upserts each sentence in turn, building the graph as it goes.
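A minimal sketch of that approach, assuming each sentence row carries the ids of its parent and previous sibling, and assuming `Sentence` nodes get an `id` property (the schema does not declare one). The id scheme `<document id>/<position path>` is hypothetical; it is deterministic, so re-running the same input MERGEs the same nodes, which is what makes the upsert idempotent. `DOC`, `flatten`, and `UPSERT_QUERY` are illustrative names.

```python
# Assumed input: top-level sentences in "children"; every sentence has "text"
# and its own ordered "children".
DOC = {
    "id": "doc-1",
    "title": "My Document",
    "children": [
        {"text": "Sentence 1", "children": []},
        {"text": "Section 1", "children": [
            {"text": "Sentence 2", "children": []},
            {"text": "Sentence 3", "children": []},
        ]},
        {"text": "Section 2", "children": [
            {"text": "Sentence 4", "children": []},
        ]},
    ],
}


def flatten(doc):
    """Pre-order traversal: one row per sentence, parents before children."""
    rows = []

    def visit(nodes, parent_id, prefix):
        prev_id = None
        for position, node in enumerate(nodes, start=1):
            path = f"{prefix}.{position}" if prefix else str(position)
            # Hypothetical deterministic id: document id + position path.
            sentence_id = f"{doc['id']}/{path}"
            rows.append({
                "id": sentence_id,
                "text": node["text"],
                "parentId": parent_id,  # None for top-level sentences
                "prevId": prev_id,      # None for the first child of a parent
            })
            visit(node.get("children", []), sentence_id, path)
            prev_id = sentence_id

    visit(doc["children"], None, "")
    return rows


# Every write is a MERGE keyed on an id; FOREACH over a 0- or 1-element list
# is used as a conditional so one statement handles roots and non-roots.
UPSERT_QUERY = """
MERGE (d:Document {id: $docId})
SET d.title = $title
WITH d
UNWIND $rows AS row
MERGE (s:Sentence {id: row.id})
SET s.text = row.text
MERGE (s)-[:SENTENCE_IN]->(d)
FOREACH (x IN CASE WHEN row.parentId IS NULL THEN [1] ELSE [] END |
  MERGE (d)-[:HAS_SENTENCE]->(s))
FOREACH (x IN CASE WHEN row.parentId IS NULL THEN [] ELSE [1] END |
  MERGE (p:Sentence {id: row.parentId})
  MERGE (p)-[:PARENT_OF]->(s))
FOREACH (x IN CASE WHEN row.prevId IS NULL THEN [] ELSE [1] END |
  MERGE (prev:Sentence {id: row.prevId})
  MERGE (s)-[:FOLLOWS]->(prev))
"""

params = {"docId": DOC["id"], "title": DOC["title"], "rows": flatten(DOC)}
```

With the official Neo4j Python driver this could be sent as `session.run(UPSERT_QUERY, params)`; a uniqueness constraint on `Sentence.id` would also make the MERGE lookups indexed. One caveat of position-based ids: inserting a sentence mid-document shifts the ids of later siblings, and MERGE does not remove the now-stale nodes or relationships, so edits would need a separate cleanup step.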
Are there any other approaches to insert such a structure into the graph?
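One variation on the flattened-list approach is to split the rows into separate batches: first upsert all nodes, then create each relationship type from its own list using `MATCH` on the already-existing nodes. Relationship creation then no longer depends on the order of rows within a list. The row values, `split_batches`, and `STATEMENTS` below are hypothetical, and the `Sentence.id` property is again an assumption not present in the schema.

```python
# Rows in the flattened shape described above (ids are hypothetical).
ROWS = [
    {"id": "doc-1/1", "text": "Sentence 1", "parentId": None, "prevId": None},
    {"id": "doc-1/2", "text": "Section 1", "parentId": None, "prevId": "doc-1/1"},
    {"id": "doc-1/2.1", "text": "Sentence 2", "parentId": "doc-1/2", "prevId": None},
    {"id": "doc-1/2.2", "text": "Sentence 3", "parentId": "doc-1/2", "prevId": "doc-1/2.1"},
    {"id": "doc-1/3", "text": "Section 2", "parentId": None, "prevId": "doc-1/2"},
    {"id": "doc-1/3.1", "text": "Sentence 4", "parentId": "doc-1/3", "prevId": None},
]


def split_batches(rows):
    """Split flattened rows into one parameter list per write statement."""
    return {
        "nodes": [{"id": r["id"], "text": r["text"]} for r in rows],
        "roots": [r["id"] for r in rows if r["parentId"] is None],
        "parentEdges": [
            {"parent": r["parentId"], "child": r["id"]}
            for r in rows if r["parentId"] is not None
        ],
        "followsEdges": [
            {"sentence": r["id"], "previous": r["prevId"]}
            for r in rows if r["prevId"] is not None
        ],
    }


# Statement 1 upserts every node; statements 2-4 only MATCH existing nodes,
# so the order of entries inside each list does not matter.
STATEMENTS = [
    """MERGE (d:Document {id: $docId}) SET d.title = $title
WITH d
UNWIND $nodes AS n
MERGE (s:Sentence {id: n.id}) SET s.text = n.text
MERGE (s)-[:SENTENCE_IN]->(d)""",
    """MATCH (d:Document {id: $docId})
UNWIND $roots AS rootId
MATCH (s:Sentence {id: rootId})
MERGE (d)-[:HAS_SENTENCE]->(s)""",
    """UNWIND $parentEdges AS e
MATCH (p:Sentence {id: e.parent}), (c:Sentence {id: e.child})
MERGE (p)-[:PARENT_OF]->(c)""",
    """UNWIND $followsEdges AS e
MATCH (s:Sentence {id: e.sentence}), (prev:Sentence {id: e.previous})
MERGE (s)-[:FOLLOWS]->(prev)""",
]

params = {"docId": "doc-1", "title": "My Document", **split_batches(ROWS)}
```

To keep the upsert atomic, the four statements would be run in one transaction, e.g. from a function passed to the Python driver's `session.execute_write` that calls `tx.run(statement, params)` for each. The cost is more statements per document; the benefit is that each statement is a plain `UNWIND` + `MERGE`, and an edge whose endpoint is missing is skipped by `MATCH` rather than creating a placeholder node.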