Hi everyone,
I'm new to Neo4j, but I have read the O'Reilly book and some of the Cypher manual. I am familiar with Graphs in general (from a mathematical perspective). I am getting confused with the basics of traversal performance in Neo4j.
Context
I am making a Knowledge Management system, consisting of Entities with Key/Value Properties. These Values are often going to be other Entities in the system.
For example, the user can store the Simpsons children, with Bart['sister'] = Lisa
, and Maggie['brother'] = Bart
etc;
@Bart
sister:@Lisa
sister:@Maggie
@Lisa
brother:@Bart
sister:@Maggie
@Maggie
brother:@Bart
sister:@Lisa
I need the Keys to be able arranged into a flexible, user-defined hierarchy.
So they can also say that sister
and brother
are sub-types of sibling
.
Then, if they ask for Bart['sibling']
they should get [Maggie, Lisa]
, and Lisa['sibling']
should get [Maggie, Bart]
.
I also want to be able to get all Entities with specific Properties, so HAS('brother')
would get [Maggie, Lisa]
, and HAS('sibling')
would get them all (even though none directly has a sibling
property).
My model
I am modelling this in Neo4j with nodes like (b:Entity {name:'Bart'})
and (s:Property {name:'sister'})
. The inheritance is modeled as (p:Property {name:'brother'})-[:IS_A]->(p:Property {name:'sibling'})
.
To model a property having a particular value, I introduce a Context
node which connects up an Entity, Key, and Value;
(sister)-[:VALUE]->(context)-[:VALUE]->(maggie)
(context)-[:CONTEXT]->(lisa)
JSON export of this if I'm not being clear enough: records.txt (1.0 KB)
Performance
This encodes enough information to allow all the queries I want to be able to do, but I am worried/confused about the performance. In particular, basic "JSON-style" employee['manager']['manager']['personal assistant']['phone number']
I feel like should be O(1) or close to it, if the keys are unique. This is how it would be if the graph was built with maps/dicts.
The docs all seem to say that this type of pointer chasing is fast, but the actual implementation seems to be that a Node stores it's Relationships in a linked list; so every time you follow a Relationship you actually have to loop through every Relationship until you find the one you are looking for? I feel like that's going to be a problem here because things like (n:Property {name:'DOB'}
are going to have loads of connections. Is there any kind of optimisation I can use here to allow for fast "JSON-style" chaining of Keys?
I feel like in my system, if I have the Entity node (fast - indexed on name), and the Property node (fast - indexed on name), then I should be able to get it's value without looping...? Am I misunderstanding how traversal works?
General tips or pointers to helpful material appreciated! Thanks.