Neo4j specifications:
dbms.memory.heap.initial_size=24G
dbms.memory.heap.max_size=24G
dbms.memory.pagecache.size=8G
Neo4j version: Community 4.2.0
Neo4j Desktop version: 1.3.11
Hello,
How can I improve my data model to reduce the number of relationships?
Currently, my graph database contains a total of 300,000 nodes connected by about 550,000,000 relationships. After debating the graph data model with my colleagues for several weeks and performing numerous refactorings, I can't figure out a way to diminish the number of relationships between a subset of nodes on the part of my knowledge graph that is illustrated below.
I'm hoping that by reducing the number of relationships I can speed up my Cypher queries, which are beginning to take more than an hour to finish as they become more complex.
Here's a proxy example of my current use case:
In this example, here are the counts for each of the nodes and relationships:
Nodes
Person: 100,000 nodes with 5 properties
A_Law: 5,000 nodes with 4 properties
B_Law: 3,000 nodes with 4 properties
C_Law: 2,000 nodes with 4 properties
Business: 7,000 nodes with 3 properties
Relationships
A_Access: 500,000,000 relationships with 0 properties (every Person node is connected to every unique A_Law node, so 100,000 × 5,000 = 500,000,000)
B_Access: 2,000 relationships with 0 properties
C_Access: 3,000 relationships with 0 properties
A_Rel: 5,000 relationships with 2 properties
B_Rel: 3,000 relationships with 2 properties
C_Rel: 2,000 relationships with 2 properties
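To double-check the arithmetic, here is a quick Python sketch that uses only the counts from this proxy example. The variable names are just mine for illustration; nothing here touches the actual database.

```python
# Counts copied from the proxy example above; names are illustrative only.
node_counts = {
    "Person": 100_000,
    "A_Law": 5_000,
    "B_Law": 3_000,
    "C_Law": 2_000,
    "Business": 7_000,
}
rel_counts = {
    "B_Access": 2_000,
    "C_Access": 3_000,
    "A_Rel": 5_000,
    "B_Rel": 3_000,
    "C_Rel": 2_000,
}

# Every Person is connected to every A_Law, so A_Access is a full cross product.
rel_counts["A_Access"] = node_counts["Person"] * node_counts["A_Law"]

total_nodes = sum(node_counts.values())
total_rels = sum(rel_counts.values())
a_access_share = rel_counts["A_Access"] / total_rels

print(f"A_Access relationships: {rel_counts['A_Access']:,}")
print(f"Total nodes: {total_nodes:,}")
print(f"Total relationships: {total_rels:,}")
print(f"A_Access share of all relationships: {a_access_share:.3%}")
```

In this example, A_Access alone accounts for more than 99.99% of all relationships, which is why I'm focusing on that part of the model.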
As you can see above, the problem is the A_Access relationship count, which has exploded because it is the product of all Person nodes and all A_Law nodes. Our business domain requires each unique Person node to have one relationship to each unique A_Law node. This results in 500,000,000 relationships, and it is causing our Cypher queries to take more than an hour to finish.
Another approach I plan to try is adding more indices on node or relationship properties, but I want to be careful about how many I add. Here's a warning I read about adding too many indices: "Welcome to the Dark Side: Neo4j Worst Practices (& How to Avoid Them)"
I've tried profiling and explaining our queries and I see that there are billions of rows being accessed.
I also read and watched the posts and videos listed below multiple times, but I still don't know how else to improve on the data model:
I also read the entire O'Reilly book on Graph Databases, but I still don't know how to improve my data model.
Thanks for any ideas or suggestions.
-Tony