Hi,
I'm just getting into Neo4j. I've seen that this tool is very scalable, but it's not yet clear to me how to write an efficient query for what I need.
What I'm trying to do is build a full binary tree. For that I have a .csv file where the first column contains the parents of the nodes in the second column.
At first, I tried creating the nodes and relationships in the same query, but got a warning about an Eager operator in the plan.
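A combined query of that kind would look roughly like this (a sketch; MERGE-ing both endpoint nodes and the relationship from the same CSV row is what typically forces the Eager operator and disables the periodic commit):
// Sketch of a single-pass import that triggers the Eager warning
USING PERIODIC COMMIT
LOAD CSV FROM 'file:///neotree.csv' AS line
MERGE (p:f {name: line[0]})
MERGE (c:f {name: line[1]})
MERGE (p)-[:Parent_off]->(c);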
So I settled for the following two queries instead:
LOAD CSV FROM 'file:///neonodesdistinct.csv' AS line
CREATE (:f {name: line[0]});
Here, neonodesdistinct.csv contains all the distinct node names, one per line.
And then I ran:
USING PERIODIC COMMIT
LOAD CSV FROM 'file:///neotree.csv' AS line
MATCH (n1:f {name: line[0]}), (n2:f {name: line[1]})
MERGE (n1)-[:Parent_off]->(n2);
Here, neotree.csv contains all the relationship data described at the top.
I started with a tree consisting of 8191 nodes.
The first query is very fast (adding nodes only), but the second one takes up to 2 minutes.
I'm running the latest neo4j-client version on a Linux VM with 49152 MB of RAM, and the graph.db data was moved to an SSD with plenty of free space.
Here is the screenshot of EXPLAIN for the second query:
These are file samples for my example (8191 nodes and relationships). Sorry for the .txt (couldn't upload .csv). neonodesdistinct.txt (123.3 KB) neotree.txt (236.0 KB)
You should also be aware that having a binary tree structure in a graph database may not be of much use with Cypher alone.
Neo4j MATCHes use DFS by default (only shortestPath() matches use BFS, and those require you to pre-match the start and end nodes), and there's nothing in Cypher that lets you trigger a binary search over a segment of the graph, or handle insertion into or deletion from a binary tree structure. That's best handled in actual application code, not in the database.
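For reference, a shortestPath() match looks something like this; both endpoints are matched up front, and the node names here are just placeholders:
// find the path between two pre-matched nodes (BFS under the hood);
// 'someRoot' and 'someLeaf' are hypothetical values
MATCH (root:f {name: 'someRoot'}), (leaf:f {name: 'someLeaf'})
MATCH p = shortestPath((root)-[:Parent_off*]->(leaf))
RETURN p;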
If you could explain the need for a binary tree structure, maybe we can help you figure out an alternate means of meeting your goals.
My final goal is to describe a file system via a graph.
Neo4j is one direction I'm checking.
I know that in reality a file system resembles a forest data structure more closely, but I wanted a similar graph in both systems I'm evaluating, so I can compare benchmarks (the other one is a document store, MongoDB).
The target DB (describing the file system) will consist of roughly 1 billion nodes, with every node holding some file metadata.
Good morning,
Neo4j supports creating indexes. You could try this to speed up the second import process. The syntax is: CREATE INDEX ON :label(propertyname);
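In your case that would be an index on the name property of the :f label:
// create the index before re-running the relationship import
CREATE INDEX ON :f(name);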
In your case the first import just creates all the nodes, which is cheap (it scales linearly).
But the second import has to search through the existing nodes by the name property to find the two nodes to create the relationship between.
Neo4j is a labeled property graph. For example, in (x:Person {name: "Carol", age: "23", ...}), "Person" is the label and "name" and "age" are properties; you're free to create properties and to create indexes on them. But creating an index has a price.
I think you should only create indexes when needed, for example when you have to search for nodes by some property value.
Indexes introduce an extra write time cost (when data is added or changed, any indexes that rely on that data must be updated in the transaction). If we indexed all properties of all nodes for all labels by default, we would be adding an excessive write time cost that would grow with the number of properties per node, and with the number of labels per node.
Also, indexing on every field is excessive, as not every field will be used for initial lookup of the nodes. The general rule is, if you intend to do initial lookups of nodes of certain labels by certain properties, add indexes for just those properties.
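Putting that together for this case, once the index on :f(name) exists the relationship import should use it automatically; if you want to be explicit, a sketch with index hints looks like this:
// same import as before, with optional index hints
// (the planner normally picks up the index on its own)
USING PERIODIC COMMIT
LOAD CSV FROM 'file:///neotree.csv' AS line
MATCH (n1:f {name: line[0]}), (n2:f {name: line[1]})
USING INDEX n1:f(name)
USING INDEX n2:f(name)
MERGE (n1)-[:Parent_off]->(n2);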