I am facing an issue with my local Neo4j database. I am trying to insert around 100,000 nodes and 380,000 relationships using the official C# Neo4j driver. Most of the time the data import takes around 23 seconds (which is fine for me). However, from time to time (roughly every third run!) the import gets stuck: I send a request to create new relationships (via HTTP, see Wireshark screenshot below) but I never get a response back from the database, and after some time my application runs into a timeout. This always happens when I try to insert new relationships; creating new nodes seems to work fine. The strange thing is that sometimes the import finishes successfully in ~23 seconds without getting stuck!
This is the query to create new nodes, which doesn't get stuck:
string query = "UNWIND {nodes} AS node " +
               "CREATE (n:label1) " +
               "SET n = node";
return ExecuteQuery(client, query, new Dictionary<string, object> { { "nodes", ParameterSerializer.ToDictionary(elements) } });
I always process 1000 nodes or relationships per batch. I fiddled around with different heap and page cache sizes, but that doesn't seem to have any effect on my problem. If you look at the second screenshot, it looks like the memory size is sufficient anyway. I also tried using the Bolt protocol instead of HTTP; the problem still persists.
Do you have any explanation for what is causing this behavior? Or do you have any ideas what I can try in order to circumvent this problem?
Thank you very much for your help,
kind regards,
Daniel
Additional info:
Neo4j is running on a Windows 10 VM with 4 CPU cores assigned and 6 GB of RAM
I have some news. After some changes in my code the database doesn't get stuck anymore. It simply takes a little longer to process all the data.
However, what is strange is that the first batch I insert takes a whole lot longer than the rest. The first time I execute the following query, it takes much longer than the subsequent executions.
I first create all nodes (around 100,000 in batches of 1000) and after that I create all relationships (around 300,000 in batches of 1000). Node creation seems to run fine. It seems that the first query creating the first 1000 relationships (see example in my other post) takes three times as long as the following queries.
Do you create the indexes upfront?
Do you wait for them to become online?
Can you run an EXPLAIN of your relationship statement after you have created the nodes?
It could help to call db.resampleOutdatedIndexes(); after you have created the nodes, so that the first query plan it creates is already correct.
No. I create the indexes up front when the DB is empty. Then I start node and relationship creation directly afterwards. I actually don't know how to check whether an index is online. Calling db.resampleOutdatedIndexes(); doesn't seem to help either.
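For checking whether the indexes are online, here is a minimal sketch. It assumes a Neo4j 3.x server (the {nodes} parameter syntax in the node query suggests 3.x), where the built-in procedures db.indexes and db.awaitIndexes should be available. The 300-second timeout is an arbitrary illustrative value, not something taken from this thread.

```cypher
// Run each statement separately, e.g. in the Neo4j Browser.

// List all indexes; the "state" column should read ONLINE
// (POPULATING means the index is still being built).
CALL db.indexes();

// Wait until all indexes are online, or fail after the timeout (in seconds).
// 300 is an assumed value for illustration only.
CALL db.awaitIndexes(300);
```

Running the second call once after creating the indexes and before sending the first relationship batch would help rule out a still-populating index as the reason for the slow first query.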
Can you run an EXPLAIN of your relationship statement after you have created the nodes?
This is the EXPLAIN of the query that takes extra long:
I don't really get your point. I use parameters in my .NET code. But in the end these parameters are resolved and serialized to a JSON string to form the query.
Can you be a bit more specific about what I should actually do?