I'm running Neo4j 3.5.14 from the official Docker container.
I'm importing around 200k relationships. The first 100k go great, importing at 25k-30k relationships/second. After the first 100k there are a couple of periods of slowdown, sometimes dropping to 50-80 relationships/second for a batch.
I'm importing in batches of 5,000 relationships at a time. My Cypher is fairly simple:
UNWIND {batch} AS row
MATCH (source) WHERE id(source) = row.src_neo_id
MATCH (dest) WHERE id(dest) = row.dest_neo_id
MERGE (source)-[r:TALKS_TO {protocol: row.protocol, port: row.port}]->(dest)
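For completeness, the driving code looks roughly like this (a simplified sketch using the Python driver; the connection details and the rows list are placeholders, not my exact code):

from neo4j import GraphDatabase

# placeholder URI and credentials
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))

IMPORT_CYPHER = """
UNWIND {batch} AS row
MATCH (source) WHERE id(source) = row.src_neo_id
MATCH (dest) WHERE id(dest) = row.dest_neo_id
MERGE (source)-[r:TALKS_TO {protocol: row.protocol, port: row.port}]->(dest)
"""

def import_relationships(rows, batch_size=5000):
    # rows: list of dicts with src_neo_id, dest_neo_id, protocol, port
    with driver.session() as session:
        for i in range(0, len(rows), batch_size):
            # each run() submits one batch of up to batch_size rows as a single parameter
            session.run(IMPORT_CYPHER, batch=rows[i:i + batch_size])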
Does anyone know what might be causing these slowdowns? My first thought was Java GC, so I enabled GC logging, but no GCs were logged during the import.
What are your heap and page cache settings? As the data grows, you will benefit from a bigger page cache. If you don't set these ahead of time in your Docker settings, part of the slowdown you're seeing is the JVM inside the container gradually growing and re-allocating the heap. Pick whatever heap/page cache sizes you're comfortable with, set initial and max heap to the same value, and don't take the defaults.
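For example, with the official image you can pin all three up front through environment variables (the sizes below are placeholders; pick values that fit your machine's RAM):

docker run -d \
    --name neo4j \
    -p 7474:7474 -p 7687:7687 \
    -e NEO4J_dbms_memory_heap_initial__size=4G \
    -e NEO4J_dbms_memory_heap_max__size=4G \
    -e NEO4J_dbms_memory_pagecache_size=8G \
    neo4j:3.5.14

The double underscores in the variable names map to underscores in the corresponding neo4j.conf settings (dbms.memory.heap.max_size and so on), so this is equivalent to editing the config file directly.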
What is the node density? If the source node has, say, 5 relationships of this type, the MERGE can be done quickly because it only needs to check those 5 relationships. If the node density goes over 1,000, the DB is doing a lot of work: as you keep adding relationships to the same source node, the time to add each extra relationship keeps going up. That could explain why you occasionally see a slowdown; you could be hitting a few of those dense nodes during those batches.
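You can look for dense nodes with something like this (a sketch; it scans every node, so run it ad hoc rather than per batch — size() on a relationship pattern reads the node's degree without expanding the relationships):

MATCH (n)
WITH n, size((n)-[:TALKS_TO]->()) AS degree
WHERE degree > 1000
RETURN id(n) AS nodeId, degree
ORDER BY degree DESC
LIMIT 10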
Thanks for the reply. I think node density may be the culprit. I just checked, and my most dense node has over 14,000 outgoing relationships; the second highest was around 10,000.
Is there any way to index the relationships on the nodes so that operations like this can be sped up? Or is this just as fast as it goes?