Slowdown while merging 200k relationships

I'm running Neo4j 3.5.14 from the official Docker Hub container image.

I'm importing around 200k relationships. The first 100k go in quickly, at 25k-30k relationships/second. After that, there are a couple of periods of slowdown, where a batch sometimes drops to as little as 50-80 relationships/second.

I'm importing in batches of 5,000 relationships at a time. My cypher is fairly simple:

      UNWIND $batch AS row
      MATCH (source) WHERE id(source) = row.src_neo_id
      MATCH (dest) WHERE id(dest) = row.dest_neo_id
      MERGE (source)-[r:TALKS_TO {protocol: row.protocol, port: row.port}]->(dest)

Does anyone know what might be causing these slowdowns? My first thought was Java GC, so I enabled GC logging, but no GC pauses were logged during my import.

What are your heap and page cache settings? As the data gets big, you will benefit from a bigger page cache. If you don't set these ahead of time in your Docker settings, part of the slowdown you'll see is the JVM gradually growing and re-allocating the heap. I'd pick whatever heap/page cache sizes you're comfortable with, set initial and max heap to the same value, and not take the defaults.
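
For example, with the official Docker image you can pass these settings as environment variables on docker run. A minimal sketch, with placeholder 2g sizes you should adjust to your machine:

      docker run -d --name neo4j \
        -p 7474:7474 -p 7687:7687 \
        -e NEO4J_dbms_memory_heap_initial__size=2g \
        -e NEO4J_dbms_memory_heap_max__size=2g \
        -e NEO4J_dbms_memory_pagecache_size=2g \
        neo4j:3.5.14

(The double underscores in the variable names stand for literal underscores in the corresponding neo4j.conf settings.)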

I'm currently using:

dbms.memory.pagecache.size=512m
dbms.memory.heap.max_size=1g
dbms.memory.heap.initial_size=1g

I don't remember where I dug those up. JVM tuning is not a skillset I possess, so those numbers are just guesses on my part.

What is your node density? If the source node has, say, 5 relationships of this type, the MERGE can be done quickly because it only needs to check 5 relationships. If the node density goes over 1,000, the DB is doing a lot of work: as you keep adding relationships to the same node, the time to add each extra relationship to that source node goes up. That could explain why you occasionally see slowdowns; you could be hitting a few of those dense nodes during that time.
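
You can check for dense nodes with a query like this (using your TALKS_TO type; in 3.5 this pattern expression is answered from the node's degree store, so it is cheap):

      MATCH (n)
      RETURN id(n) AS nodeId, size((n)-[:TALKS_TO]->()) AS outDegree
      ORDER BY outDegree DESC
      LIMIT 10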


Thanks for the reply. I think node density may be the culprit. I just checked and my most dense node has over 14,000 outgoing relationships. The second highest was around 10,000.

Is there any way to index the relationships on the nodes so that operations like this can be sped up? Or is this just the speed it goes?

Sadly, there is no way to index relationship properties.

One option is to use a hyper-edge pattern.

You could create TALKS_TO as a node.

It can look like this:

      (source)-->(talks_to)-->(dest)

You might have to add more properties to that node to make it distinct, for example the source and destination node ids, so that each (source, dest, protocol, port) combination maps to exactly one such node.
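
A minimal sketch of what your import could look like with this pattern; the :TalksTo label, the SENDS/TO relationship types, and the srcId/destId key properties are illustrative names, not anything prescribed:

      UNWIND $batch AS row
      MATCH (source) WHERE id(source) = row.src_neo_id
      MATCH (dest) WHERE id(dest) = row.dest_neo_id
      // MERGE the hyper-edge node on its full key first, then attach it
      MERGE (t:TalksTo {srcId: row.src_neo_id, destId: row.dest_neo_id,
                        protocol: row.protocol, port: row.port})
      MERGE (source)-[:SENDS]->(t)
      MERGE (t)-[:TO]->(dest)

The payoff is that node properties can be indexed. With a composite index on the key properties, the MERGE on the :TalksTo node should become an index lookup instead of a scan over a dense node's relationship chain:

      CREATE INDEX ON :TalksTo(srcId, destId, protocol, port)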

Thanks
Ravi