Slowdown while merging 200k relationships

I'm running Neo4j 3.5.14 from the official Docker Hub container image.

I'm importing around 200k relationships. The first 100k go in quickly, at 25k-30k relationships/second. After that, there are a couple of periods of slowdown, where a batch sometimes drops to as little as 50-80 relationships/second.

I'm importing in batches of 5,000 relationships at a time. My cypher is fairly simple:

      UNWIND $batch AS row
      MATCH (source) WHERE id(source) = row.src_neo_id
      MATCH (dest) WHERE id(dest) = row.dest_neo_id
      MERGE (source)-[r:TALKS_TO {protocol: row.protocol, port: row.port}]->(dest)

Does anyone know what might be causing these slowdowns? My first thought was Java GC, so I enabled GC logging, but no GC pauses were logged during my import.

What are your heap and page cache settings? As the data gets big, you will benefit from a bigger page cache. If you don't set these ahead of time in your Docker settings, part of the slowdown you'll see is the JVM gradually growing and re-allocating the heap. I'd pick whatever heap/page cache sizes you're comfortable with, set initial and max heap to the same value, and not take the defaults.
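
For example, with the official Docker image you can pass these settings as environment variables on docker run. A minimal sketch, with placeholder 2g sizes you should adjust to your machine:

      docker run -d --name neo4j \
        -p 7474:7474 -p 7687:7687 \
        -e NEO4J_dbms_memory_heap_initial__size=2g \
        -e NEO4J_dbms_memory_heap_max__size=2g \
        -e NEO4J_dbms_memory_pagecache_size=2g \
        neo4j:3.5.14

(The double underscores in the variable names stand for literal underscores in the corresponding neo4j.conf settings.)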

I'm currently using:

dbms.memory.pagecache.size=512m
dbms.memory.heap.max_size=1g
dbms.memory.heap.initial_size=1g

I don't remember where I dug those up. JVM tuning is not a skillset I possess, so those numbers are just guesses on my part.

What is your node density? If the source node has, say, 5 relationships of this type, the MERGE can be done quickly because it only needs to check 5 relationships. If the node density goes over 1,000, the DB is doing a lot of work: as you keep adding relationships to the same node, the time to add each extra relationship to that source node goes up. That could explain why you occasionally see slowdowns; you could be hitting a few of those dense nodes during that time.
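
You can check for dense nodes with a query like this (using your TALKS_TO type; in 3.5 this pattern expression is answered from the node's degree store, so it is cheap):

      MATCH (n)
      RETURN id(n) AS nodeId, size((n)-[:TALKS_TO]->()) AS outDegree
      ORDER BY outDegree DESC
      LIMIT 10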


Thanks for the reply. I think node density may be the culprit. I just checked and my most dense node has over 14,000 outgoing relationships. The second highest was around 10,000.

Is there any way to index the relationships on the nodes so that operations like this can be sped up? Or is this just the speed it goes?

Sadly, there is no way to index relationship properties.

One option is to use a hyper-edge pattern.

You could create TALKS_TO as a node.

It can look like this:

      (source)-->(talks_to)-->(dest)

You might have to add more properties to that node to make it distinct, for example the source and destination node ids, so that each (source, dest, protocol, port) combination maps to exactly one such node.
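
A minimal sketch of what your import could look like with this pattern; the :TalksTo label, the SENDS/TO relationship types, and the srcId/destId key properties are illustrative names, not anything prescribed:

      UNWIND $batch AS row
      MATCH (source) WHERE id(source) = row.src_neo_id
      MATCH (dest) WHERE id(dest) = row.dest_neo_id
      // MERGE the hyper-edge node on its full key first, then attach it
      MERGE (t:TalksTo {srcId: row.src_neo_id, destId: row.dest_neo_id,
                        protocol: row.protocol, port: row.port})
      MERGE (source)-[:SENDS]->(t)
      MERGE (t)-[:TO]->(dest)

The payoff is that node properties can be indexed. With a composite index on the key properties, the MERGE on the :TalksTo node should become an index lookup instead of a scan over a dense node's relationship chain:

      CREATE INDEX ON :TalksTo(srcId, destId, protocol, port)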

Thanks
Ravi