
Merging on relationships slows down as super-nodes have greater degrees

Hi everyone
I am migrating my data from MongoDB to my Neo4j server. In my data, I have users with unique IDs and the IDs of the users they follow and are followed by.

I iterate through the users from MongoDB and first merge them into Neo4j (please don't mind the curly brackets and quote marks; those are how my Python driver formats the queries):

MERGE (a: Author {{user_id: {user_id}}})
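For context, the doubled braces in the query above are a Python string-formatting artifact: a minimal sketch, assuming `str.format` is used to substitute values (the actual driver call is not shown):

```python
# "{{" and "}}" become literal "{" and "}" after str.format,
# while {user_id} is substituted with the value.
template = "MERGE (a: Author {{user_id: {user_id}}})"
query = template.format(user_id=12345)
print(query)  # MERGE (a: Author {user_id: 12345})
```

Note that substituting values into the query text prevents Neo4j from reusing cached query plans; passing them as query parameters (`$user_id` in the query, a parameters dict in the driver call) lets the server plan the query once.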

Then I merge all the followers and followings:

UNWIND {linked_people} AS friend_id
MERGE (a: Author {{user_id: friend_id}})
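When `{linked_people}` grows large, it can help to send the list in fixed-size batches so each transaction stays small. A minimal sketch of the batching logic on the Python side (`chunks` is a hypothetical helper, not part of any driver):

```python
def chunks(items, size):
    """Yield successive fixed-size slices of a list."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

linked_people = list(range(25))  # placeholder follower/following IDs
batches = list(chunks(linked_people, 10))
print([len(b) for b in batches])  # [10, 10, 5]
```

Each batch would then be passed as the `{linked_people}` parameter of its own `UNWIND ... MERGE` transaction.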

Finally I merge the relationships:

UNWIND {edges} AS edge
MATCH (a:Author), (b:Author)
WHERE a.user_id = edge[0] AND b.user_id = edge[-1]
MERGE (a)-[r:FOLLOWS]->(b)
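For context on `edge[0]` and `edge[-1]`: each edge is a two-element list, `[follower_id, followee_id]`, so the first and last elements are the two endpoints. A minimal sketch of how such an `{edges}` parameter might be built on the Python side (names are illustrative, not from the original post):

```python
# A user document carrying the IDs of the accounts it follows.
user = {"user_id": 1, "follows": [2, 3, 5]}

# Build the {edges} parameter: one [source, target] pair per relationship.
edges = [[user["user_id"], friend_id] for friend_id in user["follows"]]
print(edges)  # [[1, 2], [1, 3], [1, 5]]
```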

The migration process becomes slower and slower over time. I have gone through 5.4 million users and there are 1.4 million to go. I am afraid the last half million or so will be very painful to wait for.

Probably some nodes accumulate more and more followers, so when I try to add more relationships to them, merging takes longer.
A similar issue was pointed out here: Slowdown while merging 200k relationships; however, the answer from @anthapu dates from February '20.
I wonder if there has been an optimization for that problem over the last two years, or do I have to live with it?

Neo4j:

"Neo4j Kernel" "4.4.5" "enterprise"

Constraints:

Indices:

Regards
Ahmet

1 ACCEPTED SOLUTION

Great! Then I'd try moving initial_size to 6G and max_size to 7G, and see how that affects your process. I think you can go all the way up to 10G, but it depends on how much memory is used by the OS. Also, you have to restart the service if you're on Windows, or restart through systemctl/script execution if you're on Linux.

Hope this helps!
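Based on that suggestion, the corresponding lines in neo4j.conf would look something like this (values are the ones proposed above; tune them to your machine):

```
dbms.memory.heap.initial_size=6G
dbms.memory.heap.max_size=7G
```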


3 REPLIES

luiseduardo
Ninja

Welcome @ahmet.gurhan !

What are your heap configurations? You can find some information on these configs here.

Hi @luiseduardo

I did not specify heap configurations manually. My neo4j.conf file is as follows:

...
#Java Heap Size: by default the Java heap size is dynamically calculated based
#on available system resources. Uncomment these lines to set specific initial
#and maximum heap size.
#dbms.memory.heap.initial_size=512m
#dbms.memory.heap.max_size=512m
...

My standalone Neo4j runs on a single machine that is not used by other applications.
The resource requests of my Neo4j pod on Kubernetes are as follows:

resources:
  limits:
    cpu: 3900m
    memory: 13200Mi
  requests:
    cpu: 3800m
    memory: 13000Mi

Thanks for your response.
