Is it efficient to create multiple mirror nodes, create the corresponding edges in parallel, and then merge the nodes, in order to parallelize a very intensive process? (To avoid deadlocks)

Hello everyone,

I am working with a huge graph of over 200M nodes and 500M edges. I will add a new node, and approximately 100M existing nodes will be connected to it. I am using Neo4j Community Edition. The operation takes too long, and I cannot parallelize it because of deadlock exceptions. My idea is to create mirror nodes such as newnode1, newnode2, newnode3, newnode4, newnode5, ..., create the edges in parallel (batch1 -> newnode1, batch2 -> newnode2, batch3 -> newnode3, ...), and then use the apoc.refactor.mergeNodes procedure to merge the temporary nodes into the final new node. Is this logical? What are the pros and cons?
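To make the idea concrete, here is a minimal sketch of the mirror-node approach. The labels (`NewNodeMirror`, `Item`), the `batch` property, and the relationship type are illustrative assumptions, not from my actual schema:

```cypher
// Step 1: create one temporary mirror node per parallel worker.
UNWIND range(1, 5) AS i
CREATE (:NewNodeMirror {mirrorId: i});

// Step 2: each worker writes only to its own mirror node, so no two
// writers lock the same node and deadlocks are avoided.
// (Run one such statement per worker, with $i = 1..5.)
MATCH (m:NewNodeMirror {mirrorId: $i})
MATCH (n:Item {batch: $i})
CREATE (n)-[:CONNECTED_TO]->(m);

// Step 3: collapse all mirrors into a single final node,
// carrying the relationships over.
MATCH (m:NewNodeMirror)
WITH collect(m) AS mirrors
CALL apoc.refactor.mergeNodes(mirrors, {mergeRels: true})
YIELD node
RETURN node;
```

The merge in step 3 is itself a single-threaded operation over 100M relationships, so it may become the new bottleneck.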

Thanks.

Sorry to disappoint, but I would recommend a different approach. Yes, it is going to be hard to write a single node with 100M nodes connected to it. And if you succeed, you will face a different problem afterwards, because you will have created a supernode.

I would recommend against what you are trying to do. You likely need a different data model, one that doesn't require a single node attached to 100M other things. In other words, I think the import problem you're running into, and the query problems you would have afterwards, are both symptoms of a needed model change.
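One common remodeling, assuming the 100M relationships would only mark membership in some set, is to replace the hub node with a label (or a property) on the members. Label and property names here are illustrative:

```cypher
// Instead of (n)-[:IN_SET]->(hub), mark membership directly on each node.
// Process in batches; with no shared hub node there is nothing to deadlock on.
MATCH (n:Item {batch: $i})
SET n:InNewSet;

// Membership queries then scan the label index instead of
// traversing a supernode:
MATCH (n:InNewSet)
RETURN count(n);
```

This trades the relationship traversal for an index-backed label scan, which Neo4j handles well even at this scale.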

For much more information, see this article:


How about situations where the model cannot be altered? For example, when loading map data (OpenStreetMap data), one cannot remodel the nodes and relationships.