Is it efficient to create multiple temporary nodes, attach the edges to them in parallel, and then merge the nodes, in order to parallelize a very intensive process? (To avoid deadlocks)

ugurtosun
Node Link

Hello everyone,

I am working on a huge graph with over 200M nodes and 500M edges. I will add a new node, and approximately 100M existing nodes will be connected to it. I am working with Neo4j Community Edition. It takes too long, and I cannot parallelize the work because of deadlock exceptions. My idea is to create mirror nodes such as newnode1, newnode2, newnode3, newnode4, newnode5, ..., create the edges in parallel (batch1 -> newnode1, batch2 -> newnode2, batch3 -> newnode3, ...), and then use the apoc.refactor.mergeNodes procedure to merge the temporary nodes into the final new node. Is this reasonable? What are the pros and cons?
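To make the idea concrete, here is a rough sketch of the batching plan in Python. It only builds the batch-to-mirror assignment and holds the Cypher as strings; it does not talk to a database. The labels `Item` and `Mirror`, the relationship type `LINKED_TO`, the `id` and `name` properties, and the helper `plan_batches` are all hypothetical names chosen for illustration. The only real API referenced is the apoc.refactor.mergeNodes procedure.

```python
from typing import Dict, Iterable, List

# Hypothetical Cypher for one batch: every id in $ids is linked to ONE mirror node.
# The labels, property names, and relationship type are assumptions.
CREATE_EDGES = """
UNWIND $ids AS id
MATCH (n:Item {id: id})
MATCH (m:Mirror {name: $mirror})
CREATE (n)-[:LINKED_TO]->(m)
"""

# Final step: fold all mirror nodes into one with the APOC procedure.
MERGE_MIRRORS = """
MATCH (m:Mirror)
WITH collect(m) AS mirrors
CALL apoc.refactor.mergeNodes(mirrors, {properties: 'discard', mergeRels: true})
YIELD node
RETURN node
"""


def plan_batches(node_ids: Iterable[int], n_mirrors: int,
                 batch_size: int) -> Dict[str, List[List[int]]]:
    """Split ids into batches and assign them round-robin to mirror nodes.

    All batches for a given mirror are meant to be run sequentially by a
    single worker, so no two concurrent transactions write to the same mirror.
    """
    ids = list(node_ids)
    names = [f"newnode{i + 1}" for i in range(n_mirrors)]
    plan: Dict[str, List[List[int]]] = {name: [] for name in names}
    for b, start in enumerate(range(0, len(ids), batch_size)):
        plan[names[b % n_mirrors]].append(ids[start:start + batch_size])
    return plan


if __name__ == "__main__":
    for mirror, batches in plan_batches(range(10), n_mirrors=3, batch_size=2).items():
        print(mirror, batches)
```

Each worker would take one mirror and run `CREATE_EDGES` once per batch with `ids` and `mirror` as parameters. Note that `MERGE_MIRRORS` still has to move every relationship onto a single node at the end, so the expensive part may simply be postponed to that step.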

Thanks.

1 REPLY

david_allen
Neo4j

Sorry to disappoint, but I would recommend you take a different approach. Yes, it's going to be hard to write a single node with 100M nodes connected to it. If you succeed, you're going to have a different problem after that, because you will have created a supernode (a node with a very large number of relationships).

I would recommend not doing what you are trying to do. It is likely you need to choose a different data model that doesn't require a single node attached to 100M other things. In other words, I think the import problem you're running into and the query problems you would have afterwards are symptoms of a needed model change.

For much more information, see this article: