The data are shown below (each row holds the information about one buyer).
Remark: buyers do not repeat across rows, but sellers do, and a seller is not necessarily one of the buyers.
I want to create nodes and edges as follows,
i.e., the relationship between buyers and sellers. These are the steps and the code:
Step (1) Create the buyer nodes (label `User`)
```cypher
// data is one input row (e.g. from LOAD CSV WITH HEADERS ... AS data)
CREATE (user:User {id: data.id, name: data.buyer, buyer: true})
```
Step (2) Create the seller nodes (label `User`)
```cypher
CREATE (user:User {id: data.id, name: data.seller, seller: true})
```
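Step (3) Link each buyer to its seller based on `id`
```cypher
// Pair the buyer node and the seller node that came from the same row (same id)
MATCH (n:User)
MATCH (p:User)
WHERE n.id = p.id AND n.buyer IS NOT NULL AND p.seller IS NOT NULL
// The relationship clause was missing here; BOUGHT_FROM is a placeholder type
CREATE (n)-[:BOUGHT_FROM]->(p)
```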
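Step (3) joins `User` nodes on `id`. A minimal sketch, assuming Neo4j 4.1 or later for the `IF NOT EXISTS` syntax and using the hypothetical index name `user_id`, that gives the planner an index lookup option for this join; whether the planner actually uses it can be checked by prefixing the step (3) query with `PROFILE`:

```cypher
// Hypothetical index name; range index on :User(id) so nodes can be found by id
// without scanning every User node
CREATE INDEX user_id IF NOT EXISTS FOR (u:User) ON (u.id);
```

Creating the index before step (1) means it is maintained as the nodes are written; creating it afterwards also works, but the index is then populated in the background and must be online before it helps.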
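Step (4) Merge duplicate nodes (i.e., merge nodes that have the same name)
```cypher
// Group all User nodes by name; only groups with more than one node need merging
MATCH (n:User)
WITH n.name AS repeatname, collect(n) AS nodes
WHERE size(nodes) > 1
CALL {
  WITH nodes
  // Merge each group into one node; differing property values are combined into arrays
  CALL apoc.refactor.mergeNodes(nodes, {properties: 'combine'})
  YIELD node
  RETURN node
}
RETURN count(node) AS mergedGroups
```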
With this I get the graph I want. However, I run into a performance problem, since the data size is not 5 rows but about 5 billion. The bottleneck is step (4), i.e., `apoc.refactor.mergeNodes`, which takes too much time (more than one day).
Is there another method (for example, one that does not need `apoc.refactor.mergeNodes`) that takes less time but still produces the same graph? I have tried the periodic method to run it in parallel, but it still takes too long.
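One commonly used alternative is to avoid creating duplicates in the first place: put a uniqueness constraint on `User.name` and use `MERGE` instead of `CREATE`, so every name maps to a single node and the relationship is written from the same row. That removes the id join in step (3) and the merge in step (4). A minimal sketch, assuming Neo4j 4.4 or later (for the `REQUIRE` constraint syntax and `CALL { ... } IN TRANSACTIONS`), a CSV with headers `id`, `buyer`, `seller`, a hypothetical file name `orders.csv`, a hypothetical constraint name `user_name_unique`, and the placeholder relationship type `BOUGHT_FROM`:

```cypher
// One User node per name; the constraint also gives MERGE an index to look names up
CREATE CONSTRAINT user_name_unique IF NOT EXISTS
FOR (u:User) REQUIRE u.name IS UNIQUE;

// Hypothetical file name; columns assumed to be id, buyer, seller.
// IN TRANSACTIONS needs an implicit transaction (prefix with :auto in Neo4j Browser).
LOAD CSV WITH HEADERS FROM 'file:///orders.csv' AS data
CALL {
  WITH data
  // Reuse the node if this name was already seen as a buyer or a seller
  MERGE (b:User {name: data.buyer})
  SET b.buyer = true, b.id = data.id
  MERGE (s:User {name: data.seller})
  SET s.seller = true
  // Buyers do not repeat, so each buyer/seller pair is written once; placeholder type
  CREATE (b)-[:BOUGHT_FROM]->(s)
} IN TRANSACTIONS OF 10000 ROWS;
```

This is not an exact copy of the `properties: 'combine'` result: each node keeps only the `id` of the row where it appears as a buyer, so the list of seller-side ids is not collected. For an initial load of this size into an empty database, the offline `neo4j-admin` import tool is usually much faster than Cypher, but it expects the node files to be deduplicated up front.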
Thanks a lot!