I have millions of data rows I need to do this for, but I can't do it with a single UNWIND query because Neo4j crashes with a memory error. If I batch the data so that $data contains around 20,000 rows at a time, it seems to be OK. But is there a way to increase that batch size? Are there any tricks for dealing with this kind of situation?
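For concreteness, the kind of query I mean looks roughly like this (a simplified sketch; the label and property names are illustrative, not my exact query):

// $data is a list of ~20,000 maps, each containing a name_id key
UNWIND $data AS row
MERGE (n:Node {name_id: row.name_id})
SET n += row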
There may be more efficient solutions, but this scenario is essentially a data migration, and you can approach it in a simple way:
At the source-node level, maintain an attribute such as migrated = false.
On each pass, select only the records where migrated is false, with LIMIT 20000, MERGE them, and then mark them as migrated.
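A rough sketch of this idea in Cypher (the labels and property names here are placeholders, not taken from your actual data):

// Pick one unmigrated batch, merge it, then mark it as done
MATCH (src:SourceRecord)
WHERE src.migrated = false
WITH src LIMIT 20000
MERGE (n:Node {name_id: src.name_id})
// copy whatever other properties you need onto n here
SET src.migrated = true

Run this repeatedly until no rows with migrated = false remain; each run touches at most 20,000 rows, so the transaction stays small.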
I don't really understand what you're suggesting. Are you telling me to try to avoid merging any nodes that are already present in the graph?
I wonder if there is some other underlying problem here, since it's taking 20 minutes to merge around 100,000 nodes. Each node has around 10 attributes, one of which is name_id, for which I have set:
CREATE CONSTRAINT ON (n:Node)
ASSERT n.name_id IS UNIQUE
I see posts about people merging millions of nodes in a few minutes, so I am wondering what I'm doing wrong.