Hey Everyone,
For a proof of concept, I'm struggling with a Cypher query that reads a property from an existing node and creates a new node from it, with a relationship back to the node the value came from.
I've tried the following:
CALL apoc.periodic.iterate("
MATCH (transaction:Transaction)
RETURN transaction
","
MERGE (transaction)-[r:HAS_PAYMENT_METHOD]->(f:PaymentMethod {paymentMethod: transaction.paymentMethod})
",{batchSize:20000, parallel:false})
This script does what I want, except that it creates a new node even for values that already exist.
In this case it reads the paymentMethod property and creates a new node with that value, but if the same value occurs again it still creates another node. Does anyone have experience with this?
Neo4j version: 4.3.3 (Enterprise/Desktop)
Apoc version: 4.3.0.0
I've experimented with the query a bit. I've had some success with the following version, but it's incredibly slow on my dataset of 2,000,000 nodes.
CALL apoc.periodic.iterate("
MATCH (transaction:Transaction)
RETURN transaction
","
MERGE (s:PaymentMethod {paymentMethod: transaction.paymentMethod})
WITH s, transaction
MERGE (s)<-[:HAS_PAYMENT_METHOD]-(transaction)
",{batchSize:20000, parallel:false})
Is there a way to optimize this further?
@kevin.oosterhout
What version of Neo4j? version of APOC?
Is there an index on :PaymentMethod(paymentMethod)?
Updated the topic with the versions.
There's currently no index on it
@kevin.oosterhout
The MERGE page of the Cypher Manual states:
"For performance reasons, creating a schema index on the label or property is highly recommended when using MERGE. See Indexes for search performance for more information."
Why? Because a MERGE is effectively a "match or create". If you have no index and there are 100k nodes labeled :PaymentMethod, then every MERGE has to examine all 100k nodes with this label to see whether the node already exists. If you index on :PaymentMethod(paymentMethod), it only searches the index, and presumably finds a much, much smaller set.
As such, please create an index on said label/property and rerun your test
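As a rough sketch, the index could be created along these lines on Neo4j 4.3. The index name payment_method_idx is only a placeholder I picked for illustration; the label and property are taken from your query.

```cypher
// Create a schema index so that MERGE on :PaymentMethod can look up
// existing nodes by paymentMethod instead of scanning every node with the label.
// "payment_method_idx" is a hypothetical name; choose whatever fits your naming.
CREATE INDEX payment_method_idx IF NOT EXISTS
FOR (p:PaymentMethod)
ON (p.paymentMethod);
```

Once the index shows as ONLINE in the output of SHOW INDEXES, rerun your second apoc.periodic.iterate query unchanged.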
Thanks for the help. The index made a huge difference. Before, I let it run for an hour and it got through 70,000 nodes. Now it did about 2,000,000 in under a minute.