Extracting property of node to a new node

Hey Everyone,

For a proof of concept, I'm struggling with a Cypher query that takes a property from an existing node and creates a new node from it, with a relationship back to the node the property came from.

I've tried the following:

CALL apoc.periodic.iterate("
    MATCH (transaction:Transaction)
    RETURN transaction
","
    MERGE (transaction)-[r:HAS_PAYMENT_METHOD]->(f:PaymentMethod {paymentMethod: transaction.paymentMethod})
",{batchSize:20000, parallel:false})

This script does what I want, except that it creates a new node even for values that already exist.
So in this case it reads the paymentMethod property and creates a new node with that value, but if the same value occurs again it still creates another new node instead of reusing the existing one. Does anyone have experience with this?

Neo4j version: 4.3.3 (Enterprise/Desktop)
Apoc version: 4.3.0.0

I've tweaked the query a bit. I've had some success with the following version, but it's incredibly slow on my dataset of 2.000.000 nodes.

CALL apoc.periodic.iterate("
    MATCH (transaction:Transaction)
    RETURN transaction
","
    MERGE (s:PaymentMethod {paymentMethod: transaction.paymentMethod})
    WITH s, transaction
    MERGE (s)<-[:HAS_PAYMENT_METHOD]-(transaction)
",{batchSize:20000, parallel:false})

Is there a way to optimize this further?

@kevin.oosterhout

What version of Neo4j? version of APOC?

Is there an index on :PaymentMethod(paymentMethod)?

Updated the topic with the versions.

There's currently no index on it

@kevin.oosterhout

The MERGE page of the Cypher Manual states:

For performance reasons, creating a schema index on the label or property is highly recommended when using MERGE. See Indexes for search performance for more information.

Why? Because a MERGE is effectively a match-or-create. If you have no index and you have 100k nodes labeled :PaymentMethod, then every MERGE has to examine all 100k nodes with this label to see whether the node already exists. If you index on :PaymentMethod(paymentMethod), it only searches the index and presumably finds a much, much smaller set.

As such, please create an index on said label/property and rerun your test
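
A minimal sketch of what that could look like on Neo4j 4.3, assuming the label and property from the queries above. The index name payment_method_idx is a placeholder chosen for this example, not something from the original post:

```cypher
// Create a single-property index so MERGE can look up existing
// :PaymentMethod nodes by value instead of scanning the whole label.
// The index name is hypothetical; pick whatever fits your naming scheme.
CREATE INDEX payment_method_idx IF NOT EXISTS
FOR (p:PaymentMethod)
ON (p.paymentMethod);

// Check that the index exists and has come ONLINE before rerunning the batch job.
SHOW INDEXES;
```

If each payment method value should only ever exist once, a uniqueness constraint on the same label and property might be a better fit, since it also creates a backing index. That is a design choice to weigh, not something the thread requires.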

Thanks for the help. The index made a huge difference. Before, I let it run for an hour to get 70.000 nodes. Now it did about 2.000.000 in under a minute.
