Multiple Node Merge Speed Question

Keith_gamble · March 20, 2021, 1:20am

I am ingesting around 2,000 nodes with 0 relationships and a couple of properties. I can do it pretty easily with one of the following queries:

UNWIND $batch as row MERGE (n:Equipment {tagPath: row.tagPath}) SET n += row.props

UNWIND $batch as row CALL apoc.merge.node(row.labels, row.idProps, row.props) YIELD node RETURN COUNT(*)

In the first query I am hardcoding the Equipment label, and I know that is where the speed comes from, whereas in the second query I would provide multiple, varying labels.

I would prefer the second route where I can have dynamic labels, but a 10x speed difference may make it infeasible.

I am curious whether there is some obvious reason I am missing for why it's so much slower, or whether I could make some tweaks to make the second option faster.

Thanks for any advice,
Keith G.
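
For readers weighing the two queries above, one possible middle ground is to group the rows by label set on the client and send one static-label MERGE per group. The sketch below is only an illustration under assumptions, not something proposed in the thread: the `LabelBatcher` class and its method names are hypothetical, each row is assumed to carry `labels` (a list of strings), `tagPath`, and `props`, and the sample labels and tag paths in `main` are made up. It uses only the Java standard library and builds the query text and batches without calling the driver.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper: groups rows by label set so that each group
// can be written with a MERGE whose labels are fixed in the query text.
class LabelBatcher {

    // Backtick-quote a label; a backtick inside the label is doubled.
    static String quoteLabel(String label) {
        return "`" + label.replace("`", "``") + "`";
    }

    // Build the static-label query for one (already sorted) set of labels.
    static String queryFor(Iterable<String> labels) {
        StringBuilder sb = new StringBuilder("UNWIND $batch AS row MERGE (n");
        for (String label : labels) {
            sb.append(':').append(quoteLabel(label));
        }
        sb.append(" {tagPath: row.tagPath}) SET n += row.props");
        return sb.toString();
    }

    // Map each distinct query text to the rows that should be sent with it.
    // Assumes every row has a non-null "labels" list of strings.
    @SuppressWarnings("unchecked")
    static Map<String, List<Map<String, Object>>> groupByLabels(List<Map<String, Object>> rows) {
        Map<String, List<Map<String, Object>>> groups = new LinkedHashMap<>();
        for (Map<String, Object> row : rows) {
            // Sorting makes [A, B] and [B, A] land in the same group.
            TreeSet<String> labels = new TreeSet<>((List<String>) row.get("labels"));
            groups.computeIfAbsent(queryFor(labels), q -> new ArrayList<>()).add(row);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Made-up sample rows, for illustration only.
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(Map.of("labels", List.of("Equipment", "Pump"),
                "tagPath", "site/p1", "props", Map.of("rpm", 1200)));
        rows.add(Map.of("labels", List.of("Equipment"),
                "tagPath", "site/v1", "props", Map.of()));
        groupByLabels(rows).forEach((query, batch) ->
                System.out.println(batch.size() + " row(s): " + query));
    }
}
```

Each key would be run as the query text with its value passed as `$batch`. Whether this beats `apoc.merge.node` depends on how many distinct label sets there are, and an index on `tagPath` would need to exist for each label being merged on.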

cobra · March 20, 2021, 8:29am

Hello @Keith_gamble

Did you create a unique constraint before loading?

Regards,
Cobra

Keith_gamble · March 20, 2021, 4:20pm

I actually had the same index before I ran both queries, and cleared the database out between each one.

CREATE INDEX tagPath FOR (n:Equipment) ON (n.tagPath)

cobra · March 20, 2021, 10:09pm

Could you use PROFILE in front of both queries to compare the plan?

Keith_gamble · March 22, 2021, 5:06pm

Weird, I did a bit more testing and it looks like the speed difference is actually coming from something in the Java Driver.

The query itself actually seems to be more efficient with APOC, but it takes about twice as long through the driver. Will have to do more digging!

sam_gijare · March 27, 2021, 12:45pm

Which version of Neo4j are you using to run this query on?

Keith_gamble · March 27, 2021, 3:11pm

I am using the latest version of the Java driver and Neo4j v4.0.

tard_gabriel · March 27, 2021, 5:29pm

Hello @Keith_gamble

I would be curious to see what happens if you replace:
SET n += row.props
with
ON CREATE SET n += row.props

That would match your APOC statement exactly. Still, I guess APOC does a little more work for the dynamic labelling, which might be the cause.

sam_gijare · March 27, 2021, 7:17pm

There is a list of the latest released versions of the Java driver. Please pick the one that matches your database version and upgrade to it. This will probably resolve the speed issue.

Keith_gamble · March 29, 2021, 3:57pm

I am using Java driver v4.2.3, and the database is 4.2.3 as well.

Keith_gamble · March 29, 2021, 5:36pm

This didn't actually change anything speed-wise.

What is interesting is that the query itself isn't any slower: the transaction takes the same amount of time. It's just slower through the driver to create and close it.
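
To narrow down where the extra time goes on the client side, the phases around a query can be timed separately. The following is a minimal sketch under assumptions, not code from the thread: `PhaseTimer` is a hypothetical helper built only on the Java standard library, and the empty lambdas in `main` are placeholders where the real driver calls (opening the session, running the query, closing the session) would go.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: accumulates wall-clock time per named phase.
class PhaseTimer {
    private final Map<String, Long> nanos = new LinkedHashMap<>();

    // Run one phase and add its duration to the total for that name,
    // even if the body throws.
    void time(String phase, Runnable body) {
        long start = System.nanoTime();
        try {
            body.run();
        } finally {
            nanos.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    // Totals in nanoseconds, in the order phases were first seen.
    Map<String, Long> totals() {
        return nanos;
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();
        // Placeholders: substitute the actual driver calls for each phase.
        timer.time("open session", () -> { });
        timer.time("run query", () -> { });
        timer.time("close session", () -> { });
        timer.totals().forEach((phase, ns) ->
                System.out.printf("%-14s %.3f ms%n", phase, ns / 1e6));
    }
}
```

Running both variants of the query through the same timer would show whether the difference sits in the query phase or in session setup and teardown.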
