Multiple Node Merge Speed Question

I am ingesting around 2,000 nodes with 0 relationships and a couple of properties. I can do it pretty easily with one of the following queries:

UNWIND $batch as row MERGE (n:Equipment {tagPath: row.tagPath}) SET n += row.props

UNWIND $batch as row CALL apoc.merge.node(row.labels, row.idProps, row.props) YIELD node RETURN COUNT(*)

In the first query I hardcode the Equipment label, which I assume is what makes it faster. In the second query I would supply varying labels for each row.
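For readers following along, here is a rough sketch of the row shape the second query expects, written with an inline list instead of the $batch parameter. The tag paths, label names, and property values are made up for illustration; only the keys labels, idProps, and props come from the query above.

```cypher
// Hypothetical sample rows; in the real load these arrive as $batch.
UNWIND [
  {labels: ['Equipment'],         idProps: {tagPath: 'site/area1/pump1'},  props: {name: 'Pump 1'}},
  {labels: ['Equipment', 'Tank'], idProps: {tagPath: 'site/area1/tank1'}, props: {name: 'Tank 1'}}
] AS row
// labels is a list of strings, idProps is the map merged on,
// props is the map of remaining properties.
CALL apoc.merge.node(row.labels, row.idProps, row.props) YIELD node
RETURN count(*)
```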

I would prefer the second route where I can have dynamic labels, but a 10x speed difference may make it infeasible.

I am curious whether there is some obvious reason I am missing for why it's so much slower, or whether I could make some tweaks to make the second option faster.

Thanks for any advice,
Keith G.

Hello @Keith_gamble

Did you create a unique constraint before loading?

Regards,
Cobra

I actually had the same index before I ran both queries, and cleared the database out between each one.

CREATE INDEX tagPath FOR (n:Equipment) ON (n.tagPath)
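Note that the statement above creates a plain index, not the unique constraint asked about. A minimal sketch of the constraint variant follows, assuming the Neo4j 4.x constraint syntax; the constraint name equipment_tagPath_unique is an invented example. Neo4j refuses to create a uniqueness constraint while a plain index exists on the same label and property, so the index is dropped first.

```cypher
// Drop the plain index created above (named tagPath).
DROP INDEX tagPath;

// Assumed 4.x syntax; the constraint name is hypothetical.
// A uniqueness constraint also provides an index for MERGE lookups.
CREATE CONSTRAINT equipment_tagPath_unique
ON (n:Equipment) ASSERT n.tagPath IS UNIQUE;
```

A constraint on :Equipment only helps the dynamic-label query for rows whose labels include Equipment; other labels would need their own constraint or index.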

Could you use PROFILE in front of both queries to compare the plan?
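A sketch of what that comparison could look like, assuming $batch has already been set as a parameter in the session (for example with :param in Neo4j Browser). Keep in mind that PROFILE actually executes the query, so the database should be cleared between the two runs.

```cypher
// Plan for the hardcoded-label version.
PROFILE
UNWIND $batch AS row
MERGE (n:Equipment {tagPath: row.tagPath})
SET n += row.props;

// Plan for the dynamic-label version.
PROFILE
UNWIND $batch AS row
CALL apoc.merge.node(row.labels, row.idProps, row.props) YIELD node
RETURN count(*);
```

In the first plan, the thing to look for is whether the MERGE is served by an index seek rather than a label scan. The internals of the APOC procedure call will most likely not be broken down in the second plan, so the total db hits and elapsed time are the more useful numbers to compare there.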

Weird, I did a bit more testing and it looks like the speed difference is actually coming from something in the Java Driver.

The APOC query actually seems to be the more efficient of the two, but it takes about twice as long when run through the driver. Will have to do more digging!

Which version of Neo4j are you using to run this query on?

I am using the latest version of the Java driver and Neo4j v4.0

Hello @Keith_gamble

I would be curious to see what happens if you replace:
SET n += row.props
with:
ON CREATE SET n += row.props

That would match your APOC statement exactly. Still, I guess APOC does a little more work for the dynamic labelling, which might be the cause.
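Written out in full, the suggested variant of the first query might look like the sketch below. It rests on the assumption that the third argument of apoc.merge.node is applied only when the node is created, which is how the APOC documentation describes it.

```cypher
UNWIND $batch AS row
MERGE (n:Equipment {tagPath: row.tagPath})
// Properties are written only for newly created nodes;
// existing nodes are matched and left untouched.
ON CREATE SET n += row.props
```

With an empty database the two forms should do the same amount of writing, so any difference would only show up when re-running the load against existing nodes.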

There is a list of the latest released versions of the Java driver. Please pick the one that matches your database version and upgrade to it. This will probably resolve the speed issue.

I am using the Java driver v4.2.3 and the database is 4.2.3 as well

This didn't actually change anything speed-wise.

What is interesting is that the query itself isn't any slower: the transaction completes in the same amount of time. It's just slower through the driver to create and close it.