I have a MERGE query (on a relationship) of the form below. About 2000 of these queries are invoked around the same time, and it takes ~5 minutes to complete all of them.
How can I refactor the query or the application logic so that this is drastically faster?
Can you provide the code where this request is used?
You might put the EXPLAIN or PROFILE clause in front of your MERGE to see what is happening there.
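For example (the labels, relationship type, and property names below are hypothetical, since the actual query isn't shown in the thread):

```cypher
// Prepend PROFILE to run the query and report db hits per operator,
// or EXPLAIN to see the plan without executing it.
PROFILE
MATCH (c:Category {id: $categoryId})
MATCH (i:Item {id: $itemId})
MERGE (c)-[r:CONTAINS]->(i)
RETURN r;
```

If the plan shows a NodeByLabelScan or AllNodesScan instead of a NodeUniqueIndexSeek, the node lookup is not using an index.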
I wonder if we have a cartesian product here between each MERGE, I have to double check that.
I don't think MERGE does a Cartesian product; it must be in your code.
Not sure what you mean by Cartesian product in my code.
I don't think it matters what the data actually is: we have ~2000 relationships that need to be inserted at once by processing an input file, and I feel ~5 minutes is too long.
Yes, and the query structure is very simple, which is confusing me.
The actual query we are running in our code has a lot more properties in the nodes and relationships. But the basic essence is the same.
Any other clues on this please?
The way the sessions and transactions are coded in your language has a huge influence on the speed of your data creation process.
If, as an example, you create a session for each query in a loop, the time will increase drastically.
(Don't do that. I did, it was funny.)
Your MERGE should include only the strictly necessary properties needed to check whether the node or relationship already exists.
The other properties should be set on creation with the ON CREATE subclause.
Matching or merging with a MERGE clause that has too many properties inside {} can slow down the process significantly too, unless you are using a really big composite index. Do not hesitate to use the EXPLAIN or PROFILE clause. It's the Neo4j magic debugger.
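A sketch of that shape, with hypothetical labels and property names:

```cypher
// Match or create on the unique key only...
MERGE (i:Item {id: $itemId})
ON CREATE SET i.name = $name,
              i.createdAt = timestamp()
// ...instead of MERGE (i:Item {id: $itemId, name: $name, ...}),
// which forces a lookup on every property and creates a duplicate
// node whenever any of the extra properties differs.
```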
I am reusing the session. The same query is invoked concurrently in batches of 50 at a time, until all the 2000 requests are completed.
I did try PROFILE on my query, and it shows 786,579 db hits and 455,803 db hits for the category and item nodes. I believe the existing unique constraint should be sufficient.
However, I will make changes to include the ON CREATE subclause you pointed out! Not sure if it will make a big difference in execution time, though.
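One way to cut the ~2000 round trips: instead of invoking the query once per relationship, pass all rows as a single parameter and batch them with UNWIND in one transaction. A sketch, assuming hypothetical labels, relationship type, and property names:

```cypher
// $rows is a list of maps passed as one query parameter from the
// application, e.g. [{categoryId: 1, itemId: 7}, ...].
UNWIND $rows AS row
MATCH (c:Category {id: row.categoryId})
MATCH (i:Item {id: row.itemId})
MERGE (c)-[:CONTAINS]->(i);
```

With unique constraints on the matched properties, each MATCH becomes an index seek instead of a label scan, which is what those six-figure db-hit counts suggest is missing.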
Hi,
Why are you saying "If, as an example, you create a session for each query in a loop the time will increase drastically."? (See this link: Should the session be closed each query? · Issue #384 · neo4j/neo4j-javascript-driver · GitHub, where Neo4j is saying that it is not a bad practice.)
That's exactly what I am doing, lol. I have many microservices pushing data, and every time a microservice finds new data, it opens a new session, makes its transaction through it, and then closes it.
Do you think it's a better idea to create one session per microservice and make all transactions through that same session, even if the session stays alive for many days?
Thanks,
Gautier
That's not what I meant; I didn't want to steer you the wrong way.
Yes indeed, sessions are extremely lightweight and disposable, but I only meant that the way you code your transactions / queries / sessions in the language of your choice often has a huge impact on performance.
It's not necessarily your query, but in this case, yes, it is your query, and some Cartesian products in it, which can be deadly for a huge dataset.
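For reference, this is the kind of accidental Cartesian product meant here (labels and properties are hypothetical):

```cypher
// Deadly on a big dataset: two unconnected MATCH patterns with no
// filters make Cypher build the full cross product of all categories
// with all items before the MERGE runs.
MATCH (c:Category), (i:Item)
MERGE (c)-[:CONTAINS]->(i);

// Fine: each MATCH is pinned to a single node by a unique property,
// so the "product" is just one (c, i) pair.
MATCH (c:Category {id: $categoryId}), (i:Item {id: $itemId})
MERGE (c)-[:CONTAINS]->(i);
```

EXPLAIN will flag the first form with a CartesianProduct operator and a warning.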