I have a MERGE query (on a relationship) of the form below. About 2000 of these queries are invoked around the same time, and it takes ~5 minutes to complete all of them.
How can I refactor the query or the application logic so that this is drastically faster?
Can you provide the code where this request is used?
You might put the EXPLAIN or PROFILE clause in front of your MERGE to see what is happening there.
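For example (the labels, relationship type, and property names below are hypothetical, since the actual query isn't shown in the thread):

```cypher
// Prepend PROFILE to run the query and report db hits per operator,
// or EXPLAIN to see the plan without executing it.
PROFILE
MATCH (c:Category {id: $categoryId})
MATCH (i:Item {id: $itemId})
MERGE (c)-[r:CONTAINS]->(i)
RETURN r;
```

If the plan shows a NodeByLabelScan or AllNodesScan instead of a NodeUniqueIndexSeek, the node lookup is not using an index.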
I wonder if we have a cartesian product here between each MERGE, I have to double check that.
I don't think MERGE does a Cartesian product; it must be in your code.
Not sure what you mean by Cartesian product in my code.
I don't think it matters what the data actually is: we have ~2000 relationships that need to be inserted at once by processing an input file, and I feel ~5 minutes is too long.
Yes, and the query structure is very simple, which is confusing me.
The actual query we are running in our code has a lot more properties in the nodes and relationships. But the basic essence is the same.
Any other clues on this please?
The way the sessions and transactions are coded in your language has a huge influence on the speed of your data creation process.
If, as an example, you create a session for each query in a loop, the time will increase drastically.
(Don't do that. I did, it was funny.)
Your MERGE should include only the strictly necessary properties needed to check whether the node or relationship already exists.
The other properties should be set on creation with the ON CREATE subclause.
Matching or merging with a MERGE clause that has too many properties inside {} can slow down the process significantly too, unless you are using a really big composite index. Do not hesitate to use the EXPLAIN or PROFILE clause. It's the Neo4j magic debugger.
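A sketch of that shape, with hypothetical labels and property names:

```cypher
// Match or create on the unique key only...
MERGE (i:Item {id: $itemId})
ON CREATE SET i.name = $name,
              i.createdAt = timestamp()
// ...instead of MERGE (i:Item {id: $itemId, name: $name, ...}),
// which forces a lookup on every property and creates a duplicate
// node whenever any of the extra properties differs.
```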
I am reusing the session. The same query is invoked concurrently in batches of 50 at a time, until all the 2000 requests are completed.
I did try PROFILE on my query, and it shows 786,579 db hits and 455,803 db hits for the category and item nodes. I believe the existing unique constraint should be sufficient.
However, I will make changes to include the ON CREATE subclause you pointed out! Not sure if it will make a big difference in execution time, though.
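One way to cut the ~2000 round trips: instead of invoking the query once per relationship, pass all rows as a single parameter and batch them with UNWIND in one transaction. A sketch, assuming hypothetical labels, relationship type, and property names:

```cypher
// $rows is a list of maps passed as one query parameter from the
// application, e.g. [{categoryId: 1, itemId: 7}, ...].
UNWIND $rows AS row
MATCH (c:Category {id: row.categoryId})
MATCH (i:Item {id: row.itemId})
MERGE (c)-[:CONTAINS]->(i);
```

With unique constraints on the matched properties, each MATCH becomes an index seek instead of a label scan, which is what those six-figure db-hit counts suggest is missing.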
Hi,
Why are you saying "If, as an example, you create a session for each query in a loop the time will increase drastically."? (See this link: Should the session be closed each query? · Issue #384 · neo4j/neo4j-javascript-driver · GitHub, where Neo4j is saying that it is not a bad practice.)
That's exactly what I am doing, lol. I have many microservices pushing data, and every time a microservice finds new data, it opens a new session, makes its transaction through it, and then closes it.
Do you think it's a better idea to create one session per microservice and make all transactions through that same session, even if the session stays alive for many days?
Thanks,
Gautier
That's not what I meant; I didn't want to steer you the wrong way.
Yes indeed, sessions are extremely lightweight and disposable, but I only meant that the way you code your transactions / queries / sessions in the language of your choice often has a huge impact on performance.
It's not necessarily your query, but in this case, yes, it is your query, and some Cartesian products in it, which can be deadly for a huge dataset.
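For reference, this is the kind of accidental Cartesian product meant here (labels and properties are hypothetical):

```cypher
// Deadly on a big dataset: two unconnected MATCH patterns with no
// filters make Cypher build the full cross product of all categories
// with all items before the MERGE runs.
MATCH (c:Category), (i:Item)
MERGE (c)-[:CONTAINS]->(i);

// Fine: each MATCH is pinned to a single node by a unique property,
// so the "product" is just one (c, i) pair.
MATCH (c:Category {id: $categoryId}), (i:Item {id: $itemId})
MERGE (c)-[:CONTAINS]->(i);
```

EXPLAIN will flag the first form with a CartesianProduct operator and a warning.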