What is the right way to execute and get results from a query that takes hours to process? Or is just Neo4j not built to do such things? Or is there some hidden timeout limit that silently kills such queries? Or do I need the enterprise edition?
Obviously running the query in browser and waiting for the result is out of the question.
I had some success with wrapping the query into apoc.export.cypher.query() and executing it locally through cypher-shell running in background using '&' on Linux as suggested here Cypher-Shell - how to run testquery.cypher in background. Works for queries taking e.g. 20 minutes but more demanding queries seems to just die silently without leaving any trace in the logs.
There are many things that may cause a query to take a long time to run. Usually, some query optimisation and/or model refactoring can make huge impacts on this.
Are you able to share your queries/talk about what optimisations you have done to them?
Also, how much memory have you configured for the queries to run?
see the very good video:
Not knowing specifics of your problem, here are some random ideas (stabs in the dark and I don't mean to insult your intelligence if you already know about these...):
toFloat()and store them in the DB before doing the query)
MATCH(a:A),(b:B) WHERE ...use
MATCH(a:A) MATCH(b:B) WHERE...
MATCH(a)-[r]-> ...is more expensive than
length(a.property)=0is more expensive than
a.property IS NULLbut I don't know how much more expensive it is.
I hope this helps. If you could show us your query (along with some statistics:
call apoc.meta.stats and
:schema) perhaps we could better be able to help you.
Thank you both for valuable tips. Now lets say, hypothetically, that the query is as optimized as it can be and I am perfectly ok with it running for 2 hours before I get the result because it is some complex computation or extraction of a large subgraph that simply takes a lot of time.
Now is there some obstacle, like some default dbms.transaction.timeout that will stop me from getting the result from Neo4j?
I found another interesting video. What is very interesting is at minute 15:30.
If your query has TWO things that you are filtering and that has an index, then it's better to force Neo4J to use both indexes.
I'm surprised by this!. (I tried this with version 4.2)
Their example using the Movie DB. Instead of this:
PROFILE MATCH p = (p1:Person)-[:ACTED_IN*6]-(p5:Person) WHERE p1.name='Tom Cruise' and p5.name='Kevin Bacon' RETURN [n in nodes(p) | coalesce(n.title, n.name)]
PROFILE MATCH p = (p1:Person)-[:ACTED_IN*6]-(p5:Person) USING INDEX p1:Person(name) USING INDEX p5:Person(name) WHERE p1.name='Tom Cruise' and p5.name='Kevin Bacon' RETURN [n in nodes(p) | coalesce(n.title, n.name)]
If you don't need some fields in return don't return them it might greatly increase the time needed to return the values. Instead filter properties like this:
WITH keys(e) AS k1, e UNWIND k1 AS k2 WITH e, k2 where k2 <> "relationship_string" and k2 <> "relationshipString" and RETURN id(e) AS entityId, k2 AS Prop,e[k2] AS Value