Slow cypher

hahmed24 · April 20, 2021, 6:31am

Hi there!
i am new to neo4j and having hard time counting the path between two nodes.
i want to count the path between A and Z where the frequency of each relationship is greater than two.
i tried this code but it is not returning anything; it keeps running and not yield and result, it runs indefinitely.

match (n:Concept),(m:Concept)
where toUpper (n.name) contains "TUBERCULOSIS"
AND toUpper (m.name) contains "CHEST PAIN"
with n,m
match path=(n)-[:CAUSES*]->(m)
where all (a in relationships(path) where a.freq>2)
RETURN count(distinct (path)) as distinct_paths

I will be grateful for your help.
side note: the graph is large with 350 million nodes and 450 million relationships.
Also, anything higher than [:CAUSES*1..2] runs indefinitely and no result is shown.

marius.conjeaud · April 20, 2021, 9:42am

A few questions for you to better analyze your issue :

Do you have indexes on Concept.name ? Important : using a toUpper() will negate the use of an index, as indexing is case sensitive. Consider using a full-text index, or storing your string in the format that you'll want to query it.
Do you have the possibility to use STARTS WITH instead of CONTAINS for your string filter ?
Can you execute the same query with a "PROFILE" added before the query (or EXPLAIN if your query is really too long or never returns) and provide a screenshot of the returned execution graph ?

Other topic : your query builds a cartesian product on the first row, meaning that if you get n Tuberculosis nodes, and m ChestPain nodes, you are passing along n*m nodes to the rest of the execution.
Consider writing it that way :

MATCH path=(n:Concept)-[:CAUSES*]->(m:Concept)
WHERE n.name CONTAINS "xxx" AND m.name CONTAINS "yyy" AND all(a in relationships(path) where a.freq>2)

hahmed24 · April 20, 2021, 4:33pm

thank you for you response.
no there are no indexes,
if i use
(n)-[:causes]->(n1)-[:causes]->(n3)
i get a return count,
but if i do
**(n)-[:causes]->(n1)-[:causes]->(n3)-[:causes]->(n4), it runs indefinitely
as far as starts with, it still has the same issue.
writing it the way you suggested return value of zero even for the positive examples that i know for sure are not zero. Ithink that the names are a mix of upper and lower cases and for that reason i used toUpper, to eliminate that issue.
keep in mind that the node label is Concept and property key is name.

this is how it runs forever.
Screenshot from 2021-04-20 11-32-16

marius.conjeaud · April 21, 2021, 8:35am

You have several potential issues here.

The first one was, filtering on a non-indexed property (name for Concept label). I understand that this property is not normalized (it could be Tuberculosis, as well as TUBERCULOSIS depending on nodes). For this, I would suggest storing it in a normalized way, for instance toLower. This way, you can index it and then query it with the same norm (all lowercase).
The cartesian product, which is fixed by moving the filtering on n & m inside the path declaration instead of doing a MATCH separated by a comma

These two issues do not relate to the issue you have with your 2+-length paths, but you should still consider fixing them as it will become an issue sooner or later.

Now, for your issue with paths. Could you execute the following query and provide back the execution plan (see example picture below the query) ? This will help understand what is happening.
Please note that I left the toUpper(m.name) even though you should change that to benefit from indexing. I'll just leave that for now so you don't have to make changes to your property and indexes right now.

EXPLAIN
MATCH path=(n:Concept)-[:CAUSES*]->(m:Concept)
WHERE toUpper(n.name) CONTAINS "TUBERCULOSIS" AND toUpper(m.name) CONTAINS "CHEST PAIN" AND all(a in relationships(path) where a.freq>2)
RETURN count(distinct (path)) as distinct_paths

hahmed24 · April 21, 2021, 12:05pm

This is the plan

michael.hunger · April 22, 2021, 9:44am

You should enforce the indexes, i.e. store your indexed names as uppercase so you don't need to do the function and it uses the index, and can start from both sides for the variable path length.
it's also better to use concrete matching entries which is faster than contains (or use medical identifiers instead of names)

Adding a max limit to the path will also help

Unfortunately accessing and checking rel-properties on a lot of relationships takes time.
One thing you can do in your model is to create those relationships then your filter will be faster:

MATCH (a:Concept)-[:CAUSES]->(b:Concept) WHERE r.freq > 2
CREATE (a)-[:FREQUENT]->(b)

and then use that FREQUENT relationship in your path query.

match path=(n:Concept)-[:FREQUENT*]->(m:Concept)
where n.name contains "TUBERCULOSIS" AND m.name contains "CHEST PAIN"
RETURN count(*) as distinct_paths

hahmed24 · May 20, 2021, 11:05pm

Thank you very much for your help, I am still having difficulties with it.
it is still not enough to traverse the paths
hardware wise, I am using 64GB RAM and ryzon 16-core CPU, I though it would help to move away from intel but I am still having trouble with it.

michael.hunger · June 23, 2021, 11:19pm

With neo4j 4.3 you can create a relationship property index which should speed up your query quite a bit.

https://neo4j.com/docs/cypher-manual/current/administration/indexes-for-search-performance/#administration-indexes-create-a-composite-index-for-relationships

Topic		Replies	Views
Why is this query too slow? Cypher	5	560	October 2, 2023
Neo4j crashing while running cypher path query Cypher paths	4	1186	April 7, 2019
Count two relationships in one MATCH separately Neo4j Graph Platform counts	4	1335	December 17, 2019
Slow Cypher Query Help Needed! Cypher performance	6	942	January 16, 2020
How to efficiently count all paths between two nodes in Cypher? Cypher apoc	4	4103	September 23, 2019

Get Certified in June!

Slow cypher

match (n:Concept),(m:Concept) where toUpper (n.name) contains "TUBERCULOSIS" AND toUpper (m.name) contains "CHEST PAIN" with n,m match path=(n)-[:CAUSES*]->(m) where all (a in relationships(path) where a.freq>2) RETURN count(distinct (path)) as distinct_paths

Thank you very much for your help, I am still having difficulties with it. it is still not enough to traverse the paths hardware wise, I am using 64GB RAM and ryzon 16-core CPU, I though it would help to move away from intel but I am still having trouble with it.

Related topics

match (n:Concept),(m:Concept)
where toUpper (n.name) contains "TUBERCULOSIS"
AND toUpper (m.name) contains "CHEST PAIN"
with n,m
match path=(n)-[:CAUSES*]->(m)
where all (a in relationships(path) where a.freq>2)
RETURN count(distinct (path)) as distinct_paths

Thank you very much for your help, I am still having difficulties with it.
it is still not enough to traverse the paths
hardware wise, I am using 64GB RAM and ryzon 16-core CPU, I though it would help to move away from intel but I am still having trouble with it.