Get nodes connected to two different sub-graphs, efficiency question

The idea is to check whether a node is related to any of the many nodes under two different hierarchies. Imagine you are interested in finding diseases that ail both cats and dogs, but not other mammals like primates or rodents. It's a silly example, I know :)

I have a query that is conceptually something like this:

MATCH p1=(x: Animal)-[:SUB_CLASS_OF*0..3]->(dog: Animal {species: "Canis familiaris"})
WITH nodes(p1) as sub_dog
MATCH p2=(y: Animal)-[:SUB_CLASS_OF*0..3]->(cat: Animal {species: "Felis catus"})
WITH sub_dog, nodes(p2) as sub_cat
MATCH (a1:Animal)-[r1]-(d:Disease)-[r2]-(a2:Animal)
WHERE a1 IN sub_cat AND a2 IN sub_dog
RETURN a1,r1,d,r2,a2

The query above takes forever, since there are many different species, many diseases, and MANY different ways (i.e. relationship types) a disease might be associated with an animal. (Animals also have many different types of relationships with other entities in this large graph, which has about 3M nodes and 50M edges in total.)

I am primarily interacting with the graph via Neo4j Browser. On Safari, running this query crashes the page, which promptly reloads, so it never runs to completion. On Chrome it runs for 10-15 minutes with the CPU close to full capacity.

I take that as a clear sign that the query isn't very well written. Any suggestions on how to tackle this?


MATCH (:Animal {species: "Canis familiaris"})<-[:SUB_CLASS_OF*0..3]-(dog:Animal)-[r1]-(d:Disease)-[r2]-(cat:Animal)-[:SUB_CLASS_OF*0..3]->(:Animal {species: "Felis catus"})
RETURN dog, r1, d, r2, cat

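If you give r1 and r2 a specific type and direction, something like -[:HAS_DISEASE]-> (optionally filtered on a property such as transmission), that could make it more efficient, because the planner then only has to expand that one relationship type instead of every relationship attached to each animal.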
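Another thing that might be worth trying: in the original query, `WITH nodes(p1) AS sub_dog` keeps one row per matched path, so the cat MATCH and the disease MATCH get repeated for every dog-path and cat-path combination. A rough sketch that collects each hierarchy into a single list first could look like the following. It reuses the labels and properties from the question, and like the rest of this answer it is untested against your graph, so treat it as a starting point only.

```cypher
// Collect each sub-hierarchy once, so the later clauses run on a single row
// instead of once per matched path.
MATCH (x:Animal)-[:SUB_CLASS_OF*0..3]->(:Animal {species: "Canis familiaris"})
WITH collect(DISTINCT x) AS sub_dog
MATCH (y:Animal)-[:SUB_CLASS_OF*0..3]->(:Animal {species: "Felis catus"})
WITH sub_dog, collect(DISTINCT y) AS sub_cat

// Start from the (small) dog set, expand to diseases, then keep only
// those diseases that also touch an animal in the cat set.
UNWIND sub_dog AS a2
MATCH (a2)-[r2]-(d:Disease)-[r1]-(a1:Animal)
WHERE a1 IN sub_cat
RETURN a1, r1, d, r2, a2
```

Starting the disease match from the dog nodes, rather than from every (:Animal)-(:Disease)-(:Animal) triple, should give the planner far fewer paths to look at, though how much it helps depends on your data.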

Hope that helps. Full disclosure, I didn't test or profile this, just off the cuff while taking a break.
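Since none of this is profiled, one way to sanity-check a candidate before letting it loose in the browser is to prefix it with EXPLAIN, which shows the execution plan without running the query (PROFILE runs it and reports actual rows and db hits). A small sketch, again assuming the labels and properties from the question; the LIMIT value is arbitrary and only there so a large result doesn't overwhelm the browser.

```cypher
// EXPLAIN returns the plan only; swap in PROFILE to execute and see db hits.
EXPLAIN
MATCH (x:Animal)-[:SUB_CLASS_OF*0..3]->(:Animal {species: "Canis familiaris"})
WITH collect(DISTINCT x) AS sub_dog
UNWIND sub_dog AS a2
MATCH (a2)-[r2]-(d:Disease)
RETURN a2, r2, d
LIMIT 25
```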