I have a Cypher query:
```cypher
MATCH (x:lblA:lblB:lblC:lblD)-[:relA]->(:lblE)
WITH x as x
MATCH (x)-[:relB]->(y:lblY),
      (y)-[:relC]->(z:lblZ)
RETURN count(DISTINCT z)
```
Here `lblA` matches approximately 250k nodes, and `lblB`, `lblC` and `lblD` are essentially subsets of those 250k.
The query returns a count of 33, but it takes about 20 minutes to run.
If I remove any one of the labels `lblA`, `lblB`, `lblC` or `lblD`, the query returns in 200ms.
According to profiles of the two queries, the difference between having 3 labels and 4 labels on `x` is this: with 3 labels, Neo seems to apply the label filters first and then expand to `y` and `z`, but with 4 labels, Neo seems to build a Cartesian product of all my data first and only then apply the labels, resulting in about 2 billion DB hits.
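Since the slow variant takes around 20 minutes, prefixing it with `EXPLAIN` instead of `PROFILE` shows the planned operators without executing the query; the Cartesian join should appear in the plan as a `CartesianProduct` operator. A minimal sketch using the same labels and relationship types as above:

```cypher
// EXPLAIN returns the execution plan without running the query,
// so the 4-label plan can be inspected without waiting ~20 minutes.
EXPLAIN
MATCH (x:lblA:lblB:lblC:lblD)-[:relA]->(:lblE)
WITH x as x
MATCH (x)-[:relB]->(y:lblY),
      (y)-[:relC]->(z:lblZ)
RETURN count(DISTINCT z)
```

Note that `EXPLAIN` reports estimated rows rather than actual DB hits, so `PROFILE` is still needed to see figures like the 2 billion DB hits.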
Any ideas why Neo is doing this? Is there a way to stop it doing the Cartesian join when I have 4 labels on `x`?
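One workaround that might be worth trying (a sketch, not a confirmed fix) is to anchor `x` on a single label, check the remaining labels as `WHERE` predicates, and add a `USING SCAN` planner hint so the planner starts from that label scan. Whether this avoids the `CartesianProduct` depends on the Neo4j version, since the planner may normalize these predicates back to the same plan; scanning whichever of the four labels is smallest, rather than `lblA`, may also be cheaper.

```cypher
// Workaround sketch: start from a label scan on lblA (forced by the hint),
// then filter the other three labels as predicates on x.
MATCH (x:lblA)-[:relA]->(:lblE)
USING SCAN x:lblA
WHERE x:lblB AND x:lblC AND x:lblD
// DISTINCT drops duplicate x rows from multiple relA edges;
// it does not change count(DISTINCT z).
WITH DISTINCT x
MATCH (x)-[:relB]->(y:lblY),
      (y)-[:relC]->(z:lblZ)
RETURN count(DISTINCT z)
```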
Thanks,
Stephen