I feel like I'm encountering some unintuitive behavior that I'm trying to explain.
My database has about 50k nodes, and 200k relationships. I have two queries with wildly different performance characteristics.
Query 1:
MATCH p=(x:Element {name: "Target"})<-[:Has|Belongs*]-(y) RETURN y
This completes with 5,200 total db hits in 65 ms.
Query 2:
MATCH p=(x:Element {name: "Target"})<-[:Has|Belongs*]-(y:Node) RETURN y
This completes with 8,813,850 total db hits in 32,396 ms.
I would have expected the second query to be faster, since the set of source nodes is restricted to a specific label. Am I missing something?
Query 1:
NodeIndexSeek x:Element(name) WHERE name = $
VarLengthExpand (x)<-[anon_0:Has|Belongs*]-(y)
ProduceResults y
Query 2:
NodeIndexSeek x:Element(name) WHERE name = $ && NodeByLabelScan y:Node
CartesianProduct x, y
VarLengthExpand (x)<-[anon_0:Has|Belongs*]-(y)
ProduceResults y
Apologies for the notation. The difference is that Query 2 runs NodeByLabelScan y:Node alongside NodeIndexSeek x:Element(name) and then feeds both into CartesianProduct.
Profiling both queries shows that VarLengthExpand is the operator whose db hits differ wildly: about 4.5k in Query 1 versus about 9 million in Query 2.
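For anyone wanting to reproduce these numbers: the per-operator db hits come from prefixing a query with PROFILE, which runs the query and reports rows and db hits for each operator in the plan (EXPLAIN shows the plan without running the query). Shown here for Query 2:

```cypher
// Run the query and report rows and db hits for every operator in the plan.
PROFILE
MATCH p=(x:Element {name: "Target"})<-[:Has|Belongs*]-(y:Node)
RETURN y
```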
Clearly the problem is that the query planner is using a NodeByLabelScan plus a CartesianProduct instead of expanding from x and filtering on y afterwards. Which version of Neo4j are you using?
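In case it helps, one way to check the version from the browser or cypher-shell is the built-in dbms.components() procedure. This is a minimal sketch; the exact output may vary between releases:

```cypher
// List the installed Neo4j components with their versions and edition.
CALL dbms.components() YIELD name, versions, edition
RETURN name, versions, edition
```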
Can you try:
MATCH (x:Element {name: "Target"})
WITH x
MATCH p=(x)<-[:Has|Belongs*]-(y:Node)
RETURN y
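Another variant that may be worth profiling, as a sketch rather than a guaranteed fix: expand from x without the label on y, and apply the label check after a WITH. This assumes the planner keeps the filter after the expansion; it is free to reorder predicates, so check the resulting plan with PROFILE.

```cypher
// Expand from the indexed start node first, with no label on y.
MATCH (x:Element {name: "Target"})<-[:Has|Belongs*]-(y)
WITH y
// Keep only the nodes carrying the Node label.
WHERE y:Node
RETURN y
```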
Bennu
PS: Next time, a screenshot of the planner could be easier for both of us.