I'm trying out some Cypher queries on Neo4j Browser (Docker image of the latest community version), and I noticed some behaviour for one query that I can't understand and couldn't find any helpful resources online. The query (code below) tries to return some properties (id and title) of a certain node m (label: Node) that has a relationship with another node n (also label Node):
PROFILE
MATCH (n:Node)-[:PARENT_OF*]->(m)
WHERE n.id = "200"
RETURN m.id, m.title
When I run the query above, it does not result in a Cartesian product. However, when I specify the label of node m (i.e., m: Node), the query executor performs a Cartesian product, which slows down the query. Could anyone tell me why does this happen? What does adding a node label change to make the query executor perform a Cartesian product?
Also another observation: When I remove the label from node n, the query executor does a full node scan (i.e., it does not use the node index to get it), could anyone explain why does it behave this way?
I am not seeing the same with 5.12.0. Can you provide the output when running cypher and prefaced with PROFILE.
Also
Also another observation: When I remove the label from node n,
the query executor does a full node scan (i.e., it does not use the node
index to get it), could anyone explain why does it behave this way?
so the cypher is thus
MATCH (n)-[:PARENT_OF*]->(m)
WHERE n.id = "200"
RETURN m.id, m.title
if thats the case I would expect a all node scan for the query above effectively says... find me a node that has a :PARENT_OF relationship to some other node where .... .....
and here we are not qualifying that the node must have label X etc. Your original query should start with a NodeByLabelScan or if you have an index on :Node(id) then it would start with a NodeIndexSeek
I don't see a Cartesian product in the query plan. There is not need for on with this query. You would need multiple MATCH statements written in a specify way to cause a Cartesian product. Can you provide your query plan if it shows a Cartesian product?
The index goes away because the index is defined on the label Node and property, so when you remove the label from the pattern it will not leverage that specific index.
I couldn't attach the screenshot of the query that doesn't have node r labeled because new users are only limited to one embedded media. Here is a description of it:
Query:
PROFILE
MATCH (n:Concept)-[:DESCENDANT_OF*]->(r)
WHERE n.ocid = "261900059304"
RETURN r.ocid, r.label;
Query plan:
No "NodeByLabelScan" step for node r.
No cache properties for node r.
No Cartesian product step.
The lower part of the execution plan (starting from step VarLengthExpand(All)@ neo4j) is exactly the same as the other query.
Though it is a cartesian of 1 x N where N = 29,166,152 which is this 29,166,152 which though a cartesian isnt 'bad'. Now had it been 50 x N where N=29, 166, 152 that would be of concern
It does take much more time than the case where the label is omitted from node r though.
Labeling node r results in a “NodeByLabelScan” step that returns ~29M node, afterwards a Cartesian product step is triggered. So it seems that the real bottleneck here is the “NodeByLabelScan” step that is triggered because of adding the label.
It is somehow strange because I suppose that the query planner should have evaluated different execution plans and choose the most optimal. Any insight on this?
Edit: corrected “AllNodeLabelScan” to “NodeByLabelScan”