Why does specifying the label of a node result in a Cartesian product?

Hello Neo4j Community,

I'm trying out some Cypher queries on Neo4j Browser (Docker image of the latest community version), and I noticed some behaviour for one query that I can't understand and couldn't find any helpful resources online. The query (code below) tries to return some properties (id and title) of a certain node m (label: Node) that has a relationship with another node n (also label Node):

PROFILE
MATCH (n:Node)-[:PARENT_OF*]->(m)
WHERE n.id = "200"
RETURN m.id, m.title

When I run the query above, it does not result in a Cartesian product. However, when I specify the label of node m (i.e., m: Node), the query executor performs a Cartesian product, which slows down the query. Could anyone tell me why does this happen? What does adding a node label change to make the query executor perform a Cartesian product?

Also another observation: When I remove the label from node n, the query executor does a full node scan (i.e., it does not use the node index to get it), could anyone explain why does it behave this way?

Thanks in advance! :))

@Hatem

I am not seeing the same with 5.12.0. Can you provide the output when running cypher and prefaced with PROFILE.

Also

Also another observation: When I remove the label from node n, 
the query executor does a full node scan (i.e., it does not use the node 
index to get it), could anyone explain why does it behave this way?

so the cypher is thus

MATCH (n)-[:PARENT_OF*]->(m)
WHERE n.id = "200"
RETURN m.id, m.title

if thats the case I would expect a all node scan for the query above effectively says... find me a node that has a :PARENT_OF relationship to some other node where .... .....
and here we are not qualifying that the node must have label X etc. Your original query should start with a NodeByLabelScan or if you have an index on :Node(id) then it would start with a NodeIndexSeek

1 Like

I don't see a Cartesian product in the query plan. There is not need for on with this query. You would need multiple MATCH statements written in a specify way to cause a Cartesian product. Can you provide your query plan if it shows a Cartesian product?

The index goes away because the index is defined on the label Node and property, so when you remove the label from the pattern it will not leverage that specific index.

1 Like

Thank you @dana_canzano and @glilienfield for the answer.

Here is the execution plan of the following query:
1- Labeling node r results in a Cartesian product in the execution plan.

PROFILE
MATCH (n:Concept)-[:DESCENDANT_OF*]->(r:Concept)
WHERE n.ocid = "261900059304"
RETURN r.ocid, r.label;

I couldn't attach the screenshot of the query that doesn't have node r labeled because new users are only limited to one embedded media. Here is a description of it:
Query:

PROFILE
MATCH (n:Concept)-[:DESCENDANT_OF*]->(r)
WHERE n.ocid = "261900059304"
RETURN r.ocid, r.label;

Query plan:

  • No "NodeByLabelScan" step for node r.
  • No cache properties for node r.
  • No Cartesian product step.
    The lower part of the execution plan (starting from step VarLengthExpand(All)@ neo4j) is exactly the same as the other query.

@Hatem

Though it is a cartesian of 1 x N where N = 29,166,152 which is this 29,166,152 which though a cartesian isnt 'bad'. Now had it been 50 x N where N=29, 166, 152 that would be of concern

@dana_canzano

It does take much more time than the case where the label is omitted from node r though.

Labeling node r results in a “NodeByLabelScan” step that returns ~29M node, afterwards a Cartesian product step is triggered. So it seems that the real bottleneck here is the “NodeByLabelScan” step that is triggered because of adding the label.

It is somehow strange because I suppose that the query planner should have evaluated different execution plans and choose the most optimal. Any insight on this?

Edit: corrected “AllNodeLabelScan” to “NodeByLabelScan”

@Hatem
im confused as your previously provided image/plan includes a NodeByLabelScan r:Concept
but yet you state

Labeling node r results in a “AllNodeLabelScan” step that returns

@dana_canzano

Sorry, you’re right, it is a “NodeLabelScan” as it appears in the screenshot/plan.. I’ll edit the previous comment..