Why does specifying the label of a node result in a Cartesian product?

Hatem · October 18, 2023, 5:19pm

Hello Neo4j Community,

I'm trying out some Cypher queries on Neo4j Browser (Docker image of the latest community version), and I noticed some behaviour for one query that I can't understand and couldn't find any helpful resources online. The query (code below) tries to return some properties (id and title) of a certain node m (label: Node) that has a relationship with another node n (also label Node):

PROFILE
MATCH (n:Node)-[:PARENT_OF*]->(m)
WHERE n.id = "200"
RETURN m.id, m.title

When I run the query above, it does not result in a Cartesian product. However, when I specify the label of node m (i.e., m: Node), the query executor performs a Cartesian product, which slows down the query. Could anyone tell me why does this happen? What does adding a node label change to make the query executor perform a Cartesian product?

Also another observation: When I remove the label from node n, the query executor does a full node scan (i.e., it does not use the node index to get it), could anyone explain why does it behave this way?

Thanks in advance! :))

dana_canzano · October 18, 2023, 7:15pm

@Hatem

I am not seeing the same with 5.12.0. Can you provide the output when running cypher and prefaced with PROFILE.

Also

Also another observation: When I remove the label from node n, 
the query executor does a full node scan (i.e., it does not use the node 
index to get it), could anyone explain why does it behave this way?

so the cypher is thus

MATCH (n)-[:PARENT_OF*]->(m)
WHERE n.id = "200"
RETURN m.id, m.title

if thats the case I would expect a all node scan for the query above effectively says... find me a node that has a :PARENT_OF relationship to some other node where .... .....
and here we are not qualifying that the node must have label X etc. Your original query should start with a NodeByLabelScan or if you have an index on :Node(id) then it would start with a NodeIndexSeek

glilienfield · October 19, 2023, 12:46am

I don't see a Cartesian product in the query plan. There is not need for on with this query. You would need multiple MATCH statements written in a specify way to cause a Cartesian product. Can you provide your query plan if it shows a Cartesian product?

The index goes away because the index is defined on the label Node and property, so when you remove the label from the pattern it will not leverage that specific index.

Hatem · October 19, 2023, 11:08am

Thank you @dana_canzano and @glilienfield for the answer.

Here is the execution plan of the following query:
1- Labeling node r results in a Cartesian product in the execution plan.

PROFILE
MATCH (n:Concept)-[:DESCENDANT_OF*]->(r:Concept)
WHERE n.ocid = "261900059304"
RETURN r.ocid, r.label;

I couldn't attach the screenshot of the query that doesn't have node r labeled because new users are only limited to one embedded media. Here is a description of it:
Query:

PROFILE
MATCH (n:Concept)-[:DESCENDANT_OF*]->(r)
WHERE n.ocid = "261900059304"
RETURN r.ocid, r.label;

Query plan:

No "NodeByLabelScan" step for node r.
No cache properties for node r.
No Cartesian product step.
The lower part of the execution plan (starting from step VarLengthExpand(All)@ neo4j) is exactly the same as the other query.

dana_canzano · October 19, 2023, 5:57pm

@Hatem

Though it is a cartesian of 1 x N where N = 29,166,152 which is this 29,166,152 which though a cartesian isnt 'bad'. Now had it been 50 x N where N=29, 166, 152 that would be of concern

Hatem · October 19, 2023, 7:54pm

@dana_canzano

It does take much more time than the case where the label is omitted from node r though.

Labeling node r results in a “NodeByLabelScan” step that returns ~29M node, afterwards a Cartesian product step is triggered. So it seems that the real bottleneck here is the “NodeByLabelScan” step that is triggered because of adding the label.

It is somehow strange because I suppose that the query planner should have evaluated different execution plans and choose the most optimal. Any insight on this?

Edit: corrected “AllNodeLabelScan” to “NodeByLabelScan”

dana_canzano · October 19, 2023, 8:20pm

@Hatem
im confused as your previously provided image/plan includes a NodeByLabelScan r:Concept
but yet you state

Labeling node r results in a “AllNodeLabelScan” step that returns

Hatem · October 19, 2023, 8:41pm

@dana_canzano

Sorry, you’re right, it is a “NodeLabelScan” as it appears in the screenshot/plan.. I’ll edit the previous comment..

Topic		Replies	Views
Yet another Cartesian Product problem Cypher	5	583	May 18, 2020
Why does a single label massively alter my query plan? Cypher	15	905	March 2, 2020
Query Performance for Label Matching Cypher	3	281	November 25, 2021
Cross Product Cypher queries will not perform well Cypher performance , cypher , knowledge-base	0	765	August 23, 2018
Cartesian product issue Desktop cypher	3	480	May 2, 2023

Why does specifying the label of a node result in a Cartesian product?

Related topics