Neo4j Browser exercises suggest queries that lead to a "cartesian product" warning

Hello! Thanks for a fantastic product.

I've been seeing a lot of suggested queries in the learning material and the in-browser exercises that lead to the warning "This query builds a cartesian product between disconnected patterns". This causes some confusion IMO and should be fixed to promote best practices.

For example in Exercise 11.13:
Use MERGE to create the DIRECTED relationship between Robert Zemeckis and the Movie, Forrest Gump.

MATCH (p:Person), (m:Movie)
WHERE p.name = 'Robert Zemeckis' AND m.title = 'Forrest Gump'
MERGE (p)-[:DIRECTED]->(m)

I'm new to Neo4j and Cypher, so I'm not certain whether the warning is a false positive in this case or an actual problem. Either way, the warning should be fixed or the query should be changed so that it's not confusing to beginners.

Yes, for the most part it is a false positive.

MATCH (p:Person), (m:Movie)

asks for every :Person node paired with every :Movie node, which is a cartesian product. However, the WHERE clause

WHERE p.name = 'Robert Zemeckis' AND m.title = 'Forrest Gump'

narrows this down to 1 Person and 1 Movie. That makes it a 1x1 cartesian product, which is simply 1 result and should not present an issue.

Now, had your dataset been such that you had 10 :Person nodes with a name of Robert Zemeckis and 4 :Movie nodes with a title of Forrest Gump, then this would be a 10x4 cartesian product and thus 40 results.
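To check how large the product actually is on a given dataset, a minimal sketch is to count the rows instead of creating anything. The alias rowCount is just an illustrative name, and the result depends entirely on your data: it should equal the number of matching :Person nodes multiplied by the number of matching :Movie nodes.

```cypher
// Read-only check: how many (p, m) pairs do the two disconnected patterns produce?
// The count is (number of matching Person nodes) x (number of matching Movie nodes).
MATCH (p:Person), (m:Movie)
WHERE p.name = 'Robert Zemeckis' AND m.title = 'Forrest Gump'
RETURN count(*) AS rowCount
```

If this returns more than 1, the MERGE from the exercise would run once per pair, creating a DIRECTED relationship for every pair that does not already have one.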

Hey Dana. Thank you for that explanation.

So is there any downside (except perhaps verbosity) to always doing this to prevent the latter from happening?

MATCH (p:Person)
WHERE p.name = 'Robert Zemeckis'
MATCH (m:Movie)
WHERE m.title = 'Forrest Gump'
RETURN p, m

I'm assuming this would only return 14 results?

They are functionally equivalent, and if you preface each statement with PROFILE you will see they generate the same plan.
Though I'm confused by "I'm assuming this would only return 14 results?" Shouldn't this return a 1x1 result and thus 1 result?

Are you using the default :play movies database?

Sorry. Ignore what I said about return. My point was given the scenario you described:

Now, had your dataset been such that you had 10 :Person nodes with a name of Robert Zemeckis and 4 :Movie nodes with a title of Forrest Gump, then this would be a 10x4 cartesian product and thus 40 results.

Is it true that this:

MATCH (p:Person)
WHERE p.name = 'Robert Zemeckis'
MATCH (m:Movie)
WHERE m.title = 'Forrest Gump'

Would be more efficient than this:

MATCH (p:Person), (m:Movie)
WHERE p.name = 'Robert Zemeckis' AND m.title = 'Forrest Gump'

Or is there no difference?

No difference. If you take either Cypher statement and preface it with the word PROFILE, you will see no difference.
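For anyone who wants to verify this themselves, here is a sketch of the comparison. Run the two statements separately in Neo4j Browser and compare the operator trees and db hits. A RETURN clause is added here only so that each statement is complete on its own, and the exact plan you see may vary with your Neo4j version and with any indexes you have defined.

```cypher
// Variant 1: both patterns in a single MATCH (the form that triggers the warning)
PROFILE
MATCH (p:Person), (m:Movie)
WHERE p.name = 'Robert Zemeckis' AND m.title = 'Forrest Gump'
RETURN p, m;

// Variant 2: two separate MATCH clauses, each with its own WHERE
PROFILE
MATCH (p:Person)
WHERE p.name = 'Robert Zemeckis'
MATCH (m:Movie)
WHERE m.title = 'Forrest Gump'
RETURN p, m;
```

Since the two patterns share no variables in either variant, both describe the same set of (p, m) pairs, which is consistent with the planner treating them the same way.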
