Maybe I misunderstood OPTIONAL MATCH. If not there should be something wrong

paolodipietro58
Graph Voyager

neo4j community server 4.3.3 on Ubuntu 20.04

Dear all, I am seeing strange behavior with OPTIONAL MATCH.
Maybe I have misunderstood it, but it would be nice if someone could explain it to me.

I have this simple query:

MATCH (x:X  {uuid: 'ccc9c72a-3a1d-4683-8cf9-c98fba83750c'})
RETURN x

which returns exactly one object:

╒══════════════════════════════════════════════════════════╕
│"x"                                                       │
╞══════════════════════════════════════════════════════════╡
│{"uuid":"ccc9c72a-3a1d-4683-8cf9-c98fba83750c"}           │
└──────────────────────────────────────────────────────────┘

Now I change the query to an OPTIONAL MATCH, leaving the node pattern unchanged and adding a relationship to it:

OPTIONAL MATCH (x:X  {uuid: 'ccc9c72a-3a1d-4683-8cf9-c98fba83750c'})-[:REL]->(y:Y)
RETURN x,y

I expected the query to return exactly the same node as before, together with all the related y nodes.
If there are instances of Y in the DB, the query succeeds.

But if there is no 'Y' instance in the DB, the query returns only nulls, as in the following sample:

╒════╤═══════╕
│"x" │"y"    │
╞════╪═══════╡
│null│null   │
└────┴───────┘

I expected something like

╒════════════════════════════════════════════════╤═══════╕
│"x"                                             │"y"    │
╞════════════════════════════════════════════════╪═══════╡
│{"uuid":"ccc9c72a-3a1d-4683-8cf9-c98fba83750c"} │null   │
└────────────────────────────────────────────────┴───────┘

Where is the error?

Thank you

1 ACCEPTED SOLUTION

This is not a flaw in Cypher; you might need to adjust your understanding of what this query means.

You are reading this as:

find me everything you can, and everything else in the pattern is optional

But with that interpretation, you might get VERY different behavior depending on what's in your graph. You've been approaching this assuming that the x node exists, but a connected y node might not. But what if the data in your graph was the other way around?

What if there is no :X node with that specific uuid, but there ARE :Y nodes in your graph, whether or not they're connected to any other :X node? Would it be correct to return a row for every single :Y node in your graph, and for each row emit a null for x, since each :Y node doesn't have the specified connected x node? That doesn't seem useful, and it would blow up your row cardinality. If the optional pattern here were longer than a simple 2-node pattern, then the result with this interpretation might get even more unpredictable and costly.

OPTIONAL MATCH is about optionally matching to the entire pattern provided (and this includes being able to pass any following WHERE clause!), and if no such path is found in the graph, all newly introduced variables in the OPTIONAL MATCH are bound to null.

If you want to find the x node in any case, and then find any connected y node optionally, then you must break this into multiple OPTIONAL MATCHes:

OPTIONAL MATCH (x:X  {uuid: 'ccc9c72a-3a1d-4683-8cf9-c98fba83750c'})
OPTIONAL MATCH (x)-[:REL]->(y:Y)
RETURN x, y

The pattern in an OPTIONAL MATCH ... WHERE ... needs to be treated atomically, not piecemeal.
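
As a minimal sketch of the WHERE point above: a predicate attached to an OPTIONAL MATCH is evaluated as part of that match, so a y that fails it produces a null y instead of removing the row. The `active` property is hypothetical and used only for illustration; the labels and uuid are the ones from the question.

```cypher
// x is required here: no rows at all if no such :X node exists
MATCH (x:X {uuid: 'ccc9c72a-3a1d-4683-8cf9-c98fba83750c'})
// the pattern together with its WHERE is optional as a single unit
OPTIONAL MATCH (x)-[:REL]->(y:Y)
WHERE y.active = true   // hypothetical property, not from the original post
RETURN x, y
```

Moving the predicate behind a WITH (for example `WITH x, y WHERE y.active = true`) would instead filter rows, so the x row would disappear whenever y is null.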

4 REPLIES 4

Hi, @paolodipietro58 !

Your assumption about the behaviour of OPTIONAL MATCH is correct. However, the usual way to use OPTIONAL MATCH is after a plain MATCH in the same query, so that only the second part is optional.

This should work:

MATCH (x:X  {uuid: "ccc9c72a-3a1d-4683-8cf9-c98fba83750c"})
OPTIONAL MATCH (x)-[:REL]->(y:Y)
RETURN x, y
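
A small sketch of how this differs from starting with OPTIONAL MATCH, for the case where the x node itself is missing. The uuid 'no-such-uuid' is a placeholder assumed not to exist in the graph.

```cypher
// Variant A: x is required. If no :X node has this uuid, no rows are returned.
MATCH (x:X {uuid: 'no-such-uuid'})
OPTIONAL MATCH (x)-[:REL]->(y:Y)
RETURN x, y;

// Variant B: x is optional as well. One row is returned, with x = null and y = null.
OPTIONAL MATCH (x:X {uuid: 'no-such-uuid'})
OPTIONAL MATCH (x)-[:REL]->(y:Y)
RETURN x, y;
```

Which variant fits depends on whether the caller prefers an empty result or an explicit row of nulls when x is absent.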

Hi, @alejandropuerto!

Yes, your suggestion surely solves the problem, but I'm not sure it is semantically correct.
In this case we are doing two matches just to work around what, IMHO, could be considered a flaw in Neo4j Cypher.

For me,

OPTIONAL MATCH (x:X  {uuid: 'ccc9c72a-3a1d-4683-8cf9-c98fba83750c'})-[:REL]->(y:Y)
RETURN x,y

should provide the same result: it should return all the optional nodes and relationships it encounters along its path, starting from x:X, through :REL, and then to y:Y; the first non-existent element in this path should return null.

But this is just my point of view. I would like to hear from the Neo4j people...

Anyway, thank you for the solution.

tard_gabriel
Ninja

Both MATCH and OPTIONAL MATCH are trying to match the whole pattern in the graph.

Putting a MATCH before an OPTIONAL MATCH ensures that x, in this case, will be returned as a row when it exists, and your OPTIONAL MATCHes will be evaluated against that row (or those rows, if there is more than one x).

If you only use an OPTIONAL MATCH, knowing that the whole pattern has to match to bind any variable, your OPTIONAL MATCH might return nothing but nulls even if the x part of it exists.
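
As a sketch of the row-based behaviour described above, the related y nodes can be gathered into one row per x. This assumes the labels and uuid from the question; `ys` is just an illustrative alias.

```cypher
MATCH (x:X {uuid: 'ccc9c72a-3a1d-4683-8cf9-c98fba83750c'})
OPTIONAL MATCH (x)-[:REL]->(y:Y)
// collect() skips nulls, so an x with no related y gives ys = []
RETURN x, collect(y) AS ys
```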

So why do MATCH and OPTIONAL MATCH always match the whole pattern?
Because, and this is a guess from a guy who has read about the query tuning behind Cypher, the engine has to know where to start.

It's as if you told a detective that every clue may or may not be important, and that how they are connected doesn't matter: nothing would get resolved.