10-11-2021 01:25 PM
Neo4j desktop version 1.4.7
Using the Movie data set, why does
MATCH (k:Person)-[:ACTED_IN]-(m:Movie)
WHERE k.name = 'Keanu Reeves'
RETURN k, m
return the 7 movies that Keanu Reeves acted in plus the Keanu Reeves :Person node, but
MATCH (k:Person)-[:ACTED_IN]-(m:Movie)
, (k:Person)-[:ACTED_IN]-(m:Movie)
WHERE k.name = 'Keanu Reeves'
RETURN k, m
returns message (no changes, no records)?
Is it the fact that I'm trying to re-traverse the exact same paths in the MATCH using the same labels? Meaning, since the algorithm only allows one traversal per relationship per MATCH (per label?), then if a second identical traversal is specified the return results are actually nulled out? I'm confused?
I see that the following modification does return Keanu's 7 movies plus the person node as in the first incarnation above, so clearly my confusion has something to do with label use/reuse.
MATCH (k:Person)-[:ACTED_IN]-(m:Movie)
, (k1:Person)-[:ACTED_IN]-(m1:Movie)
WHERE k.name = 'Keanu Reeves'
RETURN k, m
10-11-2021 02:10 PM
Yes, your first guess here is right!
Is it the fact that I'm trying to re-traverse the exact same paths in the MATCH using the same labels? Meaning, since the algorithm only allows one traversal per relationship per MATCH (per label?), then if a second identical traversal is specified the return results are actually nulled out? I'm confused?
This has to do with the way Cypher performs matching: it uses relationship isomorphism, so within a single MATCH a given relationship may be traversed only once.
As for the difference in behavior between your two two-part queries, it comes down to the variables being used.
For the query that returned no records:
MATCH (k:Person)-[:ACTED_IN]-(m:Movie)
, (k:Person)-[:ACTED_IN]-(m:Movie)
WHERE k.name = 'Keanu Reeves'
RETURN k, m
Your MATCH is restricted so that the same k node (the one for Keanu Reeves) and the same m node are used in both parts of the pattern. So you are matching Keanu Reeves to a movie and then, for that same pairing, trying to match the same pattern again. That fails because the same relationship can't be traversed more than once per MATCH. If your graph were slightly different, with two :ACTED_IN relationships between Keanu Reeves and one movie, you would get two matched paths for that pair: one where rel 1 is traversed in the first sub-pattern and rel 2 in the second, and another where the order is swapped, with rel 2 in the first sub-pattern and rel 1 in the second. Remember that a matched path includes the ordering of its elements, so a different order of traversal over the same elements results in multiple distinct paths.
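One way to see that this rule is scoped to a single MATCH clause is to split the pattern into two clauses. This is only a sketch against the default Movies graph, not something from your original question:

```cypher
// Relationship uniqueness applies within one MATCH clause, not across clauses.
MATCH (k:Person)-[:ACTED_IN]-(m:Movie)
WHERE k.name = 'Keanu Reeves'
// Second clause: the same :ACTED_IN relationship may be traversed again here.
MATCH (k)-[:ACTED_IN]-(m)
RETURN k, m
```

Assuming each actor/movie pair is connected by exactly one :ACTED_IN relationship, this should return the same 7 rows as your original single-pattern query.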
Your second query is very different:
MATCH (k:Person)-[:ACTED_IN]-(m:Movie)
, (k1:Person)-[:ACTED_IN]-(m1:Movie)
WHERE k.name = 'Keanu Reeves'
RETURN k, m
In particular, the variables used in the second part of the MATCH (k1 and m1) are completely new, so they have nothing to do with k (the node for Keanu Reeves) and m (the movies Keanu acted in) from the first part of the pattern. k1 and m1 will bind to any pattern where a person acted in a movie, so there will be a LOT of matched paths. The result is a cross product: the results of the first part of the pattern x the results of the second part. Even if the return looks simple in the graph view (because there is only one k node for Keanu, and only a small number of movies Keanu has acted in), a ton of rows fed into that result, with the same k and m nodes repeating over and over, once per path matched by the second part. If you want to see that complexity, do a RETURN k, m, k1, m1 instead. It will show most of your movies graph, and in the table results view you should see over a thousand rows. Even if you leave your RETURN as-is, the table results view will show each k and m pairing repeated more than a hundred times. The takeaway here: be careful with row cardinality, understand that Cypher is concerned with finding all possible path matches in the graph, and make sure you know which nodes any newly introduced variables will match.
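To check the row count without scrolling through the table view, an aggregation along these lines should work. It is a sketch, and the aliases totalRows and keanuMovies are just names picked for illustration:

```cypher
MATCH (k:Person)-[:ACTED_IN]-(m:Movie)
   , (k1:Person)-[:ACTED_IN]-(m1:Movie)
WHERE k.name = 'Keanu Reeves'
// count(*) counts every matched row; count(DISTINCT m) collapses the repeats.
RETURN count(*) AS totalRows, count(DISTINCT m) AS keanuMovies
```

totalRows should come out far larger than keanuMovies, which is the multiplication at work.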
A bit more about that multiplicative cardinality issue, in case there's any confusion:
In the default movies graph, there are 7 paths where Keanu Reeves acted in a movie.
If we're just looking for any actor acting in any movie, the MATCH would find 172 paths.
If we ran your second query, which performs the cross product of paths where Keanu Reeves acted in a movie x the paths of every person who acted in a movie, we would get 1197 distinct paths/rows.
1197 / 7 = 171. That is the total number of rows divided by the number of paths where Keanu Reeves acted in a movie. Why 171, when there are 172 acting paths in the graph? Because of relationship isomorphism: for each movie Keanu acted in (matched by the first part of the pattern), that same path cannot be matched again by the second part on that row, since its relationship can only be traversed once per MATCH.
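For comparison, here is a sketch of the same lookup split across two MATCH clauses. Since the once-per-relationship rule does not carry across clauses, nothing is excluded from the second clause, and with the counts above the result should be the full product, 7 x 172 = 1204 rows rather than 1197:

```cypher
MATCH (k:Person)-[:ACTED_IN]-(m:Movie)
WHERE k.name = 'Keanu Reeves'
// Separate clause: every acting path is eligible again, including Keanu's own.
MATCH (k1:Person)-[:ACTED_IN]-(m1:Movie)
RETURN count(*) AS totalRows
```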
10-11-2021 05:25 PM
Thanks Andrew. This really helps and gives me further documentation to refer to. Another example of the helpful Neo4j community and organization!