I have a question for the Cypher gurus regarding the semantics of Cypher and its conceptual model. Here are two queries and their results that illustrate the point.
Query 1: return "a" as LetterA
As I would expect, this results in one row with "a", i.e.:
LetterA
"a"
While Query 2:
match (n)
return "a" as LetterA
Surprisingly, this results in 3 rows (note that there are 3 matches to node n in the database):
LetterA
"a"
"a"
"a"
Why is this? I would have thought that the result projected via RETURN would not depend on the results of a seemingly arbitrary MATCH clause that do not even appear in the RETURN projection.
Super interesting. To answer your question first, yes I would expect the results you said if it was an RDBMS. I'm trying to make sure I understand neo4j so I can use it effectively since in this example with return we are synthesizing data rather than returning what's in the db. So, let me test my understanding. Is it correct that the general statement is that the clause before the RETURN controls the projection of items listed in the RETURN clause? (e.g. in the case of query 2 above the results of MATCH(n) control the projection of "a")
If so, what are the general rules for this? See the queries below for some examples of behavior using UNWIND instead of MATCH and a 2 row returned value (rather than just a 1 row "a").
For example Query 3:
with ["a","b"] as x
unwind x as X //create X with two rows, row1="a" and row2="b"
match (n)
return X
results in this replication (which has a logic to it- in this case 6 n's in the db):
X
"a"
"a"
"a"
"b"
"b"
"b"
Meanwhile Query 4:
with ["a","b"] as x
unwind x as X //create X with two rows, row1="a" and row2="b"
return X
Thanks Dana. Do you have the specific rules for how the clause before the RETURN controls the projection? If so, that would be great. Also, this seems like it could be quite powerful, much more so than the SELECT in a traditional SQL RDBMS. If you have any examples of queries that use this to do something interesting, they could be useful.
If you want to understand Cypher's semantics, I suggest you read this short section of the Neo4j docs: Clause composition - Cypher Manual.
I found it very useful.
Do you understand the semantics of RETURN and know of any documentation for it? From what I can see (examples above) it is not like a typical "return" statement in a programming language, which just returns the RETURN's argument list to the calling program.
The semantics seem to be something like this: its behavior is governed by a side effect of whatever projection was done most recently before the RETURN statement. So, for example, if it is returning a variable with 1 row, it repeats that row to match the number of rows in the most recent previous projection (which may not be in the RETURN's argument list). If it is returning a variable with 2 rows, it uses some sort of expansion rules to replicate the 2 rows to align with the most recent prior projection.
These are obviously design choices in the semantics of RETURN and not some sort of bug.
If anyone can point to documentation on this it would be useful.
The RETURN statement returns each row of data with the values listed. It does not duplicate anything. What you are seeing that is confusing you is the UNWIND behavior. UNWIND expands the list into multiple rows, one per list element, and duplicates the other values in the current row onto each of those new rows. See the example below:
I think I might be on the verge of an Aha! moment.
So, are you saying that (using the example above) the model in Cypher is that when the UNWIND happens all the previously declared variables in the query (e.g. name in the WITH clause earlier) are replicated to align to the dimensions of the UNWIND's result?
So, the following is not how Cypher works: 1/ the UNWIND only operates on "list" and the other variables (e.g. name) are left alone, then 2/ the replication of name to align to the dimensions of list only happens if they appear together in the RETURN clause.
Yes. UNWIND expands the list on each row into multiple rows, and the other values defined on that row are duplicated onto each of them. A subsequent RETURN or WITH clause can be used to pass all or some of the values on.
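To make that concrete, here is a minimal sketch that needs no data in the database. The variable names name, list and item and the literal values are just illustrative assumptions, not taken from any earlier query in this thread.

```cypher
// One input row carrying two values: a string and a list.
WITH "Ann" AS name, [1, 2, 3] AS list
// UNWIND turns that single row into three rows, one per list element.
// The value of name is carried along on every one of them.
UNWIND list AS item
// RETURN simply projects the rows it receives; it does no replication itself.
RETURN name, item
```

This should give three rows, each with name = "Ann" and item equal to 1, 2 and 3 respectively. The duplication of name has already happened by the time RETURN runs.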
That makes sense. So if I understand correctly, in the case of this query (where there are, say, 6 rows coming from match (n)):
with ["a","b"] as x
unwind x as X //create X with two rows, row1="a" and row2="b"
match (n)
return X
the result is:
X
"a"
"a"
"a"
"b"
"b"
"b"
Is the correct explanation for this that the results of the UNWIND (1 column, 2 rows) are combined/aligned with the results of the MATCH (1 column, 6 rows) to create a table with 2 columns (n and X) and 6 rows? Then the RETURN just returns the 6 rows in the X column?
So close... there will actually be six rows of "a" and six rows of "b". The UNWIND produces two rows, with X="a" and X="b". The MATCH then executes once for each of those rows, generating six rows for X="a" and the same six rows for X="b". Each matched row carries the value of X from the row it was generated from.
Test data:
unwind [1,2,3,4,5,6] as id
create (x:Test{id: id})
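A query along these lines can be run against that test data to see the effect. This is a sketch that assumes the only :Test nodes in the database are the six created above; restricting the MATCH to the Test label and returning n.id are additions for illustration.

```cypher
WITH ["a", "b"] AS x
// Two rows after this clause: X = "a" and X = "b".
UNWIND x AS X
// The MATCH runs once per incoming row, so each X is paired with all six nodes.
MATCH (n:Test)
// Returning n.id next to X makes the pairing visible.
RETURN X, n.id AS id
```

With six Test nodes this should produce twelve rows: each of the six ids paired with "a" and again with "b". The row order is not guaranteed unless an ORDER BY is added.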
Wow, thank you! The fact that the MATCH executes for each row from the previous clause (e.g. UNWIND) was lost on me. I think this demystifies a lot of behavior. I hope this thread is useful for others in the future.
thanks again!
saman
Yes, the MATCH executes for each row of data. Typically the MATCH will use values from the row, i.e., like a correlated query. In this case no such correlation exists, so maybe that adds to the confusion. I am glad you have been enlightened.
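For contrast, here is a sketch of the more typical correlated case, again assuming the six :Test nodes from the test data above. The variable name wanted and the list of ids are made up for the example.

```cypher
// Three incoming rows: wanted = 1, wanted = 3, wanted = 5.
UNWIND [1, 3, 5] AS wanted
// The MATCH still runs once per row, but now it uses the row's value,
// so each run finds only the node whose id equals wanted.
MATCH (t:Test {id: wanted})
RETURN wanted, t.id AS id
```

Because the pattern depends on the incoming row, each row should match a single node instead of all six, so the result has three rows rather than eighteen.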
Thank you guys. I read that this is the effect of Cartesian products. The behavior of nested UNWIND, MATCH, etc. is like nested loops over all nodes or elements.
thanks again
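The nested-loop picture can be checked without any data in the database. This is a small sketch with two made-up lists; the variable names X and n are only for illustration.

```cypher
// Outer "loop": two rows.
UNWIND ["a", "b"] AS X
// Inner "loop": runs once for each outer row, giving 2 x 3 = 6 rows in total.
UNWIND [1, 2, 3] AS n
RETURN X, n
```

Every value of X is paired with every value of n, which is the same Cartesian-product effect as an uncorrelated MATCH following an UNWIND.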