Cypher UNWIND keeping track of nodes

While going through the beginner data modelling course I came across an example for UNWIND.
This uses the typical movie database example that ships with Neo4j Desktop.
My question is:
In the script below, how does Cypher keep track of which language originally belonged to which movie after the first WITH, which collects all the movies and passes them to the next stage?
The second UNWIND then creates a new list of movies. I don't understand how the new relationship maintains any integrity with the original data.
By "integrity" I mean that the IN_LANGUAGE relationship is actually based on the languages array property of each Movie node.

MATCH (m:Movie)
UNWIND m.languages AS language
WITH language, collect(m) AS movies
MERGE (l:Language {name: language})
WITH l, movies
UNWIND movies AS m
WITH l, m
MERGE (m)-[:IN_LANGUAGE]->(l);

MATCH (m:Movie)
SET m.languages = null;

I am using Neo4j Desktop version 1.5.8 (1.5.8.105).
I have been struggling to understand this for a long time now, and any help would be greatly appreciated.
Thanks

m.languages is a list of languages for movie 'm'. The first 'unwind' converts the list of languages for a specific movie node 'm' into rows. There will be one row per language, and each row will contain the language and the movie node. For example, if a specific movie 'm' had two languages (English, Spanish), the rows for that one movie node would be:

m, English
m, Spanish

The above would repeat for each movie returned in the first match.
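
To look at these intermediate rows directly, one option is to run only the first two clauses and return the result. This is a minimal sketch: it assumes each Movie node has a `title` property (as in the standard movie dataset) and that it is run before the final `SET m.languages = null` step, since afterwards there is nothing left to unwind.

```cypher
// One row per (movie, language) pair produced by the first UNWIND.
// `title` is an assumed property on Movie nodes.
MATCH (m:Movie)
UNWIND m.languages AS language
RETURN m.title AS movie, language
LIMIT 10
```

A movie with two languages should appear on two rows, once per language.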

The first 'with' groups all rows with the same language and collects their movie nodes into the 'movies' list. Because 'language' is the only non-aggregated expression in that 'with', it acts as the grouping key for collect(). Following this 'with', we have one row per language, and each row has the list of movies that have that language.

In summary, the query has converted the rows of movie nodes from the first match into rows of distinct languages, each with the list of movies that have that language.
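
The grouping step can be inspected the same way. The sketch below again assumes a `title` property on Movie nodes and collects titles instead of whole nodes just to keep the output readable; `movieCount` and `sampleTitles` are names made up for this illustration.

```cypher
// One row per distinct language, with the movies that share it.
// `language` is the grouping key because it is the only
// non-aggregated expression in the WITH clause.
MATCH (m:Movie)
UNWIND m.languages AS language
WITH language, collect(m.title) AS movies
RETURN language, size(movies) AS movieCount, movies[0..3] AS sampleTitles
```

Each language appears exactly once, so the link between a language and its movies is carried by the row itself.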

The second 'with' passes these rows on, where each row has a Language node and its corresponding list of movies. The second 'unwind' then converts the list of movies for each language back into rows, with the language repeated on each row for every movie in the list being unwound.

Following the third 'with', we have rows of languages, where each language repeats once for each movie associated with it.

The pairs of language and movie are then used to create their relationship during the second merge.

The important behavior of 'unwind' is that all the other variables defined on the row holding the list are repeated on every row of the unwound data.
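
A small standalone sketch of that behavior, using literal values rather than the movie data (the language and titles here are made up for illustration):

```cypher
// `language` is defined on the same row as the list, so it is
// repeated on every row that UNWIND produces from `movies`.
WITH 'English' AS language, ['Movie A', 'Movie B', 'Movie C'] AS movies
UNWIND movies AS movie
RETURN language, movie
```

This returns three rows, one per list element, each with 'English' in the language column.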

Thanks Gary. Appreciate it.
However, it makes me wonder why such a long sequence of instructions is needed when, as you say in your first paragraph, the first UNWIND already gives us "one row per language, and the row will contain the language and the movie node", which is what we end up with following the third 'with'.

Besides, what exactly is a 'row' in Neo4j?
Sorry, I am quite new to Neo4j and graph DBs. I thought the storage model of Neo4j was Node->relationship->Node in document format, so I don't understand how 'rows' come into the picture.
If you can refer me to some good documentation about this, that would be helpful too.
Thanks in advance.

The query is quite convoluted for what it does. Yes, you could just unwind the languages to get rows of movie and language, take the distinct pairs, and create a relationship for each unique pair. I think it was done for some illustrative purpose in the class.
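
A sketch of what that shorter version might look like, assuming the same labels and properties as the original script. It leans on MERGE to avoid duplicates, so no explicit distinct step is written out:

```cypher
// Each row carries one (movie, language) pair.
MATCH (m:Movie)
UNWIND m.languages AS language
// MERGE creates the Language node only if it does not exist yet.
MERGE (l:Language {name: language})
// MERGE likewise creates the relationship only once per pair.
MERGE (m)-[:IN_LANGUAGE]->(l)
```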

Yes, the data is stored as a collection of nodes and the relationships between them. When you search for a pattern in Cypher, you get a row for each path that matches. Take the following graph as an example:

CREATE (c:Company {name: 'Apple'})
CREATE (p1:Person {name: 'Steve'})
CREATE (p2:Person {name: 'Bob'})
CREATE (p3:Person {name: 'Gregg'})
CREATE (c)<-[:WORKS_AT]-(p1)
CREATE (c)<-[:WORKS_AT]-(p2)
CREATE (c)<-[:WORKS_AT]-(p3)

If you query for the company and the people who work there, the results are returned as three rows, one for each path between Apple and each Person.
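
A minimal sketch of such a query, assuming the example graph above was created exactly once (the `company` and `person` aliases are just illustrative column names):

```cypher
// Each matched path (Apple)<-[:WORKS_AT]-(person) becomes one row.
MATCH (c:Company {name: 'Apple'})<-[:WORKS_AT]-(p:Person)
RETURN c.name AS company, p.name AS person
```

This should give one row each for Steve, Bob and Gregg, with 'Apple' repeated in the company column. The row order is not guaranteed without an ORDER BY.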

If you want the relationship too, bind it to a variable in the pattern and add it to the RETURN clause. Again, each path is returned as a row.
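
For instance, a sketch against the same example graph, where `r` is a variable name chosen here for the relationship:

```cypher
// Binding the relationship to `r` makes it available in RETURN.
MATCH (c:Company {name: 'Apple'})<-[r:WORKS_AT]-(p:Person)
RETURN c.name AS company, type(r) AS relationship, p.name AS person
```

The result still has three rows; each row now also shows the relationship type that connects the two nodes.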

The data is not returned as a hierarchical graph, but as rows of data.
