I'm currently doing the course "Graph Data Modeling Fundamentals" (https://graphacademy.neo4j.com/courses/modeling-fundamentals/) and I'm very confused about what exactly happens to the data after a WITH is used. I've read the documentation in the Neo4j Cypher Manual, but it isn't really clicking for me. I've been working in GraphAcademy with the "Movie" database, and this Cypher code is given in the course to create Language nodes from what was previously the .languages property.
MATCH (m:Movie)
UNWIND m.languages AS language
WITH language, collect(m) AS movies
MERGE (l:Language {name:language})
WITH l, movies
UNWIND movies AS m
WITH l,m
MERGE (m)-[:IN_LANGUAGE]->(l);
MATCH (m:Movie)
SET m.languages = null
As you can see, WITH is used multiple times in the above query, but what does it really do? What is the significance of the first variable (before the comma) and the second one (after the comma) and how do they become related after the clause is executed?
Your pseudocode gets the point, but there is more going on with an ‘unwind’ clause. When you unwind a collection, you create one result row per element in the collection. Each row holds one element, and the other variables in scope are repeated on every row that results from the unwind.
For example, assume the match returns two movie nodes. The output of the match will then be the following:
movie1
movie2
Each movie node contains a property called languages, which is a collection. Unwinding m.languages for each movie node produces more rows than the two movie nodes we started with. Assuming movie one has two languages and movie two has three, the result of the ‘match’ followed by the ‘unwind’ will be the following:
movie1, language1
movie1, language2
movie2, language2
movie2, language3
movie2, language4
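The rows listed above can be reproduced without touching the database. This is only a sketch: the two maps below are hypothetical stand-ins for the movie nodes (they are not real Movie nodes, and the title key is made up for illustration), but the ‘unwind’ behaves the same way on them.

```cypher
// Hypothetical stand-ins for the two movie nodes: plain maps, not real :Movie nodes
UNWIND [
  {title: 'movie1', languages: ['language1', 'language2']},
  {title: 'movie2', languages: ['language2', 'language3', 'language4']}
] AS m
// One row per element of m.languages; m is repeated on each of those rows
UNWIND m.languages AS language
RETURN m.title AS movie, language
```

This should return five rows, one per movie and language pair, because the first map contributes two rows and the second contributes three.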
One of the purposes of ‘with’ is to group rows so that aggregate functions can be applied over each group. The grouping is determined by the ‘with’ expressions that are not aggregate functions. In the above example, the grouping is done by language and the movie nodes are being collected; therefore, all the rows with the same language are grouped together and the collection is done over those rows. The same would apply if the aggregate were sum, min, max, count, etc.: those calculations would be done group by group. The result from our example will be: