I'm having some trouble understanding the WITH
clause, when used with aggregation functions like count()
.
I tried using this query from the Cypher ref card as an example -
MATCH (user)-[:FRIEND]-(friend)
WHERE user.name = $name
WITH user, count(friend) AS friends
WHERE friends > 10
RETURN user
What is the purpose of count(friend)
in this query? Is it just counting all the end nodes which are connected to the given user with the FRIEND
relationship? If so, why is the query returning the given user node? Because we already know who the starting user is.
I tried loading the "movies" graph in Neo4j, to run similar queries. Then, I modified the above query for the movies graph, like so -
MATCH (h:Person{name:"Tom Hanks"})-[r:ACTED_IN]->(m:Movie)
WITH h, count(m) as movies
WHERE movies > 3
RETURN h, movies
This is what the query returns -
╒════════════════════════════════╤════════╕
│"h" │"movies"│
╞════════════════════════════════╪════════╡
│{"name":"Tom Hanks","born":1956}│12 │
└────────────────────────────────┴────────┘
In theory, I understand what this query is doing - return a count of all movies where Tom Hanks "acted in", if they are greater than 3. But in terms of the query and the aggregation function, what is count(m)
doing? Is it a count of all movie nodes where Tom Hanks is the start node with "ACTED_IN" relationship connecting the two nodes?
If so, why do the following queries not return anything?
MATCH (h:Person{name:"Tom Hanks"})-[r:ACTED_IN]->(m:Movie)
WITH h, r, count(m) as movies
WHERE movies > 3
RETURN h,r, movies
MATCH (h:Person{name:"Tom Hanks"})-[r:ACTED_IN]->(m:Movie)
WITH h,r, count(m) as movies
WHERE movies > 3
RETURN movies
MATCH (h:Person{name:"Tom Hanks"})-[r:ACTED_IN]->(m:Movie)
WITH h, m,count(m) as movies
WHERE movies > 3
RETURN h, m
MATCH (h:Person)-[r:ACTED_IN]->(m:Movie)
WITH h, r, m,count(m) as movies
WHERE movies > 6
RETURN h, m
MATCH p=(h:Person{name:"Tom Hanks"})-[r:ACTED_IN]->(m:Movie)
WITH p, count(r) as movies
WHERE movies > 3
RETURN p
Queries 1 through 4 are just different variations of the main query, where I just add different variables to the WITH clause, so they 're carried over to the next stage of the query. In Query 5, I try to count the relationships, which should ideally behave the same way as counting the destination (movie) node, but doesn't work.
I also tried fiddling with the FOLLOWS
relationship in the movie graph, with this query -
MATCH p=()-[r:FOLLOWS]->(b:Person)
WITH count(b) AS popular
WHERE popular > 1
RETURN popular;
This query returns 3, because there are three FOLLOWS relationships in the graph -
══════════════════════════════════════════════════════════╕
│"p" │
╞══════════════════════════════════════════════════════════╡
│[{"name":"Paul Blythe"},{},{"name":"Angela Scope"}] │
├──────────────────────────────────────────────────────────┤
│[{"name":"James Thompson"},{},{"name":"Jessica Thompson"}]│
├──────────────────────────────────────────────────────────┤
│[{"name":"Angela Scope"},{},{"name":"Jessica Thompson"}] │
└──────────────────────────────────────────────────────────┘
But if I add the WITH
clause and count()
to this query, it doesn't return anything
MATCH p=(a)-[r:FOLLOWS]->(b:Person)
WITH a,count(b) AS pop
WHERE pop > 1
RETURN a,pop;
I think I'm missing some key piece of underlying information about how WITH and count() works.