Cypher Syntax: Filtering Nodes in WHERE clause

pat.canny · November 25, 2019, 2:46am

I am trying to simplify a working query, but am getting unexpected results.

In my example graph, I have Users and Articles. The only relationship is (u:User) -[:HAS_READ]->(a:Article) . Users and Articles have Names.

I am trying to build a "weak tie" query that returns the Name of an article that has a given "distance" (measured by number of Users in the path) from a starting User. In a simple case of a "distance" of 1, an Article 2 is returned if User 1 has read Article 1, which User 2 had also read. User 2, in this case, has also read Article 2, which User 1 has not read.

I have a long query already working for distances of 2 and 3, but I want to try to simplify it.

Here is my latest query for distance=2 that does not return an error (but isn't working as expected):

MATCH  (u1:User {Name: "Bob"})-[:HAS_READ]->(a1:Article)<-[:HAS_READ]-(u2:User)-[:HAS_READ]->(a2:Article)<-[:HAS_READ]-(u3:User)-[:HAS_READ]->(a3:Article)
WHERE NOT (u1)-[:HAS_READ]-(a3) AND NOT (u2)-[:HAS_READ]-(a3)
RETURN DISTINCT a3.Name

The problem: depending on the path, a User can be in the list of u2 Users and u3 Users, so I am getting Articles that have been read by a u2 User.

I know how to do this with collect and WITH (using a very long-winded query), but is there a way to filter out Users who are in both u2 and u3 collections in the WHERE clause? When I try using collect in the WHERE clause, I get an error.

Here is an example query that returns an error:

MATCH  (u1:User {Name: "Bob"})-[:HAS_READ]->(a1:Article)<-[:HAS_READ]-(u2:User)-[:HAS_READ]->(a2:Article)<-[:HAS_READ]-(u3:User)-[:HAS_READ]->(a3:Article)
WHERE NOT (u1)-[:HAS_READ]-(a3) AND NONE (x in collect(u2) WHERE x in collect(u3))
RETURN DISTINCT a3.Name

The error: "Invalid use of aggregating function collect(...) in this context"

I am probably just missing some obvious Cypher concept or simple syntax (I hope), or I'm trying to simplify something which cannot be simplified...

Just for kicks, here is a working (long) query for distance = 2:

MATCH  (u1:User {Name: "Bob"})-[:HAS_READ]-(a1:Article)-[:HAS_READ]-(u2:User)-[:HAS_READ]-(a2:Article)
WHERE NOT (u1)-[:HAS_READ]-(a2)
WITH u1,collect(u2) as u2_list,a2
MATCH (a2)-[:HAS_READ]-(u3:User)-[:HAS_READ]-(a3:Article)
WHERE NOT (u1)-[:HAS_READ]-(a3)
WITH a2,a3,u2_list,collect(u3) as u3_list
MATCH (a2)-[:HAS_READ]-(u3)-[:HAS_READ]-(a3)
WHERE NONE (x in u3_list WHERE x in u2_list)
RETURN DISTINCT a3.Name

mark.needham · November 25, 2019, 2:08pm

I created a sample graph that matches the model you described:

CREATE
  (`0` :User {name:'User1'}) ,
  (`1` :Article {title:'Article1'}) ,
  (`2` :User {name:'User2'}) ,
  (`3` :Article {title:'Article2'}) ,
  (`4` :User {name:'User3'}) ,
  (`5` :Article {title:'Article3'}) ,
  (`6` :User {name:'User4'}) ,
  (`0`)-[:`HAS_READ` ]->(`1`),
  (`2`)-[:`HAS_READ` ]->(`1`),
  (`2`)-[:`HAS_READ` ]->(`3`),
  (`4`)-[:`HAS_READ` ]->(`3`),
  (`4`)-[:`HAS_READ` ]->(`5`),
  (`6`)-[:`HAS_READ` ]->(`3`),
  (`6`)-[:`HAS_READ` ]->(`1`)

And then I think the following query gets the same results as your last one:

MATCH  (u1:User {name: "User1"})-[:HAS_READ]->(a1:Article)<-[:HAS_READ]-(u2:User)-[:HAS_READ]->(a2:Article)<-[:HAS_READ]-(u3:User)-[:HAS_READ]->(a3:Article)
WHERE not((u1)-[:HAS_READ]-(a3))
AND not((u3)-[:HAS_READ]->()<-[:HAS_READ]-(u1))
RETURN DISTINCT a3.title

The logic is different but I think it does the same thing. We're making sure that u3 doesn't have a distance=1 connection to u1 via that 2nd part of the WHERE clause.

pat.canny · November 26, 2019, 2:08am

Thanks Mark! This definitely points me in the right direction. There are some additional subtleties, such as an Article of distance = 2 that is also read by a user of distance =1, but I can play around with it. Thanks for the tip on the latter part of the WHERE clause. That's definitely useful syntax.

andrew_bowman · December 3, 2019, 10:17am

Just a quick minor thing to note, NOT is a keyword, not a function, so try to avoid using not(). Prefer something like this instead:

WHERE NOT (u1)-[:HAS_READ]-(a3)

You can of course use parenthesis for doing grouping of boolean expressions when needed, which can look similar, but just wanted clarify the syntax.

Topic		Replies	Views
Finding Nodes that do not share any paths through a grandparent node Cypher	13	816	February 24, 2020
Filtering Cypher Results Neo4j Graph Platform cypher	11	761	December 11, 2019
Simplify/optimizing my query (removing UNION) Cypher optimization , cypher , union	3	259	March 22, 2023
WHERE clause with Variable Length Paths Cypher	3	261	December 13, 2023
Using WHERE to Filter Queries Question 3 Answers Graph Academy & Certifications cypher	2	220	November 8, 2021

Cypher Syntax: Filtering Nodes in WHERE clause

Related topics