Cypher Syntax: Filtering Nodes in WHERE clause

I am trying to simplify a working query, but am getting unexpected results.

In my example graph, I have Users and Articles. The only relationship is (u:User) -[:HAS_READ]->(a:Article) . Users and Articles have Names.

I am trying to build a "weak tie" query that returns the Name of an article that has a given "distance" (measured by number of Users in the path) from a starting User. In a simple case of a "distance" of 1, an Article 2 is returned if User 1 has read Article 1, which User 2 had also read. User 2, in this case, has also read Article 2, which User 1 has not read.

I have a long query already working for distances of 2 and 3, but I want to try to simplify it.

Here is my latest query for distance=2 that does not return an error (but isn't working as expected):

MATCH  (u1:User {Name: "Bob"})-[:HAS_READ]->(a1:Article)<-[:HAS_READ]-(u2:User)-[:HAS_READ]->(a2:Article)<-[:HAS_READ]-(u3:User)-[:HAS_READ]->(a3:Article)
WHERE NOT (u1)-[:HAS_READ]-(a3) AND NOT (u2)-[:HAS_READ]-(a3)

The problem: depending on the path, a User can be in the list of u2 Users and u3 Users, so I am getting Articles that have been read by a u2 User.

I know how to do this with collect and WITH (using a very long-winded query), but is there a way to filter out Users who are in both u2 and u3 collections in the WHERE clause? When I try using collect in the WHERE clause, I get an error.

Here is an example query that returns an error:

MATCH  (u1:User {Name: "Bob"})-[:HAS_READ]->(a1:Article)<-[:HAS_READ]-(u2:User)-[:HAS_READ]->(a2:Article)<-[:HAS_READ]-(u3:User)-[:HAS_READ]->(a3:Article)
WHERE NOT (u1)-[:HAS_READ]-(a3) AND NONE (x in collect(u2) WHERE x in collect(u3))

The error: "Invalid use of aggregating function collect(...) in this context"

I am probably just missing some obvious Cypher concept or simple syntax (I hope), or I'm trying to simplify something which cannot be simplified...

Just for kicks, here is a working (long) query for distance = 2:

MATCH  (u1:User {Name: "Bob"})-[:HAS_READ]-(a1:Article)-[:HAS_READ]-(u2:User)-[:HAS_READ]-(a2:Article)
WHERE NOT (u1)-[:HAS_READ]-(a2)
WITH u1,collect(u2) as u2_list,a2
MATCH (a2)-[:HAS_READ]-(u3:User)-[:HAS_READ]-(a3:Article)
WHERE NOT (u1)-[:HAS_READ]-(a3)
WITH a2,a3,u2_list,collect(u3) as u3_list
MATCH (a2)-[:HAS_READ]-(u3)-[:HAS_READ]-(a3)
WHERE NONE (x in u3_list WHERE x in u2_list)

I created a sample graph that matches the model you described:

  (`0` :User {name:'User1'}) ,
  (`1` :Article {title:'Article1'}) ,
  (`2` :User {name:'User2'}) ,
  (`3` :Article {title:'Article2'}) ,
  (`4` :User {name:'User3'}) ,
  (`5` :Article {title:'Article3'}) ,
  (`6` :User {name:'User4'}) ,
  (`0`)-[:`HAS_READ` ]->(`1`),
  (`2`)-[:`HAS_READ` ]->(`1`),
  (`2`)-[:`HAS_READ` ]->(`3`),
  (`4`)-[:`HAS_READ` ]->(`3`),
  (`4`)-[:`HAS_READ` ]->(`5`),
  (`6`)-[:`HAS_READ` ]->(`3`),
  (`6`)-[:`HAS_READ` ]->(`1`)

And then I think the following query gets the same results as your last one:

MATCH  (u1:User {name: "User1"})-[:HAS_READ]->(a1:Article)<-[:HAS_READ]-(u2:User)-[:HAS_READ]->(a2:Article)<-[:HAS_READ]-(u3:User)-[:HAS_READ]->(a3:Article)
WHERE not((u1)-[:HAS_READ]-(a3))
AND not((u3)-[:HAS_READ]->()<-[:HAS_READ]-(u1))

The logic is different but I think it does the same thing. We're making sure that u3 doesn't have a distance=1 connection to u1 via that 2nd part of the WHERE clause.

1 Like

Thanks Mark! This definitely points me in the right direction. There are some additional subtleties, such as an Article of distance = 2 that is also read by a user of distance =1, but I can play around with it. Thanks for the tip on the latter part of the WHERE clause. That's definitely useful syntax.

Just a quick minor thing to note, NOT is a keyword, not a function, so try to avoid using not(). Prefer something like this instead:

WHERE NOT (u1)-[:HAS_READ]-(a3)

You can of course use parenthesis for doing grouping of boolean expressions when needed, which can look similar, but just wanted clarify the syntax.