Performing match intersection

andrew_bowman · August 23, 2018, 2:42am

Match intersection is a common use case where you're searching for nodes which have relationships to all of a set of input nodes.

For the rest of the article we'll use the built-in movies graph for demonstration. The example use case will be:

Given a list of actor names, find movies featuring all the given actors.

A common first (and wrong) approach is something like this:

WITH ['Keanu Reeves', 'Hugo Weaving', 'Emil Eifrem'] as names
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name in names
RETURN m

The above query returns movies featuring at least one of the given actors, not movies with all the given actors.

We need alternate approaches to get the correct results.

Filter to the nodes in common by the count of input nodes in the match

We can make some changes to the above query to get the relevant set of :Movie nodes.

The idea here is that in our match, the :Movie nodes we want will have the same number of distinct matched actors as the size of our input collection.

WITH ['Keanu Reeves', 'Hugo Weaving', 'Emil Eifrem'] as names
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name in names
WITH m, size(names) as inputCnt, count(DISTINCT p) as cnt
WHERE cnt = inputCnt
RETURN m

By filtering down based upon the distinct nodes matched, we're able to get the correct answer even if there are multiple relationships of the same type between the nodes in our match.

This is typically the most efficient approach for finding match intersections, but it requires that all inputs in the input list are distinct.

Use WHERE ALL() to ensure all nodes in a list have a relationship to another node

An alternate (but typically less efficient) approach is to collect input nodes, and use the WHERE ALL() predicate to ensure all the collected nodes have a relationship to the common node.

WITH ['Keanu Reeves', 'Hugo Weaving', 'Emil Eifrem'] as names
MATCH (p:Person)
WHERE p.name in names
WITH collect(p) as persons
MATCH (m:Movie)
WHERE ALL(p in persons WHERE (p)-[:ACTED_IN]->(m))
RETURN m

The problem with this approach is that there is a performance hit proportional to the number of :Movie nodes, since the last MATCH starts from all :Movie nodes.

We can improve this query somewhat by starting from :Movie nodes matched from one of our input nodes, though this increases the complexity of the query:

WITH ['Keanu Reeves', 'Hugo Weaving', 'Emil Eifrem'] as names
MATCH (p:Person)
WHERE p.name in names
WITH collect(p) as persons
WITH head(persons) as head, tail(persons) as persons
MATCH (head)-[:ACTED_IN]->(m:Movie)
WHERE ALL(p in persons WHERE (p)-[:ACTED_IN]->(m))
RETURN m

Use APOC to intersect result lists

Some use cases might introduce extra complexity. Perhaps we want result nodes to intersect with results from another matched pattern, or with a different collection of results.
In these cases, we might simply use a WHERE clause to enforce the additional pattern, or list membership.

But when there are several lists requiring intersection, this can become tougher to perform with Cypher alone.

When using Neo4j 3.0.x or higher, using the reduce() function along with an intersection function from APOC Procedures, we can perform intersections across multiple lists.

WITH ['Keanu Reeves', 'Hugo Weaving', 'Emil Eifrem'] as names
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name in names
WITH p, collect(m) as moviesPerActor
WITH collect(moviesPerActor) as movies
WITH reduce(commonMovies = head(movies), movie in tail(movies) |
 apoc.coll.intersection(commonMovies, movie)) as commonMovies
RETURN commonMovies

Topic		Replies	Views
Know more on Intersection Cyphers Cypher browser , cypher , knowledge-base	4	3189	November 27, 2018
Help with intersecting nodes query Cypher apoc , cypher	5	496	March 18, 2021
Filter nodes by properity list using intersection Cypher apoc , cypher	7	292	January 4, 2024
Check whether a node exists, which has relation to all given nodes Cypher	2	2428	October 7, 2019
Want to find nodes that have exact set of relationships Cypher	3	448	January 8, 2020

Performing match intersection

Filter to the nodes in common by the count of input nodes in the match

Use WHERE ALL() to ensure all nodes in a list have a relationship to another node

Use APOC to intersect result lists

Related topics