Select nodes in a subgraph which don't have relationships outside of this subgraph

I will take an example with the movies database. The idea is to select a movie (Matrix) and to get all the actors who acted in, except the ones who acted in other movies.
I can achieve this with the query

MATCH (m:Movie)--(p:Person)
MATCH (p)--(m0:Movie) 
WHERE m.title = 'The Matrix' 
WITH m ,p, collect(m0) as other_movies
WHERE NOT ANY (x in other_movies WHERE x<> m)
RETURN m, p 

I use a ANY statement to filter out all the movies which have another relationship. Now the idea is to generalize to the case where I have selected several movies initially. For example if I select also "the Replacements" I want the actor "Keanu Reeves" to be also selected and linked to both movies. But this is not working with my current proposition, as I only get the actors related to one signle movie only

Do you have any advice on how to generalize this with performance ? I guess I should have a look into the APOC library and subgraphs but I don't know where to starts with - I don't know what I am looking for in Graph theory. Any help appreciated, thank you :slight_smile:

Hello @matthieu :slight_smile:

I'm not sure if I have understood your need but this query will return Person nodes who ACTED_IN only in one movie:

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WITH p.name AS name, collect(m.title) AS movies
WHERE size(movies) = 1
RETURN name, movies

Regards,
Cobra

Thank you for your heads up. It 'is not exactly what I am looking for, as it matches the first example but not the last one To describe a bit more the second example. I would like to retrieve people which acted in 'The Matrix' or in 'The Replacement's, or in both movies.

Currently, my second query covers the first 2 cases but not the last one. It gives me the people who acted in 'The Matrix' Only Or 'The Replacements' ONLY but I would like to have people who acted in both.
Here is the query btw

MATCH (m:Movie)--(p:Person)
MATCH (p)--(m0:Movie) 
WHERE m.title = 'The Matrix' OR m.title = 'The Replacements' 
WITH m ,p, collect(m0) as other_movies
WHERE NOT ANY (x in other_movies WHERE x<> m)
RETURN m, p 

Another way to formulate this is to take all the people connected to this 2 movies, and remove the ones that are connected to another movie.

Let me know if it is clearer

Regards,
Mattthieu

The following query should give you the actors that acted in only the given movies. They can act in less, but the movies have to be in the list.

with ['Matrix', 'The Replacements'] as allowedMovies
match(p:Person)-[:ACTED_IN]->(m:Movie)
with p, collect(m) as movies, allowedMovies
where all(i in movies where i.title in allowedMovies)
return p as actor, movies

If you want exactly those movies, no more, no less, then you can an additional constraint:

with ['Matrix', 'The Replacements'] as allowedMovies
match(p:Person)-[:ACTED_IN]->(m:Movie)
with p, collect(m) as movies, allowedMovies
where size(movies) = size(allowedMovies)
and all(i in movies where i.title in allowedMovies)
return p as actor, movies

is this what you are looking for?

1 Like

Hi @matthieu !

One silly question. Who is the person who acted on 'The Matrix' and 'The Replacements' and no other? You state that Keanu Reeves was one of them but it's not true tho.

Bennu

ps: This is my version of what I understood is your desired query.

profile  WITH ['The Matrix', 'The Replacements'] as allowedMovies
MATCH(p)-[:ACTED_IN]->(m:Movie)
WHERE m.title in allowedMovies
with p, collect(m) as movies
MATCH(p)
WHERE not exists {
    MATCH(p)-[:ACTED_IN]->(s)
    where not s in movies
}
return p
1 Like

Thank you for your support @bennu.neo and @glilienfield even though my example was ill-coined. You are right Keanu Reeves also played in other movies, I just hadn't noticed as I used the GUI node expansion and thought it was comprehensive. Anyway your query match what I am looking for, thank you very much!

EDIT: After implementing on real case, the answer that fits well is the one from @bennu.neo as the filtering is done on the Persons and not on the movies

Hey, no worries. It isn't a contest. We all learn from each other by sharing ideas. I can see the benefit of @bennu.neo approach, as it finds the relevant actors first, then filters out the ones that don't match. As such, it probably is faster on larger data sets. If speed is an issue, maybe a little tweak may make if more efficient. The change eliminates the need to interrogate each movie node.

WITH ['The Matrix', 'The Replacements'] as allowedMovies
MATCH(p)-[:ACTED_IN]->(m:Movie)
WHERE m.title in allowedMovies
with p, collect(m) as movies
where size((p)-[:ACTED_IN]->(:Movies)) = size(movies)
return p

Hey @glilienfield !

You gave me a new idea for a query. This one should use less db hits and memory overall.

profile WITH ['The Matrix', 'The Replacements'] as allowedMovies
MATCH(p)-[:ACTED_IN]->(m:Movie)
WHERE m.title in allowedMovies
with p, count(m) as s
where size((p)-[:ACTED_IN]->()) = s
return p

Bennu

Good tweak. Why collect when we just need the size.

It wasn't my intention to set up a contest, but I am glad of the outcome ! The new query seems really good, thanks to both of you!

1 Like