Find the duplicate nodes

In the sample graph, a few movies have received reviews. This is indicated by a “REVIEWED” relationship between a movie node and a person node. Write a query to find the actors that have played in at least two movies that received at least one review. For each of these actors, return the actor’s name and the title of each movie with at least one review that this actor has played in. Two actors satisfy this condition: one has played in two movies while the other has played in three movies.

I thought it would be interesting to make this more complicated by also requiring each movie to be reviewed by at least two people. I think this should work. The query now groups by person and movie to get the number of distinct reviewers for each movie the person acted in. It then filters out all movies that did not have at least two reviewers. The rest of the query is the same as in the single-review query.

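MATCH (p:Person)-[:ACTED_IN]->(m:Movie)<-[:REVIEWED]-(r:Person)
// group by actor and movie, collecting that movie's distinct reviewers
WITH p, m, collect(DISTINCT r.name) AS reviewers
WHERE size(reviewers) > 1
// keep actors with at least two movies that passed the reviewer filter
WITH p, collect(DISTINCT m.title) AS titles
WHERE size(titles) > 1
RETURN p.name, titles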
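
One thing to watch, offered as a sketch rather than a tested fix: collect(distinct r.name) counts distinct reviewer names, so two different reviewers who happen to share a name would be counted once. If that matters for your data, counting the reviewer nodes themselves should avoid it. The alias reviewerCount is my own, and the arrow on ACTED_IN assumes the relationship points from the person to the movie, as it does in the sample movie graph.

```cypher
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)<-[:REVIEWED]-(r:Person)
// count reviewer nodes per actor and movie, not reviewer names
WITH p, m, count(DISTINCT r) AS reviewerCount
WHERE reviewerCount > 1
// same second stage as before: actors with at least two qualifying movies
WITH p, collect(DISTINCT m.title) AS titles
WHERE size(titles) > 1
RETURN p.name, titles
```

On the sample graph I would expect both versions to return the same rows, since the difference only shows up when reviewer names repeat.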

The query matches each person to the movies they acted in and to the people who reviewed those movies. If a match is found, at least one person reviewed the movie, which satisfies the second requirement. Next, the query collects the distinct list of movie titles. Distinct is needed because several people could have reviewed the same movie, which produces multiple rows with the same movie. Finally, the query keeps only the persons who have more than one movie in this list, which satisfies the first requirement. Hope this helps.

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)<-[:REVIEWED]-(:Person)
// one row per (actor, movie, reviewer), so DISTINCT removes repeated titles
WITH p, collect(DISTINCT m.title) AS titles
WHERE size(titles) > 1
RETURN p.name, titles
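
To see why distinct matters, here is a small self-contained sketch that does not touch the graph at all. The rows are made up (Actor A, Movie 1, Reviewer X and so on are placeholders, not data from the sample database) and stand in for what the MATCH would produce: one row per actor, movie and reviewer.

```cypher
// Hypothetical rows: [actor, movie title, reviewer]
UNWIND [
  ['Actor A', 'Movie 1', 'Reviewer X'],
  ['Actor A', 'Movie 1', 'Reviewer Y'],
  ['Actor A', 'Movie 2', 'Reviewer X'],
  ['Actor B', 'Movie 1', 'Reviewer X'],
  ['Actor B', 'Movie 1', 'Reviewer Y']
] AS row
// group by actor; collect titles with and without DISTINCT
WITH row[0] AS actor,
     collect(row[1]) AS allTitles,
     collect(DISTINCT row[1]) AS titles
RETURN actor, size(allTitles) AS rowCount, size(titles) AS movieCount
```

Actor B has only one reviewed movie, but it appears in two rows because it has two reviewers, so rowCount is 2 while movieCount is 1. Without distinct, the size(titles) > 1 filter would wrongly let Actor B through.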


The database is the default database in Neo4j.

Thanks, it works.
(migrated from khoros post Solved: Re: find the duplicate nodes - Neo4j - 60945)