Neo4j returning incorrect number of rows


(Groverjatin17) #1

Hi All,

I am working on a use case where we merge all nodes that share a common property (for example, all nodes with year = "2016" in the Movie.csv table below) into a single node, so that all relationships point to that one node rather than to 3 different nodes.

For example, I have 3 CSV files with the following kind of data:

Actor Id | Name
1        | Sam

Movie Id | Movie Name | Year | Actor Id | Director Id
45       | Avengers   | 2016 | 1        | 10
23       | Movie 2    | 2016 | 1        | 10
12       | Movie 3    | 2016 | 1        | 10

Director ID | Director Name
10          | Danny Morgan

I then merged all the Movie nodes into 1 node using apoc.refactor.mergeNodes from the APOC library.

Now, when I request the data as a table (not as a graph) with this query:

MATCH (a:Actor)-[:ACTED_IN]->(m:Movie)-[:DIRECTED_BY]->(d:Director)
RETURN a.Name, m.Movie_Name, d.Director_Name

Ideally I should get only 3 rows with the information mentioned above,
but I get more rows than I am supposed to.

Please help me identify the reason and how to fix it.

(I think the issue is that each actor-to-movie relationship is producing a Cartesian product with the 3 movie-to-director relationships.)
NOTE: I am using Neo4j Browser version 3.1.4.

Thank you

(Andrew Bowman) #2

You're correct. If the merge resulted in multiple relationships between the same pairs of nodes (so 3 :ACTED_IN relationships between the movie node and the same actor, and 3 :DIRECTED_BY relationships between the movie and the same director), every combination of the two is a separate match, so you'll end up with 3 × 3 = 9 rows.

We have a kb article on understanding cardinality in Cypher which covers this.
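To check whether this is what happened, a query along these lines should list any actor/movie pairs connected by more than one :ACTED_IN relationship. It is only a sketch that assumes the labels and property names from the question; relCount and movieNodeId are just illustrative aliases.

```cypher
// Count parallel :ACTED_IN relationships between each actor/movie pair
// and keep only the pairs that have duplicates.
MATCH (a:Actor)-[r:ACTED_IN]->(m:Movie)
WITH a, m, count(r) AS relCount
WHERE relCount > 1
RETURN a.Name AS actor, id(m) AS movieNodeId, relCount
```

If the merge kept all three relationships, relCount would be 3 for the merged movie node. The same check can be run for :DIRECTED_BY by swapping the pattern.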

The quick fix is to get DISTINCT nodes before you return:

MATCH (a:Actor)-[:ACTED_IN]->(m:Movie)-[:DIRECTED_BY]->(d:Director)
RETURN DISTINCT a.Name, m.Movie_Name, d.Director_Name

The better fix is to delete the duplicate relationships, since I don't think you need those in your graph:

MATCH (a:Actor)-[r:ACTED_IN]->(m:Movie)
WITH a, m, tail(collect(r)) AS toDelete
WHERE size(toDelete) > 0
UNWIND toDelete AS rel
DELETE rel

and then a similar one for :DIRECTED_BY.
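For reference, the :DIRECTED_BY version might look something like the following. It is a sketch that assumes the same labels and relationship direction as the query in the question, and it keeps the first relationship of each movie/director pair while deleting the rest. The rel variable is just an illustrative name.

```cypher
// Collect all :DIRECTED_BY relationships per movie/director pair,
// drop the first one from the list (tail), and delete the remainder.
MATCH (m:Movie)-[r:DIRECTED_BY]->(d:Director)
WITH m, d, tail(collect(r)) AS toDelete
UNWIND toDelete AS rel
DELETE rel
```

Pairs with only one relationship produce an empty list, so UNWIND yields no rows for them and nothing is deleted.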