Delete duplicate relations

mohsensoodkhah
Node Link

I have a graph with User nodes and FOLLOWS relationships.
There are duplicate relationships in the graph, and I want to remove the oldest duplicates.
I want to check the relationships listed in a CSV file like this:

START_ID, END_ID
1 , 2
1 , 3
1 , 4
2 , 1
2 , 5
4 , 3

This CSV file has 3,000,000 lines, and my Cypher query takes a long time. Can I write a faster query?
My query is:
LOAD CSV WITH HEADERS FROM 'file:///user_followers.csv' AS line
MATCH (a:User)-[f:FOLLOWS]->(b:User)
WHERE a.id = toInt(line["START_ID"])
	AND b.id = toInt(line["END_ID"])
WITH collect(f) AS rels
WHERE size(rels) > 1
UNWIND tail(rels) AS t
DELETE t

The graph also has an index on the node ids.

9 REPLIES

mike_r_black
Ninja

You can probably drop the toInt() calls so you're not calling functions inside the match. If you suspect your nodes use inconsistent data types, correct that first, before the duplicate cleanup. You may also want to consider using apoc.periodic.commit() to commit the work in batches.

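LOAD CSV WITH HEADERS FROM 'file:///user_followers.csv' AS line

// LOAD CSV yields every field as a string, so comparing without toInt()
// only matches if User.id is stored as a string.
MATCH (a:User)-[f:FOLLOWS]->(b:User)
WHERE a.id = line["START_ID"]
	AND b.id = line["END_ID"]

// Group by the node pair so each pair gets its own list. DISTINCT guards
// against a pair that appears on more than one CSV line.
WITH a, b, collect(DISTINCT f) AS rels
WHERE size(rels) > 1
UNWIND tail(rels) AS t
DELETE t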
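
One way to commit in batches is sketched below with apoc.periodic.iterate, a different APOC procedure from the apoc.periodic.commit() mentioned above. It assumes APOC is installed and that User.id is stored as an integer. It uses toInteger(), the newer name for toInt(). The batch size of 10000 is an arbitrary starting point, not a tuned value.

```cypher
// First statement: stream the id pairs from the CSV.
// Second statement: runs once per batch, in its own transaction.
CALL apoc.periodic.iterate(
  "LOAD CSV WITH HEADERS FROM 'file:///user_followers.csv' AS line
   RETURN toInteger(line.START_ID) AS startId, toInteger(line.END_ID) AS endId",
  "MATCH (a:User {id: startId})-[f:FOLLOWS]->(b:User {id: endId})
   // DISTINCT: the same pair may appear more than once within a batch
   WITH a, b, collect(DISTINCT f) AS rels
   WHERE size(rels) > 1
   UNWIND tail(rels) AS t
   DELETE t",
  {batchSize: 10000, parallel: false}
)
YIELD batches, total, errorMessages
RETURN batches, total, errorMessages;
```

parallel is left at false here so that two batches never try to delete relationships on the same nodes at the same time.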

Hello Mike,

Do you know whether tail() is evaluated first, or whether UNWIND is?

I just used your code to solve the same issue I had, and am trying to understand how Neo4j does the order of things.

Here is how I repurposed it:

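MATCH (t:Toy)-[rel:SOLD_BY]->(s:Supplier)
// Group by both end nodes so that only relationships between the same
// toy and the same supplier count as duplicates.
WITH t, s, COLLECT(rel) AS RELS
WHERE SIZE(RELS) > 1
UNWIND tail(RELS) AS reltail
DELETE reltail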

I also have a general question about the SIZE() function. It seems that if size() gets a nested list from collect(), it first unwinds the list and then counts what is in each row.

Am I seeing this correctly?

I can't find an explanation of this anywhere in the manual. The manual frustrates me when it comes to this kind of information: the things Cypher does implicitly.

Thanks in advance,
Jeffrey

clem
Graph Steward

Here's the documentation.

The explanation there is a bit skimpy, though, and doesn't answer your question. It should be improved.
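
On the question of evaluation order and size(), a small sketch may help. It uses literal lists instead of real relationships (values such as 'r1' and 'toyA' are made up for illustration), so it can be run without any data:

```cypher
// tail() is an ordinary list expression: it is evaluated first and returns
// the list without its first element. UNWIND then turns that list into rows.
WITH ['r1', 'r2', 'r3'] AS rels
UNWIND tail(rels) AS t
RETURN t;
// rows: 'r2', 'r3'

// size() counts only the top-level elements of a list; it does not flatten.
RETURN size([[1, 2], [3]]) AS n;
// n = 2

// collect() is an aggregation: it builds one list per distinct combination
// of the other columns in the same WITH, so size() sees one list per group.
UNWIND [['toyA', 'r1'], ['toyA', 'r2'], ['toyB', 'r3']] AS row
WITH row[0] AS toy, collect(row[1]) AS rels
RETURN toy, size(rels) AS n;
// toyA -> 2, toyB -> 1
```

So the "counts what is in each row" effect comes from the grouping done by collect(), not from size() unwinding anything.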

Hello Clem,

Thank you for the reply. I read that entry in the manual, and that is why I came here hoping someone would have better insight into my question. This is exactly my point: the manual is so scarce on information.

Is there someone from the company on the forum, or is there a support department that I can contact about the manual? The manual needs to be fixed. I have been reading it since version 3 and it hasn't gotten better.

I use this to delete duplicate relationships between two types (labels) of nodes, but I think you could just use 2 aliases of the same node (label):

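MATCH (l:Location)-[r:LOC_IN_DIV]-(d:Division)
// Group by the node pair so that only relationships between the same
// two nodes are merged with each other.
WITH l, d, collect(r) AS rels
WHERE size(rels) > 1
CALL apoc.refactor.mergeRelationships(rels) YIELD rel
RETURN count(*)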

12kunal34
Graph Fellow

Hi @mohsensoodkhah

There are duplicate relationships in the graph, and I want to remove the oldest duplicates.
I want to check the relationships listed in a CSV file like this

Could you please explain this with an example so we can give you a better suggestion?

I found this solution:
I listed the end ids from the relation file, sorted and de-duplicated them (sort, uniq) into follower-end.csv, and then ran this query:

END_ID
1
2
3
4
5

LOAD CSV WITH HEADERS FROM 'file:///tmp/follower-end.csv' AS line
WITH toInt(line["END_ID"]) AS e_id
MATCH (s:User)-[f:FOLLOWS]->(e:User)
WHERE e.id = e_id
WITH e_id, s.id AS s_id, collect(f) AS rels
WHERE size(rels) > 1
UNWIND tail(rels) AS t
DELETE t

Hi, I am trying to deduplicate relationships that have attributes.
How can I do that while keeping the relationships whose attributes are unique?
Thanks in advance.
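
One possible approach, sketched here under assumptions: add the attribute to the grouping keys, so that only relationships between the same two nodes with the same attribute value are treated as duplicates. The User/FOLLOWS names are borrowed from the original question, and the property name since is hypothetical.

```cypher
// 'since' is a hypothetical property name; substitute the attribute(s)
// that should make a relationship count as unique.
MATCH (a:User)-[f:FOLLOWS]->(b:User)
WITH a, b, f.since AS since, collect(f) AS rels
WHERE size(rels) > 1
// Keep the first relationship of each group, delete the rest.
UNWIND tail(rels) AS t
DELETE t;
```

If several attributes matter, each one can be added as a further grouping key in the WITH clause.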