Delete duplicate relations

mohsensoodkhah · June 16, 2019, 12:59pm

I have a graph with User nodes and FOLLOWS relationships.
There are duplicate relationships in the graph, and I want to remove the oldest duplicates.
I want to check the relationships against a CSV file like this:

START_ID, END_ID
1 , 2
1 , 3
1 , 4
2 , 1
2 , 5
4 , 3

This CSV file has 3,000,000 lines, and my Cypher query takes a long time. Can I write a faster query?
My query is:
LOAD CSV WITH HEADERS FROM 'file:///user_followers.csv' AS line
MATCH (a:User)-[f:FOLLOWS]->(b:User)
WHERE a.id = toInt(line["START_ID"]) AND b.id = toInt(line["END_ID"])
WITH collect(f) AS rels
WHERE size(rels) > 1
UNWIND tail(rels) AS t
DELETE t

The graph also has an index on the node ids.

mike_r_black · June 16, 2019, 7:00pm

You can probably get rid of the toInt() in the query so you're not doing function calls within your match query. Note that LOAD CSV reads every field as a string, so this only matches if the id property is stored as a string too. If you suspect you have nodes using inconsistent data types, correct that first, before your duplicate clean-up operation. You may also want to consider using apoc.periodic.commit() to commit batches of completed work.

LOAD CSV WITH HEADERS FROM 'file:///user_followers.csv' AS line

MATCH (a:User)-[f:FOLLOWS]->(b:User)
WHERE a.id = line["START_ID"]
  AND b.id = line["END_ID"]

// Group by the node pair so each list holds only that pair's relationships
WITH a, b, collect(f) AS rels
WHERE size(rels) > 1
UNWIND tail(rels) AS t
DELETE t
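
As a rough sketch of the batching idea, assuming APOC is installed and that scanning all FOLLOWS relationships (instead of driving the match from the CSV) is acceptable, something like this might work. The batch size of 10000 is an arbitrary assumption. apoc.periodic.commit re-runs the statement, committing after each run, until the statement returns 0.

```cypher
CALL apoc.periodic.commit(
  "MATCH (a:User)-[f:FOLLOWS]->(b:User)
   WITH a, b, collect(f) AS rels
   WHERE size(rels) > 1
   // Handle at most $limit duplicated node pairs per transaction
   WITH rels LIMIT $limit
   UNWIND tail(rels) AS t
   DELETE t
   // Number of relationships deleted in this run; 0 ends the loop
   RETURN count(*)",
  {limit: 10000}
)
```

Each run repeats the MATCH from scratch, so this trades some repeated scanning for smaller transactions.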

12kunal34 · June 18, 2019, 6:31am

Hi @mohsensoodkhah

There are duplicate relationships in the graph, and I want to remove the oldest duplicates.
I want to check the relationships against a CSV file like this

Could you please explain this with an example, so we can make better suggestions?

mohsensoodkhah · June 18, 2019, 7:57am

I found this solution:
I listed the end ids from the relationship file, sorted and de-duplicated them into follower-end.csv, and then ran this query:

END_ID
1
2
3
4
5

LOAD CSV WITH HEADERS FROM 'file:///tmp/follower-end.csv' AS line
WITH toInt(line["END_ID"]) AS e_id
MATCH (s:User)-[f:FOLLOWS]->(e:User)
WHERE e.id = e_id
WITH e_id, s.id AS s_id, collect(f) AS rels
WHERE size(rels) > 1
UNWIND tail(rels) AS t
DELETE t

tideon · December 6, 2020, 2:05pm

Hello Mike,

Do you know whether tail is evaluated first, or the unwind?

I just used your code to solve the same issue I had, and I am trying to understand the order in which Neo4j evaluates things.

Here is how I repurposed it:

MATCH (t:Toy)-[rel:SOLD_BY]->(s:Supplier)
// Group by both end nodes so different suppliers of one toy are not treated as duplicates
WITH t, s, COLLECT(rel) AS RELS
WHERE SIZE(RELS) > 1
UNWIND tail(RELS) AS reltail
DELETE reltail
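
On the order of evaluation, a small sketch on a literal list (not on the graph data) may help. tail(...) is an ordinary list expression, so it is evaluated to a list first, and UNWIND then turns that list into one row per element.

```cypher
// tail() drops the first element and returns the rest as a list
RETURN tail([1, 2, 3]) AS rest;   // rest = [2, 3]

// UNWIND then expands the already-computed list into rows
UNWIND tail([1, 2, 3]) AS x
RETURN x;                         // two rows: 2 and 3
```

So in the DELETE queries in this thread, the first relationship of each collected list is kept and the rest are deleted.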

tideon · December 6, 2020, 3:58pm

I have a general question about the SIZE() function. It seems that if size gets a nested list from collect, it first unwinds the list and then counts what is in each row.

Am I seeing this correctly?

I can't find an explanation anywhere in the manual. The manual frustrates me with this type of information: the stuff that it does in smart ways is not explained.

Thanks in advance,
Jeffrey

clem · December 7, 2020, 4:25pm

Here's the documentation.

The explanation there is a bit skimpy though, and it doesn't answer your question. It should be improved.
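
A sketch that may help, using literal values instead of the graph: size() itself does not unwind anything; it returns the number of top-level elements of the list it is given. The per-row behaviour comes from collect(), which is an aggregation. It produces one list per combination of the other, non-aggregated values in the same WITH, and size() is then evaluated once for each resulting row.

```cypher
// size() counts top-level elements only; each nested list counts as one element
RETURN size([[1, 2], [3]]) AS n;   // n = 2

// collect() builds one list per grouping key (here: k), so size() runs per row
UNWIND [['a', 1], ['a', 2], ['b', 3]] AS pair
WITH pair[0] AS k, collect(pair[1]) AS vals
RETURN k, vals, size(vals) AS n;   // k = 'a' gives n = 2, k = 'b' gives n = 1
```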

tideon · December 7, 2020, 4:54pm

Hello Clem,

Thank you for the reply. I read that entry in the manual, and that is why I came here, hoping someone would have better insight into my question. This is exactly my point: the manual is so scarce on information.

Is there someone from the company on the forum, or is there a support department that I can contact concerning the manual? The manual needs to be fixed. I have been reading it since version 3 and it hasn't gotten better.

charleskoehl · June 6, 2021, 7:41am

I use this to delete duplicate relationships between two types (labels) of nodes, but I think you could just use 2 aliases of the same node (label):

MATCH (l:Location)-[r:LOC_IN_DIV]-(d:Division)
// Group by the node pair; otherwise every LOC_IN_DIV relationship lands in one list
WITH l, d, collect(r) AS rels
WHERE size(rels) > 1
CALL apoc.refactor.mergeRelationships(rels) YIELD rel
RETURN count(*)

houssam.razouk92 · October 6, 2021, 2:46pm

Hi, I am trying to de-duplicate relationships that have attributes.
How can I do that while keeping the relationships that are unique by their attributes?
Thanks in advance
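
One possible approach, sketched under assumptions: add the attribute to the grouping key, so that two relationships only count as duplicates when they connect the same pair of nodes and also carry the same attribute value. The label Item, the relationship type RELATED_TO and the property source are hypothetical names used only for illustration.

```cypher
// Hypothetical names: label Item, type RELATED_TO, property `source`
MATCH (a:Item)-[r:RELATED_TO]->(b:Item)
// Duplicates must share the node pair AND the property value
WITH a, b, r.source AS key, collect(r) AS rels
WHERE size(rels) > 1
// Keep the first relationship of each group, delete the others
UNWIND tail(rels) AS dup
DELETE dup
```

Relationships between the same two nodes with different source values end up in different groups, so they are all kept.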
