How to delete duplicate relationships after applying Node Similarity Algorithm

luyilun32661 · March 10, 2021, 3:42am

After running the node similarity algorithm, I created the SIMILAR_TO relationship between pairs of nodes. Since the algorithm always produces two-way relationships between the node pairs, how do I write Cypher that keeps only one relationship and deletes the other? The image below is an example of the two-way relationships generated by the node similarity algorithm.

alicia_frame1 · March 13, 2021, 3:46pm

You could do something like

MATCH (n1)-[r1:SIMILAR_TO]->(n2)-[r2:SIMILAR_TO]->(n1)
DELETE r2

Just one note: node similarity doesn't necessarily create similarity relationships that are symmetric. If you have topN set, you may end up with a relationship in one direction but not the other. This Cypher only picks up the bidirectional relationships.
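A minimal sketch of a variant that keeps one relationship per pair. The pattern above matches every bidirectional pair twice, once starting from each node, so r2 is bound to both relationships in turn. Restricting the match to one orientation avoids that. Using id() to pick the orientation is an assumption, not something from the original answer; any stable ordering of the two nodes would do (newer Neo4j versions offer elementId() as a replacement).

```cypher
// Match each bidirectional pair only once: n1 is the node with the smaller internal id.
MATCH (n1)-[r1:SIMILAR_TO]->(n2)-[r2:SIMILAR_TO]->(n1)
WHERE id(n1) < id(n2)
// r1 (n1 -> n2) is kept, r2 (n2 -> n1) is deleted.
DELETE r2
```

Relationships that exist in one direction only are not matched by this pattern and stay as they are.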

luyilun32661 · March 15, 2021, 1:38am

Thank you, Alicia! This is going to be a great help to my project :)

bengt.ellison · April 28, 2021, 1:23pm

I have the same issue, and the problem with your query is that it matches each pair twice and deletes both relationships. I don't have a solution yet. I have tried quite a few approaches now, but I cannot find the right way forward.

These are the queries I have tested. None of them gives the wanted result: some remove all of the relationships, some remove most of them.

MATCH (n1)-[r1:SIMILAR_TO]->(n2)-[r2:SIMILAR_TO]->(n1)
DELETE r2

//remove all dual link of similarity
MATCH (s)-[r:SIMILAR_TO]-(n)
with s,n,type(r) as t, collect(r) as coll 
foreach(x in tail(coll) | delete x)

//remove all dual link of similarity
MATCH (s)-[r:SIMILAR_TO]->(n), (a)<-[r2:SIMILAR_TO]-(n)
WHERE r.score = r2.score
with s,n,type(r) as t, tail(collect(r)) as coll
foreach(x in coll | delete x)

CALL apoc.periodic.iterate(
  "MATCH (a)-[r:SIMILAR_TO]->(b)-[r2:SIMILAR_TO]->(a) RETURN r",
  "DELETE r",
  {batchMode: "SINGLE", parallel:false})

MATCH (a)-[r:SIMILAR_TO]->(b)-[r2:SIMILAR_TO]->(a)
WITH a, b, r.score AS score, COLLECT(r)[1..] AS unwanted
FOREACH(x IN unwanted | delete x)

I start with a graph that looks like this:

I then run one of the queries listed above:

//remove all dual link of similarity
MATCH (s)-[r:SIMILAR_TO]->(n), (a)<-[r2:SIMILAR_TO]-(n)
WHERE r.score = r2.score
with s,n,type(r) as t, tail(collect(r)) as coll
foreach(x in coll | delete x)

The result is not as expected. For some nodes all relationships have been removed, for some nodes everything is intact, and some nodes are as I want them to be, with only one relationship.
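To check what a cleanup query actually did, it can help to count relationships and bidirectional pairs before and after running it. A sketch, assuming id() as the tie-breaker so that each pair is counted once (the column names total and bidirectionalPairs are made up for illustration):

```cypher
// Total number of SIMILAR_TO relationships.
MATCH ()-[r:SIMILAR_TO]->()
WITH count(r) AS total
// Pairs that still have SIMILAR_TO in both directions, each counted once.
OPTIONAL MATCH (a)-[:SIMILAR_TO]->(b)-[:SIMILAR_TO]->(a)
WHERE id(a) < id(b)
RETURN total, count(a) AS bidirectionalPairs
```

If every pair started with exactly one relationship in each direction, a correct cleanup should halve total and bring bidirectionalPairs down to 0.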

bengt.ellison · April 29, 2021, 5:03am

Now I see a typo in my previous post. It says
MATCH (s)-[r:SIMILAR_TO]->(n), (a)<-[r2:SIMILAR_TO]-(n)
but it should be
MATCH (s)-[r:SIMILAR_TO]->(n), (s)<-[r2:SIMILAR_TO]-(n)

I will test it and come back with my findings.

bengt.ellison · April 29, 2021, 5:29am

With the typo fixed, that query now does nothing at all.

The query below removes 277 of the 392 relationships. All of them are symmetric, so only 196 should be removed. Why does it sometimes remove both relationships of a pair and sometimes not?

//remove all dual link of similarity
MATCH (s)-[r:SIMILAR_TO]-(n)
with s,n,type(r) as t, collect(r) as coll 
foreach(x in tail(coll) | delete x)

This is the result of the query. I needed to add 8610083 to the query to show that it had no relationships.
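One possible explanation for the uneven result, offered as a sketch rather than a confirmed diagnosis: the undirected pattern returns every relationship twice, once as (s, n) and once as (n, s), so each pair forms two groups that both contain the same two relationships. tail(coll) skips the first element of each group, and since collect() gives no ordering guarantee without ORDER BY, the two groups may skip different relationships, in which case both get deleted. Restricting the match to one orientation leaves a single group per pair. Using id() as the tie-breaker is an assumption.

```cypher
// One group per unordered pair: s is always the node with the smaller internal id.
MATCH (s)-[r:SIMILAR_TO]-(n)
WHERE id(s) < id(n)
WITH s, n, collect(r) AS rels
// Keep the first relationship of the pair, delete the rest.
FOREACH (x IN tail(rels) | DELETE x)
```

Which direction survives for a given pair is arbitrary here. Unlike the directed two-hop pattern, this version also collapses parallel relationships that point in the same direction.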

lingvisa · October 23, 2021, 12:57am

This removed most relationships, not just the double links! Not a solution.

lingvisa · October 23, 2021, 12:58am

This seems to be working:

MATCH (s)-[r:SIMILAR_TO]-(n)
with s,n,type(r) as t, collect(r) as coll 
foreach(x in tail(coll) | delete x)

lingvisa · October 23, 2021, 1:00am

Is there a way to configure the similarity algorithms so that they don't create double links and no deduplication is needed afterwards? For large graphs, these double links slow things down and increase memory usage.
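Whether the algorithm itself can be configured this way is not answered in this thread. One workaround, sketched here under assumptions, is to run the algorithm in stream mode and write only one relationship per pair yourself. The graph name 'myGraph' is hypothetical, and the score property name is borrowed from the queries earlier in the thread.

```cypher
// Stream the similarity pairs instead of writing them directly.
CALL gds.nodeSimilarity.stream('myGraph')
YIELD node1, node2, similarity
// Each symmetric pair is reported as (A, B) and (B, A); keep one orientation.
WHERE node1 < node2
WITH gds.util.asNode(node1) AS a, gds.util.asNode(node2) AS b, similarity
MERGE (a)-[r:SIMILAR_TO]->(b)
SET r.score = similarity
```

If topK or topN cuts off results so that a pair is reported in one direction only, this filter drops the pairs where node1 happens to be the larger id, so it is only a safe shortcut when the streamed result is symmetric.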

florentin_dorre · December 15, 2021, 2:37pm

For clarification, did you use the topN or topK parameter?

Because with topK the result is not symmetric anymore.
