After running node similarity algorithm, I created the SIMILAR_TO relationship between two nodes. Since the node similarity algorithm will always produce two way relationships between the node pairs, how do I write cypher to keep only one relationship and delete the other ? The image below is an example of the two-way relationships generated by the node similarity algorithm.
You could do something like
MATCH (n1)-[r1:SIMILAR_TO]->(n2)-[r2:SIMILAR_TO]->(n1)
DELETE r2
just one note - node similarity doens't necessarily create similarity relationships that are symmetric; if you have topN set, you may end up with a relationship in one direction, but not the other - but this cypher would only pick up the bidirectional relationships.
Thank you Alicia ! This is going to be great help to my project :)
I have the same issue. and the problem with your query is that it will return result twice and delete both relations. I don't have a solution yet. I have tried quite a few paths now but i can not find the right path forward.
this I have tested, and no one with the wanted result. some of them remove all, some remove most of the relations.
MATCH (n1)-[r1:SIMILAR_TO]->(n2)-[r2:SIMILAR_TO]->(n1)
DELETE r2
//remove all dual link of similarity
MATCH (s)-[r:SIMILAR_TO]-(n)
with s,n,type(r) as t, collect(r) as coll
foreach(x in tail(coll) | delete x)
//remove all dual link of similarity
MATCH (s)-[r:SIMILAR_TO]->(n), (a)<-[r2:SIMILAR_TO]-(n)
WHERE r.score = r2.score
with s,n,type(r) as t, tail(collect(r)) as coll
foreach(x in coll | delete x)
CALL apoc.periodic.iterate(
"MATCH (a)-[r:SIMILAR_TO]->(b)-[r2:SIMILAR_TO]->(a) RETURN r",
"DELETE r",
{batchMode: "SINGLE", parallel:false})
MATCH (a)-[r:SIMILAR_TO]->(b)-[r2:SIMILAR_TO]->(a)
WITH a, b, r.score AS score, COLLECT(r)[1..] AS unwanted
FOREACH(x IN unwanted | delete x)
I start with a graph that looks like this:
I then use one of the proposed algorithms
//remove all dual link of similarity
MATCH (s)-[r:SIMILAR_TO]->(n), (a)<-[r2:SIMILAR_TO]-(n)
WHERE r.score = r2.score
with s,n,type(r) as t, tail(collect(r)) as coll
foreach(x in coll | delete x)
and this is the results is not as expected. For some nodes all relationships has been removed, for some nodes all is intact and some nodes have as i want it to be, only one relation
Now i see a typo in my previous post, it says
MATCH (s)-[r:SIMILAR_TO]->(n), (a)<-[r2:SIMILAR_TO]-(n)
but should be
MATCH (s)-[r:SIMILAR_TO]->(n), (s)<-[r2:SIMILAR_TO]-(n)
will test it and come back with my findings
Now nothing happens with that query that i had a typo in before.
The query below removes 277 relationships of total 392 where all is symmetric so it should only be 196 that should be removed. Why does it some time remove both relationships and sometime not.
//remove all dual link of similarity
MATCH (s)-[r:SIMILAR_TO]-(n)
with s,n,type(r) as t, collect(r) as coll
foreach(x in tail(coll) | delete x)
result in query, needed to add the 8610083 to the query to show that it had no relations
This removed most relationships, not double links! Not a solution
This seems to be working:
MATCH (s)-[r:SIMILAR_TO]-(n)
with s,n,type(r) as t, collect(r) as coll
foreach(x in tail(coll) | delete x)
Is there a way to configure the similarity algorithms to not create double links, so that it doesn't need to be deduplicated afterwards? For large graph, these double links slow down and increase memory usage.
For clarification, did you use the topN
or topK
parameter?
Because for topK
the result is not symmetric anymore.