I have a graph with User nodes and FOLLOWS relationships.
The graph contains duplicate relationships, and I want to remove the oldest duplicates.
I want to check the relationships listed in a CSV file with START_ID and END_ID columns.
The CSV file has 3,000,000 lines and my Cypher query takes a long time. Can I write a faster query?
My query is:
LOAD CSV WITH HEADERS FROM 'file:///user_followers.csv' AS line
MATCH (a:User)-[f:FOLLOWS]->(b:User)
WHERE a.id = toInt(line["START_ID"]) AND b.id = toInt(line["END_ID"])
WITH a, b, collect(DISTINCT f) AS rels
WHERE size(rels) > 1
UNWIND tail(rels) AS t
DELETE t
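One thing worth checking: collect() does not promise any particular order, so tail(rels) keeps an arbitrary relationship, not necessarily the newest one. The sketch below assumes each FOLLOWS relationship carries a timestamp property (the name `created` is hypothetical) and sorts before collecting so the newest one is kept. Without some such property there is no reliable way to tell which duplicate is the oldest.

```cypher
// Sketch: keep the newest FOLLOWS per (a, b) pair, delete the older ones.
// ASSUMPTION: `created` is a hypothetical timestamp set on every FOLLOWS
// relationship (in descending order, nulls would sort first and be kept).
MATCH (a:User)-[f:FOLLOWS]->(b:User)
WITH a, b, f
ORDER BY f.created DESC
// collect() keeps the incoming row order in practice, so rels[0] is the newest
WITH a, b, collect(f) AS rels
WHERE size(rels) > 1
UNWIND tail(rels) AS t
DELETE t;
```

This scans every FOLLOWS relationship in a single transaction, so on a large graph it may need batching.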
You can probably drop the toInt() calls so the query is not evaluating a function inside the MATCH, but only if the id property is stored as a string; if it is stored as an integer, comparing it with the raw string from the CSV will never match. If you suspect that nodes use inconsistent data types for id, fix that before the duplicate clean-up. You may also want to consider using apoc.periodic.commit() to commit the work in batches.
LOAD CSV WITH HEADERS FROM 'file:///user_followers.csv' as line
MATCH (a:User)-[f:FOLLOWS]->(b:User)
WHERE a.id = line["START_ID"]
AND b.id = line["END_ID"]
WITH a, b, collect(DISTINCT f) AS rels
WHERE size(rels) > 1
UNWIND tail(rels) as t
DELETE t
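apoc.periodic.commit() is built for jobs that rerun one statement until it reports no more work, so for a LOAD CSV-driven clean-up the sketch below swaps in apoc.periodic.iterate() instead. It assumes APOC is installed, User.id is stored as an integer, and there is an index on :User(id) (without one, each MATCH has to scan all User nodes). Grouping by a and b keeps each pair separate even though APOC runs the second statement over a whole batch of rows, and collect(DISTINCT f) guards against the same pair appearing on several CSV lines.

```cypher
// Sketch: batched duplicate clean-up with apoc.periodic.iterate.
// ASSUMPTIONS: APOC installed, User.id stored as an integer, index on :User(id).
CALL apoc.periodic.iterate(
  // Outer statement: stream one row per CSV line
  "LOAD CSV WITH HEADERS FROM 'file:///user_followers.csv' AS line
   RETURN toInteger(line.START_ID) AS s_id, toInteger(line.END_ID) AS e_id",
  // Inner statement: runs for each batch of rows, each batch committed separately
  "MATCH (a:User {id: s_id})-[f:FOLLOWS]->(b:User {id: e_id})
   WITH a, b, collect(DISTINCT f) AS rels
   WHERE size(rels) > 1
   UNWIND tail(rels) AS t
   DELETE t",
  // parallel: false avoids lock conflicts between batches touching the same nodes
  {batchSize: 10000, parallel: false}
)
YIELD batches, total, errorMessages
RETURN batches, total, errorMessages;
```

On Neo4j 4.4 and later, CALL { ... } IN TRANSACTIONS OF n ROWS is a built-in alternative that does not need APOC.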
I found this solution:
I extracted the end ids from the relationship file and de-duplicated them with sort and uniq into follower-end.csv, which looks like this:
END_ID
1
2
3
4
5
Then I ran this query:
LOAD CSV WITH HEADERS FROM 'file:///tmp/follower-end.csv' AS line
WITH toInteger(line["END_ID"]) AS e_id
MATCH (s:User)-[f:FOLLOWS]->(e:User)
WHERE e.id = e_id
WITH e_id, s.id AS s_id, collect(f) AS rels
WHERE size(rels) > 1
UNWIND tail(rels) AS t
DELETE t
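The sort and uniq pre-processing step can probably be done inside Cypher with WITH DISTINCT, which removes repeated end ids before the MATCH runs. This is a sketch assuming the original user_followers.csv with its END_ID column and integer User.id values; it still runs as a single transaction, so on 3,000,000 lines it may need batching.

```cypher
// Sketch: de-duplicate the end ids in Cypher instead of with sort | uniq.
LOAD CSV WITH HEADERS FROM 'file:///user_followers.csv' AS line
// WITH DISTINCT drops repeated END_ID values before any MATCH is done
WITH DISTINCT toInteger(line.END_ID) AS e_id
MATCH (s:User)-[f:FOLLOWS]->(e:User {id: e_id})
// s_id is a grouping key, so duplicates are collected per (follower, followee)
WITH e_id, s.id AS s_id, collect(f) AS rels
WHERE size(rels) > 1
UNWIND tail(rels) AS t
DELETE t;
```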
I have a general question about the size() function. It seems that when size() gets a nested list from collect(), it first unwinds the list and then counts what is in each row.
Am I seeing this correctly?
I can't find an explanation of this anywhere in the manual. The manual frustrates me with this kind of information: it doesn't explain the "smart" things Cypher does behind the scenes.
Thank you for the reply. I read that entry in the manual, which is why I came here hoping someone would have better insight into my question. This is exactly my point: the manual is so scarce on information.
Is there someone from the company on the forum, or a support department I can contact about the manual? The manual needs to be fixed. I have been reading it since version 3 and it hasn't gotten better.
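For the size() question: size() returns the number of top-level elements in the list it is given and does not flatten nested lists. A likely source of the "unwinding" effect is implicit grouping: every non-aggregated expression in a WITH or RETURN becomes a grouping key, so collect(), and any size() applied to its result, is evaluated once per key rather than once for the whole result. A small illustration using literal values only:

```cypher
// size() counts only the top-level elements of a list; it does not flatten.
RETURN size([[1, 2], [3]]) AS n;   // n = 2

// Implicit grouping: k is not aggregated, so it becomes a grouping key and
// collect()/size() are evaluated once per distinct k.
UNWIND [[1, 'a'], [1, 'b'], [2, 'c']] AS row
WITH row[0] AS k, collect(row[1]) AS vals
RETURN k, vals, size(vals) AS n
ORDER BY k;
// k = 1: n = 2 (vals holds 'a' and 'b')
// k = 2: n = 1 (vals holds 'c')
```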
I use this to delete duplicate relationships between nodes with two different labels, but I think you could just as well use two variables with the same label:
MATCH (l:Location)-[r:LOC_IN_DIV]-(d:Division)
WITH l, d, collect(r) AS rels
WHERE size(rels) > 1
CALL apoc.refactor.mergeRelationships(rels) YIELD rel
RETURN count(*)
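Following the suggestion to use two variables with the same label, here is a sketch for the FOLLOWS case, assuming APOC is installed. The pattern is directed on purpose: with an undirected pattern between two User nodes, every relationship would be matched twice, once from each end. How the properties of the merged relationships are combined depends on the procedure's config map, so check the APOC documentation before running it on real data.

```cypher
// Sketch: merge duplicate FOLLOWS relationships between the same two users.
MATCH (a:User)-[r:FOLLOWS]->(b:User)
// grouping by a and b only collects relationships between the same pair
WITH a, b, collect(r) AS rels
WHERE size(rels) > 1
CALL apoc.refactor.mergeRelationships(rels) YIELD rel
// one merged relationship is yielded per duplicated pair
RETURN count(rel) AS merged_pairs;
```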