Remove duplicate nodes and replace the removed relationships with new ones that keep the same property values

lorenzoz · February 5, 2021, 2:38pm

Hello, I would be really grateful to anyone who can help me with a Cypher query issue I've been facing these days. I'm sure I'm missing something that isn't very hard, but I still have to figure out how to handle it.

The DB has a very simple structure; consider only the following two node entities and the relationship between them.

Nodes:

Organization {id: generated, name}
Vendor {id: generated, type}

Relationships:

INVOICE {id: generated, date: String, summary: String}

Note: initially, every Vendor node is guaranteed to have an incoming relationship whose start node is an Organization node, i.e. (o:Organization)-[:INVOICE]->(v:Vendor).
What I'm supposed to do: remove every duplicate Vendor node (Vendor nodes with the same type count as duplicates) and, for each removed node, replace the relationship that connected it with a new, equal relationship, including the relationship properties, which is the part I can't get right. Each new relationship must end at the Vendor node kept for that type and start at the same Organization the removed Vendor node was connected to.

MATCH (org:Organization)-[:INVOICE]->(v:Vendor)
WITH v.type as vendorType, collect(v) AS vendorNodes, collect(org) as organizations
WHERE size(vendorNodes) > 1
WITH vendorNodes, organizations
FOREACH (v2 in tail(vendorNodes) | DETACH DELETE v2)
WITH vendorNodes, vendorNodes[0] as firstNode, organizations
FOREACH (o2 in tail(organizations) | CREATE (o2)-[:INVOICE]->(firstNode))

This query correctly removes the Vendor duplicates and adds one relationship for each deleted node, so the graph keeps the same connections as before the query ran. The new relationships, however, carry none of the original property values.
I tried several ways to "remember" each relationship and then create the new one with the correct property values, but I didn't manage to do it correctly.

Hope someone can help; it should not be hard for those who are more experienced with the Cypher query language than I am.

clem · February 5, 2021, 4:57pm

I think you need to use one of the APOC refactoring functions.

Maybe this one:
https://neo4j-contrib.github.io/neo4j-apoc-procedures/3.5/graph-refactoring/merge-nodes/

lorenzoz · February 6, 2021, 10:46am

Thank you for your answer.
I tried mergeNodes, but I'm getting the following error:

Neo.DatabaseError.Transaction.TransactionCommitFailed

Could not apply the transaction to the store after written to log

I'll try to figure out why and let you know when I manage to solve it.
The query I'm trying right now is:

MATCH (v:Vendor)
WITH v.type as vendorType, collect(v) as vendorNodes
WHERE size(vendorNodes) > 1
CALL apoc.refactor.mergeNodes(vendorNodes, {properties: "combine", mergeRels: true}) YIELD node
RETURN count(*)
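
For comparison, here is a variant of the same call with different options. It is only a sketch and is not known to avoid the commit error above. It assumes that mergeRels: false leaves every INVOICE relationship as a separate relationship on the surviving node, and that properties: "discard" keeps the surviving node's own values, whereas "combine" may turn differing values (such as the generated id) into arrays.

```cypher
// Group Vendor nodes by type and merge each group onto its first node.
MATCH (v:Vendor)
WITH v.type AS vendorType, collect(v) AS vendorNodes
WHERE size(vendorNodes) > 1
// Assumption: with mergeRels set to false, relationships are moved to the
// surviving node one by one, so each INVOICE keeps its own properties.
CALL apoc.refactor.mergeNodes(vendorNodes, {properties: "discard", mergeRels: false})
YIELD node
RETURN count(node) AS mergedGroups
```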

clem · February 7, 2021, 4:52pm

Unfortunately, the documentation for this error code is useless:

I've never seen this before.

:-(
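
If APOC is not an option, the original goal can probably be reached in plain Cypher as well. The following is a minimal sketch, not a tested solution. It assumes that Vendor nodes with the same type are duplicates, that any node of a group may be the one that is kept, and that every duplicate has at least one incoming INVOICE relationship, as stated in the first post. The names keeper, duplicates, dup, oldRel and newRel are made up for illustration.

```cypher
// Group Vendor nodes by type; keep the first node of each group.
MATCH (v:Vendor)
WITH v.type AS vendorType, collect(v) AS vendorNodes
WHERE size(vendorNodes) > 1
WITH head(vendorNodes) AS keeper, tail(vendorNodes) AS duplicates
UNWIND duplicates AS dup
// For every INVOICE pointing at a duplicate, create an equal relationship
// from the same Organization to the kept Vendor and copy all properties.
MATCH (org:Organization)-[oldRel:INVOICE]->(dup)
CREATE (org)-[newRel:INVOICE]->(keeper)
SET newRel = properties(oldRel)
// Delete each duplicate once, together with its old relationships.
WITH DISTINCT dup
DETACH DELETE dup
```

Note that properties(oldRel) also copies the generated id property onto the new relationship. A duplicate with no incoming INVOICE relationship would be skipped by the second MATCH and therefore not deleted.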
