Is there a way to remove only one of the two nodes?

lingvisa · March 27, 2021, 12:00am

MATCH (n:Product)
MATCH (m:Product)
WHERE n.name = m.name AND NOT n.id = m.id
DETACH DELETE n

I have nodes with the same name that were created with different custom ids, which shouldn't happen. I want to find those nodes and delete one node from each pair. The two nodes in a pair should be identical, and I only want to keep one of them. My command above deletes both, because each node of a pair can be matched as either n or m. Is there a way to modify it so that only one is deleted?

koji · March 27, 2021, 2:47am

@lingvisa

Here is some sample data.

CREATE (:Product {id:0, name: 'A'}),
       (:Product {id:1, name: 'A'}),
       (:Product {id:2, name: 'B'}),
       (:Product {id:3, name: 'B'}),
       (:Product {id:4, name: 'C'});

This Cypher deletes every node that shares a name with another node, except the first one found for each name.
The code may not be elegant, but it works correctly.

MATCH (n:Product)
MATCH (m:Product)
  WHERE n.name = m.name
  AND NOT n.id = m.id
WITH n.name AS name, collect(n.id)[0] AS firstNodeId
MATCH (n:Product)
  WHERE n.name = name
  AND n.id <> firstNodeId
DETACH DELETE n;

tard_gabriel · March 27, 2021, 2:53am

NOT TESTED

First, we retrieve all the distinct names in the database:

MATCH (n)
WITH DISTINCT n.name AS name

Second, we match each group of nodes whose name has duplicates, and we delete the extra nodes:

MATCH (n {name:name}) WHERE count(n) > 1
WITH n SKIP 1
DETACH DELETE n

These two statements must be part of the same query when you paste them into Neo4j Desktop. An APOC function certainly exists for this purpose; those are generally much shorter and more efficient, but less human-friendly to write and read.

lingvisa · March 27, 2021, 5:48pm

@tard_gabriel I tried it, but it doesn't seem to work:

MATCH (n)
WITH DISTINCT n.name AS Name
MATCH (n {name:Name}) WHERE count(n) > 1
WITH n SKIP 1
DETACH DELETE n

Invalid use of aggregating function count(...) in this context (line 3, column 29 (offset: 67))
"MATCH (n {name:Name}) WHERE count(n) > 1"
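
The error arises because aggregating functions such as count() are only allowed in WITH or RETURN, not in a WHERE attached directly to a MATCH. Note also that SKIP 1 would skip one row of the whole result, not one row per name. A minimal sketch of one way around both issues, assuming the Product label from the original question, aggregates first and filters afterwards:

```cypher
// Aggregate first: one row per name, holding all nodes that share it.
MATCH (n:Product)
WITH n.name AS name, collect(n) AS nodes
// Filtering on the aggregate is allowed here, after WITH.
WHERE size(nodes) > 1
// tail(nodes) is every node except the first, so one node per name survives.
UNWIND tail(nodes) AS duplicate
DETACH DELETE duplicate
```

Which node of a group ends up first in the list is not guaranteed, which is fine as long as the duplicates really are interchangeable.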

andy_hegedus · March 27, 2021, 6:34pm

Just to chime in.

Could the original Cypher query be tweaked a bit to compare the order of the ids? That would prevent both combinations of a pair from matching:

MATCH (n:Product)
MATCH (m:Product)
WHERE n.name = m.name AND n.id > m.id
DETACH DELETE n

Andy
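
Before running a delete like this, it can help to preview what would be removed. The following is only a sketch of such a dry run: it reuses the same MATCH and WHERE but returns the candidates instead of deleting them. The alias idToDelete is made up for illustration.

```cypher
// Same pattern as the delete, but read-only: list the nodes that would go.
MATCH (n:Product)
MATCH (m:Product)
WHERE n.name = m.name AND n.id > m.id
RETURN DISTINCT n.name AS name, n.id AS idToDelete
ORDER BY name, idToDelete
```

With the sample data posted earlier in the thread, this returns two rows, ('A', 1) and ('B', 3), so the nodes with the lowest id for each name are the ones kept.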

tard_gabriel · March 27, 2021, 7:35pm

TESTED

This is the shortest, sweetest, and prettiest solution I could come up with.
I think the DISTINCT operator is optional in this case, but I'm not sure.

// Restrict the match to Product nodes so unrelated nodes are never touched.
MATCH (n:Product)
WITH DISTINCT n.name AS name, collect(n) AS nodes
FOREACH (n IN tail(nodes) | DETACH DELETE n)

Keep in mind that the best solution is always to avoid creating duplicates in the first place, by defining uniqueness constraints before importing or creating any data.
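
As a rough sketch of what that could look like for this thread's data, assuming a reasonably recent Neo4j version (4.4 or later; older releases use the ON ... ASSERT form instead of FOR ... REQUIRE). The constraint name product_name_unique is made up for illustration.

```cypher
// Hypothetical constraint name; rejects any second Product with the same name.
CREATE CONSTRAINT product_name_unique IF NOT EXISTS
FOR (p:Product) REQUIRE p.name IS UNIQUE
```

The constraint can only be created once the existing duplicates are gone. After that, writing nodes with MERGE instead of CREATE reuses an existing node rather than failing on the constraint:

```cypher
// Finds the Product named 'A' if it exists, otherwise creates it with id 0.
MERGE (p:Product {name: 'A'})
ON CREATE SET p.id = 0
```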

By the way, a message for the APOC developers: it would be great to have a function that removes duplicates based on a single node or relationship property value, and not only on the whole thing.

* tail returns every element of a list except the first one
* collect creates a list from all the matching nodes, in this case
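
To make the two footnotes concrete, here is a small sketch. The second query assumes the sample Product data posted earlier in the thread.

```cypher
// tail() drops the first element of a list: rest = [1, 2]
RETURN tail([0, 1, 2]) AS rest
```

```cypher
// collect() gathers one list per grouping key, here one list of ids per name.
MATCH (p:Product)
RETURN p.name AS name, collect(p.id) AS ids
```

For the sample data, the second query groups ids 0 and 1 under 'A', 2 and 3 under 'B', and 4 under 'C'. The order of the ids inside each list is not guaranteed.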

If you found this solution helpful, please check the solution box. That would help me provide more solutions in the future.
