New merged nodes from existing nodes

bologna3 · February 24, 2025, 6:24pm

Hi all,

Neo4j beginner here. This seems like it should be a simple task, but I really cannot figure it out. I want to create a new set of nodes based off an existing set, merging them based on a unique value for a certain variable.

I have 3.7 million nodes of type "person" successfully loaded into my database. They have several variables attached. I want to create a new set of nodes called "name," where there is one node per unique value of the variable "person_name_cluster_key" in the "person" nodes. (Based on working with the data in R, I know that this should result in 2.3 million "name" nodes.) I also want to bring over another variable called "person_name" for each of the new "name" nodes. Of the multiple "person" nodes merged, I don't care which "person" node this value is taken from. Then, I want to relate each new "name" node to the original "person" nodes with a (n:name)-[:name_of]->(p:person) relationship.

I need the process to be iterative and computationally efficient since the dataset is so large. I feel like this should be really simple, but I'm stumped.

Thanks so much.

glilienfield · February 25, 2025, 2:11am

You can try something like this. Let's see how it works.

Test data:

unwind [["santa","a"], ["pluto","b"], ["goofy","c"], ["rudolph","d"], ["micky","e"], ["rudolph","f"], ["pluto","g"], ["minnie","h"], ["micky","i"]] as person
create (:Person{person_name_cluster_key: person[0], person_name: person[1]})
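Before running anything that writes, it may help to confirm how many "Name" nodes to expect. The following is a minimal sketch that counts the distinct cluster keys; the alias expected_name_nodes is just an illustrative name. On the test data above (assuming nothing else is in the database) it should return 6, and on the real data it should match the 2.3 million figure from R.

```cypher
// Count the distinct cluster keys across all Person nodes.
// This is the number of Name nodes the merge should produce.
MATCH (p:Person)
RETURN count(DISTINCT p.person_name_cluster_key) AS expected_name_nodes
```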

Query:

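:auto
// Group outside the subquery: IN TRANSACTIONS batches the rows that flow INTO
// the CALL, so each key must arrive as its own input row to be batched.
MATCH (p:Person)
WITH p.person_name_cluster_key AS key, collect(p) AS persons_for_key
CALL (key, persons_for_key) {
    // One Name node per key; person_name comes from an arbitrary Person in the group
    CREATE (n:Name {person_name_cluster_key: key, person_name: head(persons_for_key).person_name})
    FOREACH (i IN persons_for_key |
        MERGE (n)-[:name_of]->(i)
    )
} IN CONCURRENT TRANSACTIONS OF 100000 ROWS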
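Once the query finishes, a quick sanity check along these lines can confirm the result. This is only a sketch; the aliases name_nodes and persons_without_name are made up for illustration. The first count should equal the number of distinct keys, and the second should be 0 if every Person was linked.

```cypher
// How many Name nodes were created?
MATCH (n:Name)
RETURN count(n) AS name_nodes;

// How many Person nodes have no incoming name_of relationship from a Name?
MATCH (p:Person)
WHERE NOT EXISTS { (:Name)-[:name_of]->(p) }
RETURN count(p) AS persons_without_name;
```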

Note 1: You need the ":auto" when executing the query in the browser or cypher-shell, but not otherwise.

Note 2: You definitely need indexes on the labels and properties you are matching and merging on. This is not an issue with this query, as you are not matching on a specific property.
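For later queries that do look up or merge "Name" nodes by key, something like the following could provide that index. It is a sketch in Neo4j 5 syntax; the constraint name name_cluster_key_unique is a made-up example, and it assumes each key should map to exactly one Name node. A uniqueness constraint is backed by an index, so it also covers lookups.

```cypher
// Assumption: exactly one Name node per person_name_cluster_key.
// "name_cluster_key_unique" is a hypothetical constraint name.
CREATE CONSTRAINT name_cluster_key_unique IF NOT EXISTS
FOR (n:Name) REQUIRE n.person_name_cluster_key IS UNIQUE
```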

Note 3: I have not used the CONCURRENT TRANSACTIONS clause. It is relatively new.

Note 4: This may require a lot of memory due to the size of your database and the use of a COLLECT on your entire data set.
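If memory does become a problem, one possible alternative is to stream the Person nodes and MERGE the Name node per row instead of collecting first. This is an untested sketch, not a drop-in replacement. It assumes a uniqueness constraint or index already exists on :Name(person_name_cluster_key) so the MERGE stays fast, that Person nodes with a null key can be skipped, and it drops CONCURRENT so that two batches never try to MERGE the same key at the same time. The batch size of 10000 is an arbitrary choice.

```cypher
:auto
// Stream Person nodes one row at a time; no global collect is needed.
MATCH (p:Person)
WHERE p.person_name_cluster_key IS NOT NULL
CALL (p) {
    // Create the Name node the first time a key is seen, otherwise reuse it
    MERGE (n:Name {person_name_cluster_key: p.person_name_cluster_key})
      ON CREATE SET n.person_name = p.person_name
    // Link the Name node to this Person
    MERGE (n)-[:name_of]->(p)
} IN TRANSACTIONS OF 10000 ROWS
```

The trade-off is more index lookups (one MERGE per Person) in exchange for a much smaller memory footprint per transaction.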

Let me know how it goes.
