Python loading script creates duplicate nodes when creating relationships using 'MERGE'

pborah88 · February 25, 2020, 4:55pm

I'm using the following script to create nodes and relationships via the neo4j python driver-

import pandas as pd
from neo4j import GraphDatabase

pd.set_option('display.max_colwidth', -1)

# neo4j credentials
uri= "bolt://localhost:7687"
userName= "neo4j"
password= "abcd1234"


#Read file
df = pd.read_csv('C://Users/Pari/Documents/NextPathway/Expedia/lineage_stored_procedure_dedup.csv', 
                 sep=',', index_col=None, header=0,usecols=[0,1,2,3,4,5])

#Connect to the neo4j database server
graphDB_Driver  = GraphDatabase.driver(uri, auth=(userName, password))

#CREATE NODES (:Program{Parent_Procedure}) set property 'type' = Parent_Object_Type 

with graphDB_Driver.session() as graphDB_Session:
    for index, row in df.iterrows():
        cq = 'merge (p:Program{programName:"'+row["Parent_Procedure"]+'"}) set p.type = "'+row["Parent_Object_Type"]+'"'
#Execute the Cypher query
        res = graphDB_Session.run(cq)
        print(res)
#CREATE NODES (:Program{Called_Procedure}) set property 'type' = Called_Object_Type 
    for index, row in df.iterrows():
        cq = 'merge (p:Program{programName:"'+row["Called_Procedure"]+'"}) set p.type = "'+row["Called_Object_Type"]+'"'
#Execute the Cypher query
        res = graphDB_Session.run(cq)
        print(res)
        
#Create relationship - (Parent_Procedure)-[:CALLS_TO]->(Called_Procedure)
    for index, row in df.iterrows():
        cq = 'merge (:Program{programName:"'+row["Parent_Procedure"]+'"})-[:CALLS_TO]->(:Program{programName:"'+row["Called_Procedure"]+'"})'
#Execute the Cypher query
        res = graphDB_Session.run(cq)
        print(res)        
        
graphDB_Driver.close()

The entire script runs without throwing any errors.

The first two node creation blocks work perfectly. There 391 unique Program nodes in the csv and that's how many get created in the graph.

Problem arises when creating the relationships. Not sure how exactly the query gets passed on to neo4j but somehow it blows up from 391 Program nodes to 4000+.

Insights and help appreciated.

pborah88 · February 25, 2020, 5:54pm

Well that was silly, I obviously have to match first before creating a relationship.

#Create relationship - (Parent_Procedure)-[:CALLS_TO]->(Called_Procedure)
    for index, row in df.iterrows():
        cq3 = 'match (p1:Program{programName:"'+row["Parent_Procedure"]+'"}) match (p2:Program{programName:"'+row["Called_Procedure"]+'"}) merge (p1)-[:CALLS_TO]->(p2)'
#Execute the Cypher query
        res3 = graphDB_Session.run(cq3)
        print(res3)

Topic		Replies	Views
Multiple relationships between two nodes all become duplicates Neo4j Graph Platform relationship	3	3493	March 28, 2019
Creating 2 unique nodes per row from a CSV Cypher cypher	1	290	January 24, 2022
Create relationship between two existing nodes without introducing new nodes Cypher performance , cypher , import	3	8948	December 15, 2019
Duplicated (gray) nodes when creating new relationship Newbie Questions	2	680	June 22, 2020
Operation to generate a unique node + relationship from existing node information extremely slow Cypher performance	0	189	March 2, 2023

August 🏄 🏖️ 🏊

Python loading script creates duplicate nodes when creating relationships using 'MERGE'

Related topics