cancel
Showing results for 
Search instead for 
Did you mean: 

Python loading script creates duplicate nodes when creating relationships using 'MERGE'

pborah88
Node Clone

I'm using the following script to create nodes and relationships via the neo4j python driver-

import pandas as pd
from neo4j import GraphDatabase

pd.set_option('display.max_colwidth', -1)

# neo4j credentials
uri= "bolt://localhost:7687"
userName= "neo4j"
password= "abcd1234"


#Read file
df = pd.read_csv('C://Users/Pari/Documents/NextPathway/Expedia/lineage_stored_procedure_dedup.csv', 
                 sep=',', index_col=None, header=0,usecols=[0,1,2,3,4,5])

#Connect to the neo4j database server
graphDB_Driver  = GraphDatabase.driver(uri, auth=(userName, password))

#CREATE NODES (:Program{Parent_Procedure}) set property 'type' = Parent_Object_Type 

with graphDB_Driver.session() as graphDB_Session:
    for index, row in df.iterrows():
        cq = 'merge (p:Program{programName:"'+row["Parent_Procedure"]+'"}) set p.type = "'+row["Parent_Object_Type"]+'"'
#Execute the Cypher query
        res = graphDB_Session.run(cq)
        print(res)
#CREATE NODES (:Program{Called_Procedure}) set property 'type' = Called_Object_Type 
    for index, row in df.iterrows():
        cq = 'merge (p:Program{programName:"'+row["Called_Procedure"]+'"}) set p.type = "'+row["Called_Object_Type"]+'"'
#Execute the Cypher query
        res = graphDB_Session.run(cq)
        print(res)
        
#Create relationship - (Parent_Procedure)-[:CALLS_TO]->(Called_Procedure)
    for index, row in df.iterrows():
        cq = 'merge (:Program{programName:"'+row["Parent_Procedure"]+'"})-[:CALLS_TO]->(:Program{programName:"'+row["Called_Procedure"]+'"})'
#Execute the Cypher query
        res = graphDB_Session.run(cq)
        print(res)        
        
graphDB_Driver.close()

The entire script runs without throwing any errors.

The first two node creation blocks work perfectly. There 391 unique Program nodes in the csv and that's how many get created in the graph.

Problem arises when creating the relationships. Not sure how exactly the query gets passed on to neo4j but somehow it blows up from 391 Program nodes to 4000+.

Insights and help appreciated.

1 ACCEPTED SOLUTION

pborah88
Node Clone

Well that was silly, I obviously have to match first before creating a relationship.

#Create relationship - (Parent_Procedure)-[:CALLS_TO]->(Called_Procedure)
    for index, row in df.iterrows():
        cq3 = 'match (p1:Program{programName:"'+row["Parent_Procedure"]+'"}) match (p2:Program{programName:"'+row["Called_Procedure"]+'"}) merge (p1)-[:CALLS_TO]->(p2)'
#Execute the Cypher query
        res3 = graphDB_Session.run(cq3)
        print(res3)

View solution in original post

1 REPLY 1

pborah88
Node Clone

Well that was silly, I obviously have to match first before creating a relationship.

#Create relationship - (Parent_Procedure)-[:CALLS_TO]->(Called_Procedure)
    for index, row in df.iterrows():
        cq3 = 'match (p1:Program{programName:"'+row["Parent_Procedure"]+'"}) match (p2:Program{programName:"'+row["Called_Procedure"]+'"}) merge (p1)-[:CALLS_TO]->(p2)'
#Execute the Cypher query
        res3 = graphDB_Session.run(cq3)
        print(res3)