I'm using the following script to create nodes and relationships via the neo4j python driver-
import pandas as pd
from neo4j import GraphDatabase
pd.set_option('display.max_colwidth', -1)
# neo4j credentials
uri= "bolt://localhost:7687"
userName= "neo4j"
password= "abcd1234"
#Read file
df = pd.read_csv('C://Users/Pari/Documents/NextPathway/Expedia/lineage_stored_procedure_dedup.csv',
sep=',', index_col=None, header=0,usecols=[0,1,2,3,4,5])
#Connect to the neo4j database server
graphDB_Driver = GraphDatabase.driver(uri, auth=(userName, password))
#CREATE NODES (:Program{Parent_Procedure}) set property 'type' = Parent_Object_Type
with graphDB_Driver.session() as graphDB_Session:
for index, row in df.iterrows():
cq = 'merge (p:Program{programName:"'+row["Parent_Procedure"]+'"}) set p.type = "'+row["Parent_Object_Type"]+'"'
#Execute the Cypher query
res = graphDB_Session.run(cq)
print(res)
#CREATE NODES (:Program{Called_Procedure}) set property 'type' = Called_Object_Type
for index, row in df.iterrows():
cq = 'merge (p:Program{programName:"'+row["Called_Procedure"]+'"}) set p.type = "'+row["Called_Object_Type"]+'"'
#Execute the Cypher query
res = graphDB_Session.run(cq)
print(res)
#Create relationship - (Parent_Procedure)-[:CALLS_TO]->(Called_Procedure)
for index, row in df.iterrows():
cq = 'merge (:Program{programName:"'+row["Parent_Procedure"]+'"})-[:CALLS_TO]->(:Program{programName:"'+row["Called_Procedure"]+'"})'
#Execute the Cypher query
res = graphDB_Session.run(cq)
print(res)
graphDB_Driver.close()
The entire script runs without throwing any errors.
The first two node creation blocks work perfectly. There 391 unique Program
nodes in the csv and that's how many get created in the graph.
Problem arises when creating the relationships. Not sure how exactly the query gets passed on to neo4j but somehow it blows up from 391 Program
nodes to 4000+.
Insights and help appreciated.