Hello,
I'm trying to create or update nodes from a batch using the neo4j Python driver.
The batch contains approximately 100k items.
Here is one item from the batch:
batch[0] = {'contactid': '1', 'gender': 'Mr', 'firstname': 'Marc', 'lastname': 'Brown', 'customerid': 'abc123', 'password': 'cbz', 'salutation': 'Marc', 'organizationid': '20', 'companyid': '100003.0', 'eipuserid': nan, 'email_address': 'xyz@hotmail.com', 'email2': nan, 'url': nan, 'academictitle': nan, 'jobtitle': 'Director'}
import time

from neo4j import GraphDatabase

# uri, username and password are defined elsewhere
driver = GraphDatabase.driver(uri, auth=(username, password))

def create_update_nodes(insert_batch, label, label_id):
    # Label and key property are interpolated with str.format;
    # the batch itself is passed as a query parameter.
    query = """
    CALL apoc.periodic.iterate(
        'UNWIND $batch AS row RETURN row',
        'MERGE (n:{label} {{{label_id}: row.{label_id}}})
         ON MATCH SET n += row
         ON CREATE SET n += row',
        {{batchSize: 5000, iterateList: true, params: {{batch: $batch}}}})
    """.format(label=label, label_id=label_id)
    start_time = time.time()
    with driver.session() as session:
        result = session.run(query, batch=insert_batch)
        result.consume()  # wait for the procedure to finish before timing
    print(f"{label} node insertion took - {time.time() - start_time}")
    return None
I have also created constraints.
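The constraint statements themselves are not shown above. As a rough sketch only, a uniqueness constraint on the merge key might be built like this; the helper name, the generated constraint name and the example label `Contact` are illustrative assumptions, and the syntax assumes Neo4j 4.4 or later (older versions use `CREATE CONSTRAINT ON ... ASSERT ...`).

```python
def unique_constraint_query(label, label_id):
    """Build a uniqueness-constraint statement (Neo4j 4.4+ syntax).

    Hypothetical helper: labels and property keys cannot be passed as
    Cypher parameters, so they are interpolated here. Only use trusted
    values for `label` and `label_id`.
    """
    name = f"{label.lower()}_{label_id.lower()}_unique"
    return (
        f"CREATE CONSTRAINT {name} IF NOT EXISTS "
        f"FOR (n:`{label}`) REQUIRE n.`{label_id}` IS UNIQUE"
    )

# Possible usage with an open session (not executed here):
#     session.run(unique_constraint_query("Contact", "contactid"))
```

A uniqueness constraint is backed by an index on the key property, which is what lets `MERGE` look up existing nodes without scanning every node with that label.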
The first 20-25k insertions are quite fast, but after that insertion becomes very slow. I have tried batch sizes ranging from 500 to 25000; it seems fastest in the 1000 to 5000 range.
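For comparing batch sizes, one alternative to passing the whole 100k list into `apoc.periodic.iterate` is to split the list on the client and send one plain `UNWIND ... MERGE` statement per chunk. The following is a minimal sketch under that assumption, not a confirmed fix; `chunked`, `build_merge_query` and `merge_in_chunks` are hypothetical helper names, and `session` is expected to be an open neo4j driver session.

```python
from itertools import islice

def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def build_merge_query(label, label_id):
    # Labels and property keys cannot be Cypher parameters, so they are
    # interpolated; only use trusted values here.
    # ON MATCH SET / ON CREATE SET with the same map reduce to a plain SET.
    return (
        f"UNWIND $rows AS row "
        f"MERGE (n:`{label}` {{`{label_id}`: row.`{label_id}`}}) "
        f"SET n += row"
    )

def merge_in_chunks(session, rows, label, label_id, chunk_size=5000):
    """Run one auto-commit statement per chunk; return the number of chunks sent."""
    query = build_merge_query(label, label_id)
    sent = 0
    for chunk in chunked(rows, chunk_size):
        # consume() waits until the statement has fully executed
        session.run(query, rows=chunk).consume()
        sent += 1
    return sent
```

With this shape, each chunk is its own transaction and only `chunk_size` rows travel with each request, so timing each call shows whether later chunks are really slower than earlier ones.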
What am I doing wrong in the function?
Thanks.