I am writing a tool in Python, using the neo4j (1.7.6) module, that creates Cypher CREATE and MERGE statements for data based on a set of rules. Hundreds of thousands of statements could be generated, and I am wondering what the most efficient way is to push them to the Neo4j database.
Hi Jared, welcome to the community!
In my experience, Neo4j copes well with whatever amount of data you throw at it.
I would script it the way it feels the most natural to you and only deviate if needed.
If you hit any bumps along the way, feel free to share them here...
Do the rules strictly need to be parsed in Python? If the data is in CSV or JSON, you can use this utility to load it in batches.
Also, if you are planning to generate CREATE/MERGE statements manually, keep these things in mind:
- Use an open transaction: execute a batch of statements, then commit. With fewer than 10,000 statements you might be able to do it all in a single transaction. Otherwise, adjust the batch size based on the available heap and page cache.
- Make sure indexes exist for the keys used in your MERGE statements. Without indexes, MERGE performance will degrade as you keep adding data.
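To make the two points above a bit more concrete, here is a minimal sketch rather than a definitive implementation. `chunked` and `index_statement` are hypothetical helper names, the default of 10,000 is just the rule of thumb mentioned above, and the index syntax assumes a Neo4j 3.x server (newer versions use a different CREATE INDEX form).

```python
def chunked(statements, batch_size=10000):
    """Yield lists of at most batch_size statements (hypothetical helper)."""
    batch = []
    for statement in statements:
        batch.append(statement)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Emit the final, possibly smaller, batch
    if batch:
        yield batch


def index_statement(label, prop):
    """Build a Neo4j 3.x-style index statement (assumed server version)."""
    return "CREATE INDEX ON :{}({})".format(label, prop)
```

Each list yielded by `chunked` would go into its own transaction, and the index statements would be run once, before the MERGE statements that rely on them.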
Thanks all for the help!
Right now the rules need to be applied through Python and then converted to CREATE/MERGE statements. An open transaction sounds like the direction I need to go, and if the total exceeds 10,000 I can just break it into chunks. Are there any examples of Python code using an open transaction to push multiple Cypher statements?
Looking at the driver manual I find this:
from neo4j import unit_of_work

def add_person(driver, name):
    with driver.session() as session:
        # Caller for transactional unit of work
        return session.write_transaction(create_person_node, name)

# Simple implementation of the unit of work
def create_person_node(tx, name):
    return tx.run("CREATE (a:Person {name: $name}) RETURN id(a)", name=name).single().value()

# Alternative implementation, with timeout
@unit_of_work(timeout=0.5)
def create_person_node_within_half_a_second(tx, name):
    return tx.run("CREATE (a:Person {name: $name}) RETURN id(a)", name=name).single().value()
In this example they just push a single statement as a string. If I need to push multiple statements, where would I perform the iteration? Or do I create a massive multi-statement string with some sort of statement delimiter?
Here's the documentation that talks about starting a transaction:
https://neo4j.com/docs/api/python-driver/current/api.html#sessions-transactions
with session.begin_transaction() as tx:
    for statement in statements:
        tx.run(statement)
    tx.commit()
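If the number of statements exceeds what fits comfortably in one transaction, the same pattern can be repeated per chunk. The following is only a sketch under assumptions: `run_in_batches` is a hypothetical name, `session` is expected to be a driver session exposing `begin_transaction()` as in the snippet above, and the default batch size of 10,000 is the rule of thumb from earlier in the thread, not a tuned value.

```python
def run_in_batches(session, statements, batch_size=10000):
    """Run Cypher statements in chunks, one open transaction per chunk.

    Hypothetical helper: returns the number of statements committed.
    """
    committed = 0
    batch = []
    for statement in statements:
        batch.append(statement)
        if len(batch) >= batch_size:
            committed += _commit_batch(session, batch)
            batch = []
    # Commit whatever is left over after the last full batch
    if batch:
        committed += _commit_batch(session, batch)
    return committed


def _commit_batch(session, batch):
    """Run one batch inside a single transaction and commit it."""
    with session.begin_transaction() as tx:
        for statement in batch:
            tx.run(statement)
        tx.commit()
    return len(batch)
```

It would be called as `run_in_batches(session, statements)` inside a `with driver.session() as session:` block. If a statement fails, only the current batch is affected; batches committed earlier stay in the database, so the caller needs to decide how to resume.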
Fantastic, I think that will work for now. Thank you!