I'd like to ask a follow-up question to the one posed on Stack Overflow here.
@lyonwj notes in his answer that "We can batch multiple queries into one transaction for better performance... Typically we can batch ~20k database operations in a single transaction."
For convenience, I have pasted the example code below:
    tx = graph.begin()
    for index, row in df.iterrows():
        tx.evaluate('''
            MATCH (a:Label1 {property: $label1})
            MERGE (a)-[r:R_TYPE]->(b:Label2 {property: $label2})
        ''', parameters={'label1': row['label1'], 'label2': row['label2']})
    tx.commit()
Well, what if the Pandas dataframe had far more than 20,000 rows? Say, 10 million. I know that if we were using LOAD CSV directly from the cypher-shell, we would include USING PERIODIC COMMIT 20000 before the LOAD CSV clause to make it commit every 20,000 rows of the CSV.
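For reference, the cypher-shell version would look roughly like this (the file name and header names are placeholders I've made up to mirror the dataframe columns):

    USING PERIODIC COMMIT 20000
    LOAD CSV WITH HEADERS FROM 'file:///data.csv' AS line
    MATCH (a:Label1 {property: line.label1})
    MERGE (a)-[r:R_TYPE]->(b:Label2 {property: line.label2})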
What would be the equivalent of USING PERIODIC COMMIT 20000 when importing from a large dataframe with py2neo?
The py2neo docs mention an optional autocommit argument that makes each statement commit in its own transaction (almost the opposite of what I want), but I don't see anything about specifying PERIODIC COMMIT.
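For context, the autocommit behaviour I mean would look something like this (just a sketch; the URI and credentials are placeholders, and I'm assuming a py2neo version where Graph takes an auth tuple):

    from py2neo import Graph

    graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))

    # Autocommit: each run() executes in its own transaction, so this
    # commits once per statement -- the per-row overhead I'm trying to avoid.
    graph.run("MERGE (b:Label2 {property: $label2})", label2="value")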
The closest thing to a workaround I can think of is to do a modulo check on the row index inside the iterrows loop, committing (and starting a fresh transaction) every 20,000 rows, and keeping a final commit() outside the loop for the remainder. So the modified code would look something like this:
    tx = graph.begin()
    for index, row in df.iterrows():
        tx.evaluate('''
            MATCH (a:Label1 {property: $label1})
            MERGE (a)-[r:R_TYPE]->(b:Label2 {property: $label2})
        ''', parameters={'label1': row['label1'], 'label2': row['label2']})
        if (index + 1) % 20000 == 0:
            tx.commit()           # flush this batch of 20,000
            tx = graph.begin()    # start a fresh transaction
    tx.commit()                   # commit the final partial batch
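(I'm assuming here that a committed py2neo transaction can't be reused, hence the fresh graph.begin() after each commit, and that the dataframe has a default integer index so that index counts rows; with an arbitrary index, enumerate(df.iterrows()) would be safer.)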
Would this be a viable workaround? Is there any other way?