
apoc.periodic.commit within python/pandas help

FourMoBro
Node Clone

I am looking for the correct syntax to load data into Neo4j, in particular using periodic commits when loading from a Python/pandas DataFrame. My general workflow is as follows:

  1. Within a Jupyter notebook, I load the 1M+ line tab-delimited text file into a dataframe.
  2. Clean the data
  3. Create a smaller dataframe to be used as input parameter for a function
  4. Run function

In general my functions look like this:

def add_data(df1):
    query = """
    UNWIND $rows as row
    MERGE
    SET
    RETURN COUNT(*) as total
    """
    return conn.query(query, parameters={'rows': df1.to_dict('records')})

columns = []
df1 = pd.DataFrame(df[columns])
df1 = df1.explode(columns).drop_duplicates()
add_data(df1)

This works great for creating nodes and relationships when the total count is under 1000, but when there are 1M+ nodes/relationships, it tends to not finish.

I know there are server parameters in neo4j.conf that can be adjusted and may help with the load. I know I can save the dataframe to CSV and load it from disk with USING PERIODIC COMMIT. I know I can split my dataframe and process the pieces in a for loop from within Python. But I don't want to go those routes. I want to get apoc.periodic.commit to work within the add_data function.
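A minimal sketch of what batching inside the query could look like, assuming the APOC plugin is installed and that conn.query(query, parameters=...) behaves as in the function above. It uses apoc.periodic.iterate rather than apoc.periodic.commit, since the latter reruns a statement until it returns 0 and so fits LIMIT-style loops better than an UNWIND over a parameter list; that swap is a suggestion, not something taken from the post. The label Item and the properties id and name are hypothetical placeholders for the elided MERGE and SET clauses.

```python
# Hypothetical names: Item, id, and name stand in for the MERGE/SET
# clauses that are elided in the original post.
BATCHED_QUERY = """
CALL apoc.periodic.iterate(
  'UNWIND $rows AS row RETURN row',
  'MERGE (n:Item {id: row.id}) SET n.name = row.name',
  {batchSize: $batch_size, parallel: false, params: {rows: $rows}}
)
YIELD batches, total, errorMessages
RETURN batches, total, errorMessages
"""


def add_data_batched(conn, records, batch_size=10000):
    """Send all records once and let APOC commit them in batches.

    `conn` is assumed to expose query(query, parameters=...) like the
    connection object used in the post; `records` is a list of dicts,
    e.g. the result of df1.to_dict('records').
    """
    return conn.query(
        BATCHED_QUERY,
        parameters={"rows": records, "batch_size": batch_size},
    )
```

With a dataframe this would be called as add_data_batched(conn, df1.to_dict('records')). Note that the full list of rows is still sent to the server in a single request; only the writes are committed in batches.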

I have tried several iterations in an attempt to get it to work, but to no avail. I am hoping the community can help.

Thanks in advance.

 

1 REPLY

bennu_neo
Neo4j

Hi @FourMoBro,

Quick question. What does your MERGE statement look like? Do you have an index on the properties used?
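To illustrate what such an index might look like, here is a small sketch that builds a CREATE INDEX statement for a given label and property. It assumes Neo4j 4.x or later syntax; the helper name index_statement and the example label Item and property id are hypothetical, since the actual MERGE clause is not shown in the thread.

```python
def index_statement(label, prop):
    """Build a Cypher statement that creates an index on label.prop.

    Backticks quote the label and property so unusual names stay valid.
    The generated index name (label_prop) is only a naming assumption.
    """
    name = f"{label.lower()}_{prop}"
    return (
        f"CREATE INDEX {name} IF NOT EXISTS "
        f"FOR (n:`{label}`) ON (n.`{prop}`)"
    )


# Example: the statement could then be run once before the bulk load,
# for instance through the same connection object used for the MERGE.
statement = index_statement("Item", "id")
```

Without an index on the property used in MERGE, each merged row has to scan the nodes with that label, which gets slow as the node count grows.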

Regards

Oh, y’all wanted a twist, ey?