I am looking for the correct syntax to load data into Neo4j, in particular using apoc.periodic.commit when loading from a Python/pandas dataframe. My general workflow is as follows:
- Within a Jupyter notebook, I load the 1M+ line tab-delimited text file into a dataframe.
- Clean the data
- Create a smaller dataframe to be used as input parameter for a function
- Run function
In general my functions look like this:
def add_data(df1):
    query = """
    UNWIND $rows AS row
    MERGE
    SET
    RETURN COUNT(*) AS total
    """
    return conn.query(query, parameters={'rows': df1.to_dict('records')})
columns =
df1 = pd.DataFrame(df[columns])
df1 = df1.explode(columns).drop_duplicates()
add_data(df1)
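To illustrate the shape of the payload that reaches UNWIND, here is a toy version of the prep step (the column names here are made up, not my real schema):

```python
import pandas as pd

# Toy stand-in for the cleaned dataframe; "name" and "skill" are made-up columns.
df = pd.DataFrame({
    "name": ["alice", "alice", "bob"],
    "skill": [["python", "cypher"], ["python", "cypher"], ["python"]],
})

# Same pattern as above: select columns, explode the list column, dedupe.
df1 = df[["name", "skill"]]
df1 = df1.explode("skill").drop_duplicates()

# Each dict becomes one `row` in UNWIND $rows AS row.
rows = df1.to_dict("records")
```

With the toy input, `rows` ends up as three flat dicts, one per node/relationship to merge.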
This works great for creating nodes and relationships when the total count is under 1000, but when there are 1M+ nodes/relationships, it tends not to finish.
I know there are server parameters in neo4j.conf that can be adjusted and may help with the load. I know I can save the dataframe to CSV and load it from disk with USING PERIODIC COMMIT. I know I can split the dataframe and process the chunks in a for loop from within Python. But I don't want to go those routes; I want to get apoc.periodic.commit to work within the add_data function.
I have tried several iterations in an attempt to get it to work, but to no avail. I am hoping the community can help.
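For concreteness, the general shape of my latest attempt looks roughly like this. The MERGE/SET bodies are elided as before, the batch size is a made-up placeholder, and I am not sure apoc.periodic.commit can even consume the $rows parameter this way, which may be the root of my problem:

```python
# Sketch of one attempt, NOT working code.
# apoc.periodic.commit re-runs the inner statement until it returns 0,
# so the inner statement needs a LIMIT; my suspicion is that an UNWIND
# over the same $rows never shrinks, so the count never reaches 0.
PERIODIC_QUERY = """
CALL apoc.periodic.commit(
  "UNWIND $rows AS row
   MERGE
   SET
   RETURN COUNT(*) AS total LIMIT $limit",
  {rows: $rows, limit: 10000}
)
"""

def add_data_periodic(df1, conn):
    # `conn` is the same driver wrapper used by add_data above.
    return conn.query(PERIODIC_QUERY, parameters={'rows': df1.to_dict('records')})
```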
Thanks in advance.