Loading data from a pandas dataframe into Neo4j using py2neo or the official neo4j driver

Hi,

Can I use the official neo4j driver to load data from a pandas dataframe into Neo4j on a daily basis? If not, then can I use the py2neo connector to also efficiently execute cypher queries that create nodes and relationships, and/or delete nodes? According to the py2neo docs, it seems like the py2neo driver is the way to go for me when deciding between these two drivers.

I'm about to start loading data from a pandas dataframe into our neo4j database and py2neo seems to be the way to go based on these stackoverflow questions:

I was just curious to know the experiences of neo4j users who have implemented this python driver approach.

Thanks

I can't edit my post above any more. I need to add another question here:

Which of these two python drivers is the better and faster approach to load data into Neo4j?

  1. Loading data using the official Python driver? I would think you have to pass a query string in Python to the official neo4j driver, something like:
# Cypher sent through the driver (no Browser-only ':auto' prefix)
query = """
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS
FROM 'file:///data.csv' AS row
MERGE (p:Person {id: toInteger(row.id)})
"""
  2. Using the py2neo driver to load the data from the pandas dataframe?
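For comparison, here is a minimal sketch of a third route: sending dataframe rows to the official driver as a query parameter and unpacking them with UNWIND, so no CSV file is needed. The `Person` label, the `id` and `name` columns, the batch size, and the helper names `chunked` and `load_dataframe` are all assumptions made for illustration, not something from this thread.

```python
from itertools import islice


def chunked(records, size):
    """Yield successive lists of at most `size` records."""
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Hypothetical query: the Person label and the id/name properties are placeholders.
MERGE_PEOPLE = """
UNWIND $rows AS row
MERGE (p:Person {id: row.id})
SET p.name = row.name
"""


def load_dataframe(df, uri, user, password, batch_size=10_000):
    """Send the rows of a pandas DataFrame to Neo4j in batches."""
    # Imported lazily so the batching helper works without the driver installed.
    from neo4j import GraphDatabase

    # One plain dict per row, which the driver can pass as a query parameter.
    records = df.to_dict("records")
    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            for batch in chunked(records, batch_size):
                # session.run uses an auto-commit transaction: one per batch.
                session.run(MERGE_PEOPLE, rows=batch).consume()
    finally:
        driver.close()
```

A call would look something like `load_dataframe(df, "bolt://localhost:7687", "neo4j", "password")`, where the URI and credentials are placeholders. Missing values (NaN) in the dataframe would probably need to be handled before sending.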

I found this Medium article very interesting; it compares the different Python drivers for Neo4j.

That said, my experience with Neo4j's official Python driver has been good: it's easy to use, and I haven't run into problems with slow data transfer.

Thank you! Great article, and this sums it up nicely:

My recommendation? Definitely py2neo is not an option. Although it is user-friendly in many respects, it is too slow for counting queries. Neo4jrestclient is not bad, but sometimes it returns a nested list structure which we have to deal with using some trick (e.g. “sum(temp,)”), which I want to avoid. So I think I would go with the Neo4j Python driver. After all, it is the only official release supported by Neo4j. What is your recommendation?

I'll follow up here on this post which driver I ended up using.

Wouldn't it be cool if the official neo4j driver also supported pandas dataframes as a source of data? :thinking:

Pandas is specific to Python, and processing a pandas dataframe in other frameworks may not be as efficient as using the library itself in Python.
So my answer is: not really, because the data formats commonly used by web services and apps are things like JSON (a standard that works everywhere). For non-developer use cases, spreadsheets are very common, and that is where CSV comes in: compact and easy to generate.
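To make the JSON and CSV point concrete, here is a small sketch that serializes the same rows both ways using only the standard library. The sample rows and the helper names `to_json` and `to_csv` are made up for illustration; with pandas, the rows could come from something like `df.to_dict("records")`.

```python
import csv
import io
import json

# Placeholder rows standing in for dataframe records.
rows = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Grace"},
]


def to_json(records):
    """Serialize the records as a JSON array of objects."""
    return json.dumps(records)


def to_csv(records):
    """Serialize the records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


print(to_json(rows))
print(to_csv(rows))
```

The CSV form is the kind of file LOAD CSV WITH HEADERS reads, while the list-of-objects form is what can be passed to a driver as a query parameter.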

Thank you for your insights! I've been learning more about the official Python Neo4j driver and I would say that this is surely the way to go. The documentation is pretty good, and here are two useful articles I found on this subject in case anyone is interested:

Thanks to this driver I have scheduled daily updates to my Neo4j database from multiple sources across my company. Here's to hoping that the Neo4j team continues to update this driver.

Thanks!