Neo4j Data Pipeline

Hey everyone! I've been working on a project to find a way to upload a lot of data into Neo4j and modify the data on its way in. I didn't find an existing solution, so I decided to build my own. As it stands, the project can take in large JSON and CSV files and dynamically add nodes/relationships. The nodes and relationships are defined in a mapping file, which lets you break the data apart in a lot of different ways. You can even include some basic conditional logic!
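To give a feel for the idea, here's a minimal sketch of how a mapping might drive node creation from one record. The mapping format below is purely illustrative (my own guess, not the project's actual schema), with a simple conditional thrown in:

```python
# Hypothetical mapping: which label to create, which column is the key,
# how row columns map to node properties, and an optional condition.
mapping = {
    "nodes": [
        {
            "label": "Person",
            "key": "id",
            "properties": {"name": "full_name", "age": "age"},
            # basic conditional logic: skip rows that don't qualify
            "condition": lambda row: int(row["age"]) >= 18,
        }
    ]
}

def row_to_nodes(row, mapping):
    """Apply a mapping to one CSV/JSON record, returning node dicts."""
    nodes = []
    for spec in mapping["nodes"]:
        cond = spec.get("condition")
        if cond and not cond(row):
            continue  # conditional logic filtered this row out
        props = {prop: row[col] for prop, col in spec["properties"].items()}
        nodes.append({"label": spec["label"],
                      "key": row[spec["key"]],
                      "props": props})
    return nodes

sample = {"id": "42", "full_name": "Ada", "age": "36"}
print(row_to_nodes(sample, mapping))
```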

I wanted to see if there would be anyone interested in using such a tool. My team and I are considering open sourcing the project soon.

Looking for any feedback or questions!


Hi,
Your project could be very interesting. Do you have any metrics on import speed?

Best regards

So we currently haven't done much performance testing yet, but that will be a priority in the near future! What kind of metrics would you be interested in?

The most important metrics would be nodes written per second, and the same for relationships.

In inugami-project-analysis-maven-plugin-parent I write a lot of nodes when the plugin scans a project (800 nodes and 1,300 relationships for one basic Spring Boot application; with bigger applications it can be much more). The analysis phase is very fast; most of the time is lost writing the results into Neo4j, so a good example of massive node import would be helpful.
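For what it's worth, the usual way to speed up massive node writes is to batch rows into a single parameterized `UNWIND` statement instead of one statement per node. A minimal sketch with the official Python driver (the Cypher, labels, and connection details here are illustrative placeholders, not the plugin's actual code):

```python
def chunks(items, size):
    """Split a list of rows into fixed-size batches for UNWIND writes."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# One parameterized statement per batch: Neo4j expands $rows server-side,
# which is far faster than issuing a MERGE per node.
CYPHER = """
UNWIND $rows AS row
MERGE (n:Component {name: row.name})
SET n += row.props
"""

def write_nodes(session, rows, batch_size=1000):
    """Write node rows to Neo4j in batches of batch_size."""
    for batch in chunks(rows, batch_size):
        session.run(CYPHER, rows=batch)

# Usage with the official driver (URI and credentials are placeholders):
# from neo4j import GraphDatabase
# driver = GraphDatabase.driver("bolt://localhost:7687",
#                               auth=("neo4j", "password"))
# with driver.session() as session:
#     write_nodes(session, [{"name": "app", "props": {"type": "spring-boot"}}])
```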

Thank you! Yeah we can definitely look into getting those metrics. I’m sure we have a lot of room for optimization but we will keep working on it!

Hello, I am working on a project where I am facing a data ingestion issue.

I'm very interested!
Does it work with Aura instances?