Data Import Parquet

Could you please add support for Parquet files when performing data imports via the shell? Modern ETL stacks do not use CSV in our warehouses; our entire pipelines run on Spark-like distributed frameworks and store data in Parquet format.

Parquet has an embedded schema, supports column-wise selection, is smaller on disk, and supports partitioning and warehousing via Delta Lake. It is superior in every way. Our stack has to maintain a special Neo4j CSV escape-character class just to dump our entire warehouse to disk as CSV for this shell tool.
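
For context, something like the sketch below is what that dump step looks like on our side (paths and options are illustrative only, and it assumes the importer accepts RFC 4180-style quoting with embedded quotes doubled):

```python
# Rough sketch (not our production code): dumping a Parquet table to CSV with
# RFC 4180-style quoting, where embedded quotes are escaped by doubling them.
# Paths are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("warehouse-to-neo4j-csv").getOrCreate()

df = spark.read.parquet("s3://warehouse/customers/")  # columnar source table

(df.write
    .option("header", True)    # header row doubles as the import field names
    .option("quote", '"')      # quote fields containing delimiters or newlines
    .option("escape", '"')     # escape embedded quotes by doubling them
    .option("emptyValue", "")  # keep empty strings distinct from nulls
    .csv("s3://staging/neo4j/customers_csv"))
```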

Please extend the tool below to include Parquet. It cannot be that complex, since other ETL tools like Spark are already Java-based, like Neo4j.

The next formats we're looking at are JSON and then relational databases.

Parquet files will be a nice addition in the future; for now we need to ask you to convert them to CSV.

Btw, the tool is fully written in JS, not Java; it talks to Neo4j via the JS driver.

But we're planning to add Parquet import to the APOC procedure library, so you'll be able to use it from there. I had forgotten to create an issue, but here it is now:

You can also import Parquet with the Neo4j Arrow Flight connector that comes with the Graph Data Science library and is also enabled on AuraDS.
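
As a rough sketch of that Arrow route, assuming a recent graphdatascience Python client (connection details, graph name, and column layout below are placeholders), you could stream Parquet-backed DataFrames straight into a GDS in-memory graph:

```python
# Rough sketch, assuming the graphdatascience Python client with Arrow enabled;
# connection details, file paths, graph name and column layout are placeholders.
import pandas as pd
from graphdatascience import GraphDataScience

gds = GraphDataScience(
    "neo4j+s://example.databases.neo4j.io",
    auth=("neo4j", "secret"),
    arrow=True,  # use the Arrow Flight endpoint when the server exposes it
)

# Parquet stays columnar on read; the client streams it over Arrow Flight
nodes = pd.read_parquet("nodes.parquet")         # expects a 'nodeId' column
rels = pd.read_parquet("relationships.parquet")  # expects 'sourceNodeId'/'targetNodeId'

# Build an in-memory GDS graph directly from the DataFrames
G = gds.graph.construct("imported_graph", nodes, rels)
print(G.node_count(), G.relationship_count())
```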

I am surprised JSON and relational databases are in demand for high-volume imports. Our data ingest far exceeds what is practical to store in JSON or relational formats; only big-data Parquet / Delta Lake warehouses suffice. I don't know anyone in the Big Data space using JSON or relational.

Your shell loader seems to handle CSV efficiently; however, converting Parquet / Delta Lake to CSV just to load into Neo4j adds huge run times.

Would you consider open-sourcing your JS shell? The community could then extend it to support the file formats they need.

Actually, it's not a JS shell; it's a bespoke, high-performance CSV importer.

It's part of the OSS Neo4j code, and Parquet and cloud-storage support are on the roadmap.

In the meantime:
To speed up your Parquet -> CSV transformation you could use DuckDB, which has great support for Delta Lake format conversions.
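
For example, a minimal sketch using DuckDB's Python API (paths are placeholders; the Delta part assumes the delta extension is available in your build):

```python
# Minimal sketch with DuckDB's Python API; paths are placeholders and the
# Delta part assumes the delta extension is available in your DuckDB build.
import duckdb

con = duckdb.connect()

# Plain Parquet -> CSV, streamed without materialising the table in Python
con.sql("""
    COPY (SELECT * FROM read_parquet('warehouse/customers/*.parquet'))
    TO 'customers.csv' (HEADER, DELIMITER ',')
""")

# Delta Lake tables via the delta extension
con.sql("INSTALL delta")
con.sql("LOAD delta")
con.sql("""
    COPY (SELECT * FROM delta_scan('warehouse/orders'))
    TO 'orders.csv' (HEADER, DELIMITER ',')
""")
```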

Just curious, what is the size/shape of the files you are trying to import?