Data Import Parquet

Could you please add support for Parquet files when performing data imports via the shell? Modern ETL stacks do not use CSV in our warehouses; our entire pipelines run on Spark-like distributed frameworks and store data in Parquet format.

Parquet has an embedded schema, supports column-wise selection, is smaller on disk, and supports partitioning and warehousing via Delta Lake. It is superior in every way. Our stack has to maintain a special Neo4j CSV escape-character class just to dump our entire warehouse to disk as CSV for this shell tool.
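
For context, something like the sketch below is what that dump step looks like on our side (paths and options are illustrative only, and it assumes the importer accepts RFC 4180-style quoting with embedded quotes doubled):

```python
# Rough sketch (not our production code): dumping a Parquet table to CSV with
# RFC 4180-style quoting, where embedded quotes are escaped by doubling them.
# Paths are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("warehouse-to-neo4j-csv").getOrCreate()

df = spark.read.parquet("s3://warehouse/customers/")  # columnar source table

(df.write
    .option("header", True)    # header row doubles as the import field names
    .option("quote", '"')      # quote fields containing delimiters or newlines
    .option("escape", '"')     # escape embedded quotes by doubling them
    .option("emptyValue", "")  # keep empty strings distinct from nulls
    .csv("s3://staging/neo4j/customers_csv"))
```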

Please extend the tool below to include Parquet. It cannot be that complex, since other ETL tools like Spark are already Java-based, like Neo4j.

The next formats we're looking at are JSON and then relational databases.

Parquet files will be a nice addition in the future; for now we need to ask you to convert them to CSV.

Btw, the tool is fully written in JS, not Java; it talks to Neo4j via the JS driver.

But we're planning to add Parquet import to the APOC procedure library, so you'll be able to use it from there. I had forgotten to create an issue, but here it is now:

You can also import Parquet with the Neo4j Arrow Flight connector that comes with the Graph Data Science library and is also enabled on AuraDS.
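
As a rough sketch of that Arrow route, assuming a recent graphdatascience Python client (connection details, graph name, and column layout below are placeholders), you could stream Parquet-backed DataFrames straight into a GDS in-memory graph:

```python
# Rough sketch, assuming the graphdatascience Python client with Arrow enabled;
# connection details, file paths, graph name and column layout are placeholders.
import pandas as pd
from graphdatascience import GraphDataScience

gds = GraphDataScience(
    "neo4j+s://example.databases.neo4j.io",
    auth=("neo4j", "secret"),
    arrow=True,  # use the Arrow Flight endpoint when the server exposes it
)

# Parquet stays columnar on read; the client streams it over Arrow Flight
nodes = pd.read_parquet("nodes.parquet")         # expects a 'nodeId' column
rels = pd.read_parquet("relationships.parquet")  # expects 'sourceNodeId'/'targetNodeId'

# Build an in-memory GDS graph directly from the DataFrames
G = gds.graph.construct("imported_graph", nodes, rels)
print(G.node_count(), G.relationship_count())
```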

I am surprised JSON and relational databases are in demand for high-volume imports. Our data ingest far exceeds what is practical to store in JSON or relational formats; only big-data Parquet / Delta Lake warehouses suffice. I don't know anyone in the Big Data space using JSON or relational.

Your shell loader seems to handle CSV efficiently; however, converting Parquet / Delta Lake to CSV just to load into Neo4j adds huge run times.

Would you consider open-sourcing your JS shell? The community could then extend it to support the file formats they need.

Actually, it's not a JS shell; it's a bespoke, high-performance CSV importer.

It's part of the OSS Neo4j code, and Parquet and cloud-storage support are on the roadmap.

In the meantime:
To speed up your Parquet -> CSV transformation you could use DuckDB, which has great support for Delta Lake format conversions.
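
For example, a minimal sketch using DuckDB's Python API (paths are placeholders; the Delta part assumes the delta extension is available in your build):

```python
# Minimal sketch with DuckDB's Python API; paths are placeholders and the
# Delta part assumes the delta extension is available in your DuckDB build.
import duckdb

con = duckdb.connect()

# Plain Parquet -> CSV, streamed without materialising the table in Python
con.sql("""
    COPY (SELECT * FROM read_parquet('warehouse/customers/*.parquet'))
    TO 'customers.csv' (HEADER, DELIMITER ',')
""")

# Delta Lake tables via the delta extension
con.sql("INSTALL delta")
con.sql("LOAD delta")
con.sql("""
    COPY (SELECT * FROM delta_scan('warehouse/orders'))
    TO 'orders.csv' (HEADER, DELIMITER ',')
""")
```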

Just curious, what is the size/shape of the files you are trying to import?