LOAD CSV is great for importing small- or medium-sized data (up to 10M records).
For data sets larger than this, there is a command-line bulk importer.
The neo4j-admin import tool allows you to import CSV data to an empty database by specifying node files and relationship files.
We want to use it to import order data into Neo4j: customers, orders, and ordered products.
The tool is located in <neo4j-home>/bin/neo4j-admin and is used as follows:
bin/neo4j-admin import --id-type=STRING \
--nodes:Customer=customers.csv --nodes=products.csv \
--nodes="orders_header.csv,orders1.csv,orders2.csv" \
--relationships:CONTAINS=order_details.csv \
--relationships:ORDERED="customer_orders_header.csv,orders1.csv,orders2.csv"
The first few rows of data used for this import look like this:
The repeated --nodes and --relationships parameters are groups of multiple (potentially split) CSV files of the same entity, i.e. with the same column structure.
All files per group are treated as if they could be concatenated as a single large file. A header row in the first file of the group or in a separate, single-line file is required. Placing the header in a separate file can make it easier to handle and edit than having it in a multi-gigabyte text file. Compressed files are also supported.
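Moving the header into its own file, as described above, can be sketched with standard shell tools. The directory, file names, and sample rows below are hypothetical stand-ins for a much larger export. The gzip step assumes gzip is an accepted compression format, which the text above does not specify.

```shell
# Hypothetical stand-in for a large CSV export with the header in line 1.
mkdir -p import_demo
printf 'orderId:ID(Order),date,total\n1,2020-01-01,10.5\n2,2020-01-02,20.0\n' \
  > import_demo/orders_full.csv

# Move the header row into its own single-line file ...
head -n 1 import_demo/orders_full.csv > import_demo/orders_header.csv

# ... and keep only the data rows in the content file.
tail -n +2 import_demo/orders_full.csv > import_demo/orders_data.csv

# Optionally compress the content file (assumption: gzip is supported).
gzip -c import_demo/orders_data.csv > import_demo/orders_data.csv.gz
```

The header and content files could then be passed as one group, for example --nodes="import_demo/orders_header.csv,import_demo/orders_data.csv", and the small header file can be edited without opening the large one.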
- The --id-type=STRING option indicates that all :ID columns contain alphanumeric values (there is an optimization for numeric-only IDs).
- The customers.csv file is imported directly as nodes with the :Customer label, and the properties are taken directly from the file.
- Product nodes follow the same pattern, where the node labels are taken from the :LABEL column.
- The Order nodes are taken from 3 files: one header file and two content files.
- Line item relationships typed :CONTAINS are created from order_details.csv, relating orders to the products they contain via their IDs.
- Orders are connected to customers by using the order CSV files again, but this time with a different header, which uses :IGNORE to skip the non-relevant columns.
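Reusing the same order files with two different headers can be sketched as follows. The column layout (order id, date, total, customer id) and the sample rows are assumptions made for illustration, not the actual data of this import. The id-groups Order and Customer are likewise hypothetical.

```shell
# Hypothetical order data without a header row:
# order id, date, total, customer id.
mkdir -p import_demo
printf '1,2020-01-01,10.5,23\n2,2020-01-02,20.0,42\n' > import_demo/orders1.csv

# Header for the node import: the customer id column is skipped,
# so it is not stored as a property on the Order nodes.
printf 'orderId:ID(Order),date,total,customerId:IGNORE\n' \
  > import_demo/orders_header.csv

# Header for the relationship import over the same data files:
# only the two ID columns are used, everything else is skipped.
printf ':END_ID(Order),date:IGNORE,total:IGNORE,:START_ID(Customer)\n' \
  > import_demo/customer_orders_header.csv

# Sanity check: both headers should have as many columns as the data rows.
for f in orders_header.csv customer_orders_header.csv orders1.csv; do
  head -n 1 "import_demo/$f" | awk -F, -v name="$f" '{ print name ": " NF " columns" }'
done
```

Because the header lives in a separate file, the data files stay untouched and only the single header line decides whether a column becomes an ID, a property, or is skipped.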
The column names are used as the property names of your nodes and relationships. Some columns carry specific markup, explained below:
- name:ID - global ID column used to look up the node later when connecting relationships.
  - If the property name is left off, the ID is not stored (it is temporary); this is the ID that --id-type refers to.
  - If you have repeated IDs across entities, you have to provide the entity (id-group) in parentheses, like :ID(Order).
  - If your IDs are globally unique, you can leave that off.
- :LABEL - label column for nodes. Multiple labels can be separated by a delimiter.
- :START_ID, :END_ID - relationship file columns referring to the node IDs. For id-groups, use :END_ID(Order).
- :TYPE - column to specify the relationship type.
- All other columns are treated as properties but skipped if empty or annotated with :IGNORE.
- Type conversion is possible by suffixing the name with indicators like :INT, :BOOLEAN, etc.
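Whether id-groups are needed depends on whether IDs repeat across entities. One way to check this before the import is sketched below. It is not part of the import tool, the sample files are hypothetical, and it assumes the ID is the first column and that no field contains a quoted comma.

```shell
# Hypothetical sample files; the ID is in the first column of each.
mkdir -p import_demo
printf 'customerId:ID,name\n1,Alice\n2,Bob\n' > import_demo/customers_sample.csv
printf 'orderId:ID,date\n2,2020-01-02\n3,2020-01-03\n' > import_demo/orders_sample.csv

# Extract the ID column from each file, skipping the header row.
tail -n +2 import_demo/customers_sample.csv | cut -d, -f1 | sort -u \
  > import_demo/customer_ids.txt
tail -n +2 import_demo/orders_sample.csv | cut -d, -f1 | sort -u \
  > import_demo/order_ids.txt

# comm -12 prints only the lines common to both sorted files.
# Any output means the IDs are not globally unique.
comm -12 import_demo/customer_ids.txt import_demo/order_ids.txt \
  > import_demo/shared_ids.txt
cat import_demo/shared_ids.txt
```

Here the ID 2 appears in both files, so the headers would need id-groups such as :ID(Customer) and :ID(Order). An empty result would mean the plain :ID form is sufficient for these two files.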
For more details on this header format and the tool, see the documentation in the Neo4j Manual and the accompanying tutorial.
This is a companion discussion topic for the original entry at https://neo4j.com/developer/guide-import-csv/