Tutorial: Import Relational Data Into Neo4j

After we have exported our data from PostgreSQL, we will use Cypher’s LOAD CSV clause to transform the contents of the CSV files into a graph structure. First, we will want to place our CSV files in an easily accessed directory. With Neo4j Desktop, we can place them in the local database’s import directory (detailed instructions are in our desktop import guide). This way, we can use the file:/// prefix in our Cypher statements to locate the files. We can also place the files in another local or remote directory (HTTPS, HTTP, and FTP are supported) and specify the full path in our Cypher statements. Since we are using Neo4j Desktop in this example, we will use the database’s import folder, so the paths to our CSV files can start with the file:/// prefix.
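
For example, the following quick checks (not part of the import script itself) show both addressing styles; the HTTPS URL is only a placeholder for wherever you might host the file:

// Reading from the database's import directory in Neo4j Desktop
LOAD CSV WITH HEADERS FROM 'file:///orders.csv' AS row
RETURN count(row);

// Reading the same file from a remote location instead (placeholder URL)
LOAD CSV WITH HEADERS FROM 'https://example.com/export/orders.csv' AS row
RETURN count(row);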

Now that our files are somewhere we can access them easily, we can use Cypher’s LOAD CSV clause to read each file and follow it with Cypher statements that transform the row and column data into a graph structure.

The entire Cypher script is available on GitHub for you to copy and run, but we will step through each section below to explain what each piece of the script does.

// Create orders
LOAD CSV WITH HEADERS FROM 'file:///orders.csv' AS row
MERGE (order:Order {orderID: row.OrderID})
  ON CREATE SET order.shipName = row.ShipName;
// Create products
LOAD CSV WITH HEADERS FROM 'file:///products.csv' AS row
MERGE (product:Product {productID: row.ProductID})
  ON CREATE SET product.productName = row.ProductName, product.unitPrice = toFloat(row.UnitPrice);
// Create suppliers
LOAD CSV WITH HEADERS FROM 'file:///suppliers.csv' AS row
MERGE (supplier:Supplier {supplierID: row.SupplierID})
  ON CREATE SET supplier.companyName = row.CompanyName;
// Create employees
LOAD CSV WITH HEADERS FROM 'file:///employees.csv' AS row
MERGE (e:Employee {employeeID:row.EmployeeID})
  ON CREATE SET e.firstName = row.FirstName, e.lastName = row.LastName, e.title = row.Title;
// Create categories
LOAD CSV WITH HEADERS FROM 'file:///categories.csv' AS row
MERGE (c:Category {categoryID: row.CategoryID})
  ON CREATE SET c.categoryName = row.CategoryName, c.description = row.Description;

You might notice that we have not imported all of the columns from our CSV files. With our statements, we can choose which properties are needed on a node, which can be left out, and which might need to be imported to another node type or relationship. You might also notice that we used the MERGE keyword instead of CREATE. Although we feel fairly confident there are no duplicates in our CSV files, using MERGE is good practice for ensuring unique entities in our database.
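
If you are unsure which columns a file contains before writing these statements, a quick preview query like the one below (again, not part of the final script) returns the first row as a map of column names to values:

// Preview the header columns and the first row of data
LOAD CSV WITH HEADERS FROM 'file:///orders.csv' AS row
RETURN row
LIMIT 1;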

For very large commercial or enterprise datasets, you may run into out-of-memory errors, especially on smaller machines. To avoid these situations, you can prefix the LOAD CSV statement with the USING PERIODIC COMMIT query hint to commit the data in batches. This is unnecessary for smaller datasets and is only recommended when memory issues are a real risk. More information on this query hint can be found in the manual.
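
As a sketch of what that looks like, the order import above could be prefixed like this; the batch size of 10,000 rows is an arbitrary choice, not a recommendation from this guide:

// Commit every 10,000 rows instead of holding the entire import in one transaction
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM 'file:///orders.csv' AS row
MERGE (order:Order {orderID: row.OrderID})
  ON CREATE SET order.shipName = row.ShipName;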

After the nodes are created, we need to create the relationships between them. Importing the relationships will mean looking up the nodes we just created and adding a relationship between those existing entities. To ensure the lookup of nodes is optimized, we will want to create indexes for any node properties we want to use in the lookups (often the id or another unique value).

We also want to create a constraint (which creates an index along with it) that will disallow orders with the same id from being created, preventing duplicates. Finally, because the indexes are created after the nodes are inserted, their population happens asynchronously, so we call db.awaitIndexes() to block until they are populated.

CREATE INDEX product_id FOR (p:Product) ON (p.productID);
CREATE INDEX product_name FOR (p:Product) ON (p.productName);
CREATE INDEX supplier_id FOR (s:Supplier) ON (s.supplierID);
CREATE INDEX employee_id FOR (e:Employee) ON (e.employeeID);
CREATE INDEX category_id FOR (c:Category) ON (c.categoryID);
CREATE CONSTRAINT order_id FOR (o:Order) REQUIRE o.orderID IS UNIQUE;
CALL db.awaitIndexes();

With the initial nodes and indexes in place, we can now create the relationships between orders and products, and between orders and employees.

// Create relationships between orders and products
LOAD CSV WITH HEADERS FROM 'file:///orders.csv' AS row
MATCH (order:Order {orderID: row.OrderID})
MATCH (product:Product {productID: row.ProductID})
MERGE (order)-[op:CONTAINS]->(product)
  ON CREATE SET op.unitPrice = toFloat(row.UnitPrice), op.quantity = toFloat(row.Quantity);
// Create relationships between orders and employees
LOAD CSV WITH HEADERS FROM "file:///orders.csv" AS row
MATCH (order:Order {orderID: row.OrderID})
MATCH (employee:Employee {employeeID: row.EmployeeID})
MERGE (employee)-[:SOLD]->(order);

Next, create relationships between products, suppliers, and categories:

// Create relationships between products and suppliers
LOAD CSV WITH HEADERS FROM "file:///products.csv" AS row
MATCH (product:Product {productID: row.ProductID})
MATCH (supplier:Supplier {supplierID: row.SupplierID})
MERGE (supplier)-[:SUPPLIES]->(product);
// Create relationships between products and categories
LOAD CSV WITH HEADERS FROM "file:///products.csv" AS row
MATCH (product:Product {productID: row.ProductID})
MATCH (category:Category {categoryID: row.CategoryID})
MERGE (product)-[:PART_OF]->(category);

Lastly, we will create the 'REPORTS_TO' relationship between employees to represent the reporting structure:

// Create relationships between employees (reporting hierarchy)
LOAD CSV WITH HEADERS FROM "file:///employees.csv" AS row
MATCH (employee:Employee {employeeID: row.EmployeeID})
MATCH (manager:Employee {employeeID: row.ReportsTo})
MERGE (employee)-[:REPORTS_TO]->(manager);

You can also run the whole script at once with cypher-shell, for example bin/cypher-shell -u neo4j -p <password> -f import_csv.cypher.

We can now query the resulting graph to find out what it can tell us about our newly-imported data.
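
For example, a query along these lines (one possible starting point, not part of the import script) uses the SOLD and CONTAINS relationships we just created to see which employees handled the most orders:

// Top five employees by number of orders sold, with the number of distinct products involved
MATCH (e:Employee)-[:SOLD]->(o:Order)-[:CONTAINS]->(p:Product)
RETURN e.firstName + ' ' + e.lastName AS employee,
       count(DISTINCT o) AS orders,
       count(DISTINCT p) AS products
ORDER BY orders DESC
LIMIT 5;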

