Do I need to load the CSV files twice for creating relationships?

For the Northwind CSV loading example, it seems that it first creates the nodes by reading each CSV file once (Tutorial: Import Relational Data Into Neo4j - Developer Guides):

// Create orders
LOAD CSV WITH HEADERS FROM 'file:///orders.csv' AS row
MERGE (order:Order {orderID: row.OrderID})
  ON CREATE SET order.shipName = row.ShipName;

// Create products
LOAD CSV WITH HEADERS FROM 'file:///products.csv' AS row
MERGE (product:Product {productID: row.ProductID})
  ON CREATE SET product.productName = row.ProductName, product.unitPrice = toFloat(row.UnitPrice);

After loading all the CSV files, it creates the relationships by loading the CSVs again:

// Create relationships between orders and products
LOAD CSV WITH HEADERS FROM 'file:///orders.csv' AS row
MATCH (order:Order {orderID: row.OrderID})
MATCH (product:Product {productID: row.ProductID})
MERGE (order)-[op:CONTAINS]->(product)
  ON CREATE SET op.unitPrice = toFloat(row.UnitPrice), op.quantity = toFloat(row.Quantity);

// Create relationships between orders and employees
LOAD CSV WITH HEADERS FROM "file:///orders.csv" AS row
MATCH (order:Order {orderID: row.OrderID})
MATCH (employee:Employee {employeeID: row.EmployeeID})
MERGE (employee)-[:SOLD]->(order);

Is this the normal way to build a graph from CSV files? It would be great if each file only needed to be loaded once. Just curious.

Yes, you can perform all the operations in a single load, though there may be some performance implications.
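For example, a single pass over orders.csv could look roughly like this, assuming each row carries both the order key and the product key as the relationship query above implies (product-specific properties such as productName would still have to come from products.csv):

// One pass: upsert both endpoint nodes, then the relationship
LOAD CSV WITH HEADERS FROM 'file:///orders.csv' AS row
MERGE (order:Order {orderID: row.OrderID})
  ON CREATE SET order.shipName = row.ShipName
MERGE (product:Product {productID: row.ProductID})
MERGE (order)-[op:CONTAINS]->(product)
  ON CREATE SET op.unitPrice = toFloat(row.UnitPrice), op.quantity = toFloat(row.Quantity);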

Hi,

When you are loading transactional data (OLTP), the load is driven by the application: typically a node, then a relationship, then another node, and so on.

However, when you are loading an OLAP system for reporting, it is better to load all the dimensions (nodes) first and then the facts (relationships).

Can you be more specific on this:

the load is driven by the application: typically a node, then a relationship, then another node, and so on

One thing is very clear: a relationship can only exist if the relevant nodes are present. However, it is not mandatory that every node be loaded before any relationship is created; only the nodes that the relationship connects need to exist.
And that is what is done in the example you mentioned.

Let's say I have a table like this:

product.csv
`productID, productName, productPrice, OrderID`

To create the relationships between Product and Order, both the Order and the Product nodes need to exist first. But the Order information is stored in Order.csv, not in product.csv, so the nodes have to be created from their own files first and the relationships created afterwards in a second LOAD CSV pass.

On the other hand, if the CSV is this:
product_order.csv
productID, productName, productPrice, orderID, orderAmount, orderAddress

Since both the Order and the Product are on the same line, the nodes and the relationship can be created at the same time, which is different from the first case.
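A sketch of that single pass, using the hypothetical product_order.csv columns above (the property names on the Order node are just illustrative):

LOAD CSV WITH HEADERS FROM 'file:///product_order.csv' AS row
MERGE (product:Product {productID: row.productID})
  ON CREATE SET product.productName = row.productName, product.productPrice = toFloat(row.productPrice)
MERGE (order:Order {orderID: row.orderID})
  ON CREATE SET order.orderAmount = toFloat(row.orderAmount), order.orderAddress = row.orderAddress
MERGE (order)-[:CONTAINS]->(product);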

Can you explain 'When you are loading transactional data (OLTP), the load is driven by the application: typically a node, then a relationship, then another node, and so on' with regard to the two cases above? Is OLTP loading very different from loading with a command-line tool?

Case 1: Product.csv, Order.csv
Case 2: Product_Order.csv

In both of the above cases it is good practice to load the nodes first and then load the relationships. However, in some cases (case 1) that will be a bit costly, since the same file needs to be traversed a few extra times.

OLTP insertion comes from the application; you do not load it via CSV unless you are loading offline. That insertion is application specific. But ultimately we cannot create a relationship unless the related nodes exist.
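As a rough sketch of what such an application-driven insert can look like, a single parameterized statement can merge both endpoint nodes and the relationship in one go, so no separate bulk node load is needed (the labels and parameter names here are only placeholders):

// Application-side upsert: missing endpoint nodes are created on the fly
MERGE (product:Product {productID: $productId})
MERGE (order:Order {orderID: $orderId})
MERGE (order)-[:CONTAINS]->(product);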

We are importing data from an Oracle database and loading it into Neo4j with nodes, relationships, and some other data all in one go. We also do incremental updates.

@ameyasoft Can you share how you organize the nodes and relationships, and how you manage loading in one go?

I am looping over the CSV files, and for each file I call this update_nodes function to merge nodes and properties. The property_dict stores all the property key-value pairs of each node. Since the nodes, property keys, and values are all variables, I have to concatenate strings to produce the Cypher queries. This works, but it seems quite slow, since it has to set each property individually in a loop. I looked at Cypher parameters and the APOC procedures for creating nodes and properties dynamically, but I don't think they work in my case.

@classmethod
def update_nodes(cls, tx, property_dict, label):
    for key, value in property_dict.items():
        cypher_command = ("MERGE (n: " + label + "{ID: '" + property_dict['ID'] + "'})"
                          " ON MATCH SET n." + key + " = \"" + value + "\"")
        tx.run(cypher_command)

Is there a way to optimize this code to make it much faster?
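For reference, a sketch of the parameterized form mentioned above: the whole property_dict is passed as a single map parameter and applied with SET n +=, so there is one query per node instead of one per property (Product stands in for the label here, since labels cannot be parameterized and would still need to be concatenated into the query string):

// One statement per node: $props is the whole property_dict passed as a parameter
MERGE (n:Product {ID: $id})
  ON MATCH SET n += $props

From the Python driver this would be a single tx.run(query, id=property_dict['ID'], props=property_dict) call per node.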

We create a view in Oracle that generates, say, customer id, customer name, order id, ....
and use the Neo4j driver and Java to import the data into the Neo4j db.

Is there any way to optimize this code? @neo4j Team
