Create 1 node for use during CSV loading

massung · September 24, 2018, 8:24pm

Simply put, I have a Spark job whose output is a CSV. Currently, I load this CSV into Neo4j without problems. But I'd like to add one more step: add a single node representing the job run to the graph, and link it to every row of the CSV (the output).

Currently, my Cypher query looks like so:

using periodic commit
load csv with headers from $url as line

merge (n:Output {name: line.name})
on create set ...
on match set ...

What I'd like to do is add the one new node so I can also create a relationship to it. For example (this fails, obviously):

create (job:Job {
  time: timestamp(),
  source: $hdfsLocation,
})

using periodic commit
load csv ...

merge (job)-[:PRODUCED]->(n)

I've tried variations of the above to no avail. Maybe I'm just missing a comma or something?
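For reference, two things likely break the attempt above: USING PERIODIC COMMIT has to be the very first clause of a query, so the CREATE cannot come before it, and Cypher map literals do not accept the trailing comma after $hdfsLocation. If the CSV is small enough to load in a single transaction, one sketch (assuming you can give up periodic commit) is to create the job first and carry it into LOAD CSV with WITH:

```cypher
// Sketch only: the whole load runs in ONE transaction, so this suits small files.
// The ON CREATE SET / ON MATCH SET clauses from the original query are omitted here.
CREATE (job:Job {time: timestamp(), source: $hdfsLocation})
WITH job
LOAD CSV WITH HEADERS FROM $url AS line
MERGE (n:Output {name: line.name})
MERGE (job)-[:PRODUCED]->(n)
```

For large files this loses the batching that periodic commit provides, which is where splitting the work into two statements comes in.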

In case it comes up as a possible solution: I don't (currently) have anything that uniquely identifies the "job". The HDFS location, for example, is reused many times with different arguments, so I don't want to overwrite an existing job that uses the same source script. I could potentially run 2 queries: first create the job node, then load the CSV. But I'm unsure how to get the (non-unique) job node from the first query into the second.

Thanks in advance!

Benoit · September 25, 2018, 11:45am

Hi,

For this, the best approach is to use two statements:

First, create the Job node and retrieve its ID:

CREATE (job:Job {time: timestamp(), source: $hdfsLocation}) RETURN id(job) AS id

Then load your CSV file like this, passing that ID as the $id parameter:

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM $url AS line
  MATCH (job:Job) WHERE id(job) = $id
  MERGE (n:Output {name: line.name})
    ON CREATE SET ...
    ON MATCH SET ...
  MERGE (job)-[:PRODUCED]->(n)
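A note for newer Neo4j versions: USING PERIODIC COMMIT was deprecated in 4.4 and removed in 5.0, replaced by CALL { ... } IN TRANSACTIONS. A sketch of the second statement in that form, assuming Neo4j 4.4 or later, might look like this. It has to run in an auto-commit transaction (for example, prefixed with :auto in Neo4j Browser), and in 5.x id() is itself deprecated in favor of elementId().

```cypher
// Sketch for Neo4j 4.4+; the ON CREATE SET / ON MATCH SET clauses are omitted.
LOAD CSV WITH HEADERS FROM $url AS line
CALL {
  WITH line
  // $id is the value returned by the first (CREATE) statement
  MATCH (job:Job) WHERE id(job) = $id
  MERGE (n:Output {name: line.name})
  MERGE (job)-[:PRODUCED]->(n)
} IN TRANSACTIONS OF 1000 ROWS
```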

massung · September 25, 2018, 12:39pm

This was my first follow-up idea as well, but from what I've read online, the Neo4j team strongly recommends avoiding id() (which surprises me, given that the function is exposed).

Thanks for the idea and example. I'll likely end up doing that if no other solution presents itself.

Benoit · September 25, 2018, 1:09pm

You must not use a node's internal ID as a business key, because Neo4j reuses IDs.

When a node is created, it receives an ID, and that ID stays the same for the node's whole life.
But if you delete node 44 and then create a new node, the new node can get ID 44.

So inside a transaction (or in your use case), you can use the node's ID: an ID is only reused after its node has been deleted, so as long as the Job node is not deleted between your two statements, $id still points to it.
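If you would rather not rely on id() at all, one alternative (not something proposed in this thread) is to give each run its own key generated outside Neo4j. This sketch assumes the Spark driver, or whatever script runs the load, can generate a unique value per run such as a UUID; the runId property and the $runId parameter are hypothetical names.

```cypher
// Statement 1: $runId is a hypothetical parameter, e.g. a UUID generated
// once per run by the caller and passed to both statements.
CREATE (job:Job {runId: $runId, time: timestamp(), source: $hdfsLocation});

// Statement 2: run separately, after statement 1 has committed.
// ON CREATE SET / ON MATCH SET clauses omitted for brevity.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM $url AS line
MATCH (job:Job {runId: $runId})
MERGE (n:Output {name: line.name})
MERGE (job)-[:PRODUCED]->(n);
```

Because the key never comes from Neo4j, it cannot be recycled when nodes are deleted. An index or uniqueness constraint on :Job(runId) would keep the per-row MATCH cheap.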
