Simply put: I have a Spark job whose output is a CSV. Currently, I load this CSV into Neo4j without problems. But I'd like to add one more step: create a single node holding information about the job run and link it to every row of the CSV (the output).
Currently, my Cypher query looks like so:
using periodic commit
load csv with headers from $url as line
merge (n:Output {name: line.name})
on create set ...
on match set ...
What I'd like to do is add the one new node so I can additionally create a relationship. For example (which fails, obviously):
create (job:Job {
time: timestamp(),
source: $hdfsLocation
})
using periodic commit
load csv ...
merge (job)-[:PRODUCED]->(n)
I've tried variations of the above to no avail. Maybe I'm just missing a comma or something?
In case it comes up as a possible solution: I don't have anything (currently) that uniquely identifies the "job". The HDFS location, for example, is reused many times with different arguments, so I don't want to overwrite an existing job node that has the same source script. I could potentially run 2 queries — first create the job node, then load the CSV — but I'm unsure how to carry the (non-unique) job node from the first query into the second.
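To illustrate the two-query idea, here's roughly what I have in mind, assuming I generate some run id client-side (say, a UUID passed in as a $runId parameter — that parameter is my invention, not something I have today). First:

create (job:Job {runId: $runId, time: timestamp(), source: $hdfsLocation})

and then, in a second query, look the job back up by that same parameter:

using periodic commit
load csv with headers from $url as line
match (job:Job {runId: $runId})
merge (n:Output {name: line.name})
on create set ...
on match set ...
merge (job)-[:PRODUCED]->(n)

But I'm not sure whether introducing an artificial id like this is the idiomatic approach, or whether there's a way to do it in a single query.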
Thanks in advance!