I'm trying to import data from a CSV file into my database. Ideally, I want all of the incoming CSV data to be in one file. However, some properties are shared between what I want to be different node types.
The data rows which have an entry in column E need to be nodes of a different type, so they shouldn't be included in the collection. My code is below:
LOAD CSV WITH HEADERS FROM "file:/Plastering.csv" AS line
CREATE (Pr:Process {id: line.`Step Number`, name: line.`Step Name`})
WITH Pr order by Pr.id ASC
WITH collect(Pr) as Pr1
CALL apoc.nodes.link(Pr1, 'NEXT')
RETURN Pr1
I've also been struggling with this while importing.
I think it would be nice if Neo4j provided examples for this, as there is a lot of power to be had in using conditionals during the import. (I've been comparing Cypher with Gremlin, and Gremlin appears to only import CSV and not be able to do anything interesting with the import.)
I did find there is WHEN and CALL apoc.when (I'm not too sure of the difference):
There's also CASE and CALL apoc.do.case:
The puzzle I'm trying to figure out is how to conditionally match existing nodes (including not matching any) and create new relationships (or not) based on what's being loaded from the CSV.
I also found this:
OPTIONAL MATCH (c:Jaguar{name:"JLR 2.5Ltr"})-[:REPRESENTED_BY]->(v)
RETURN CASE WHEN c IS NULL THEN false ELSE true END as c
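For the puzzle above (match an existing node if there is one, and only then create a relationship), a minimal sketch in plain Cypher might look like the following. It assumes the CSV headers are StepNumber and Input, that Process nodes already exist with a matching id, and that the :Input label and USES relationship type are acceptable; all of those names are placeholders rather than anything prescribed by Neo4j.

```cypher
LOAD CSV WITH HEADERS FROM "file:/Plastering.csv" AS line
// OPTIONAL MATCH keeps the row even when no Process node matches;
// in that case Pr is null instead of the row being dropped.
OPTIONAL MATCH (Pr:Process {id: line.StepNumber})
WITH line, Pr
// Only rows that found a Process and carry an Input value continue.
WHERE Pr IS NOT NULL AND line.Input IS NOT NULL
// Hypothetical label and relationship type, for illustration only.
MERGE (In:Input {name: line.Input})
MERGE (Pr)-[:USES]->(In)
```

Rows that fail the WHERE test are simply skipped, so nothing is created for them.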
Here is the solution that worked for me to import data from a .csv file with forty columns, some of which had null values. Each row held the data to create eight nodes. This worked successfully in production!
LOAD CSV WITH HEADERS FROM "file:/Plastering.csv" AS line
FOREACH(ignoreMe IN CASE WHEN line.Input IS NULL AND line.StepName IS NOT NULL THEN [1] ELSE [] END|
CREATE (Pr:Process {id: line.StepNumber, name: line.StepName})
)
FOREACH(ignoreMe IN CASE WHEN line.Input IS NOT NULL AND line.StepName IS NULL THEN [1] ELSE [] END|
CREATE (Pr:Process {id: line.StepNumber, name: line.Input})
)
Ha, this is neat. Very easy to read and I'm pretty sure it's doing what I need, but there is one little syntax error that's stopping me from actually outputting the graphic. When I add the query to it, like this:
LOAD CSV WITH HEADERS FROM "file:/Plastering.csv" AS line
FOREACH(ignoreMe IN CASE WHEN line.Input IS NULL AND line.`Step Name` IS NOT NULL THEN [1] ELSE [] END|
CREATE (Pr:ProcessStep {id: line.`Step Number`, name: line.`Step Name`}))
FOREACH(ignoreMe IN CASE WHEN line.Input IS NOT NULL AND line.`Step Name` IS NULL THEN [1] ELSE [] END|
CREATE (In:Input {id: line.`Step Number`, name: line.Input}))
WITH Pr order by Pr.id ASC
WITH collect(Pr) as Pr1
CALL apoc.nodes.link(Pr1, 'NEXT')
RETURN Pr1
...it tells me 'Variable Pr not defined'. Any idea why that would be?
There is a caveat to using FOREACH: variables created inside it are scoped to the loop and are not propagated to the rest of the query. Also, while loading data it's better not to RETURN any values. Complete the import first, then query the database. If you want to check, just run a couple of rows and inspect the results.
After the LOAD CSV statement, add WITH line LIMIT 2. This will import only the first two rows of data.
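Putting that advice together, one possible way to split the work is sketched below: one query for the import, and a second one for the linking. It assumes the headers are StepNumber, StepName and Input (headers containing spaces would need backticks, e.g. line.`Step Number`), and that step numbers are integers, hence the toInteger call so that ordering is numeric rather than alphabetical.

```cypher
// Query 1: import only. Nothing created inside FOREACH is needed afterwards.
LOAD CSV WITH HEADERS FROM "file:/Plastering.csv" AS line
// Uncomment the next line to trial the import on the first two rows only:
// WITH line LIMIT 2
FOREACH (ignoreMe IN CASE WHEN line.Input IS NULL AND line.StepName IS NOT NULL THEN [1] ELSE [] END |
  CREATE (:ProcessStep {id: toInteger(line.StepNumber), name: line.StepName})
)
FOREACH (ignoreMe IN CASE WHEN line.Input IS NOT NULL AND line.StepName IS NULL THEN [1] ELSE [] END |
  CREATE (:Input {id: toInteger(line.StepNumber), name: line.Input})
)
```

```cypher
// Query 2: run after the import has finished.
// The nodes are matched again here, so no variable has to leave the FOREACH.
MATCH (Pr:ProcessStep)
WITH Pr ORDER BY Pr.id ASC
WITH collect(Pr) AS steps
CALL apoc.nodes.link(steps, 'NEXT')
RETURN size(steps) AS linkedSteps
```

Because the second query looks the ProcessStep nodes up with MATCH, the 'Variable Pr not defined' error does not arise.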