Is there a way to load all columns without explicit specification?

lingvisa · October 28, 2020, 9:41pm

LOAD CSV WITH HEADERS FROM 'file:///orders.csv' AS row
MERGE (order:Order {orderID: row.OrderID})
  ON CREATE SET order.shipName = row.ShipName;

If the Order has a lot of columns, the CREATE SET statement would be very long. Importantly, each property name and value have to be specified explicitly. If there a way to do the SET all at once without explicitly specifying each one? I can do this by bolt driver without using LOAD CSV. I read the data into a dictionary and then know all the column names and do a loop with the generic form of:
ON CREATE SET $label.propertyName=$property_value

The driver way is slow, but very flexible. For the LOAD CSV, is there a way to load all columns in a similar way? If the only way to access the column names is by something like 'row.OrderID', probably this is not possible.

terryfranklin82 · October 28, 2020, 10:07pm

I've asked myself this question in the past as well, but your description about treating the data as a dictionary made me realise that the row is really the same thing already (or perhaps technically a map). Either way, this should do what you want:

...
ON CREATE SET order += row

lingvisa · October 28, 2020, 10:40pm

This looks cool, and I will try.

mdfrenchman · October 29, 2020, 1:09pm

ON CREATE SET order += row

Is what we've always used and works well.

BONUS info so I'm not just repeating @terryfranklin82

If you want to alter or change some data you can override the values in an object map.
For example if you want all but one of the fields.

ON CREATE SET order += row { .*, columnToIgnore: null }

terryfranklin82 · October 29, 2020, 7:37pm

Thanks, it's a feature that I didn't know about, but could you provide a working example?

If I run something like:

WITH { a:1, b:2, c:3 } as row
MATCH (d:Destination{id:123})
SET d += row { .*, columnToIgnore:row.a}
RETURN d.a, d.b, d.c

all columns are still set:

d.a	d.b	d.c
1	2	3

mdfrenchman · October 30, 2020, 2:23pm

You would need to do.

SET d+= row { .*, a:null }

a|b|c are your object keys for row, by setting it to null, it overrides the a from .* in the map. Neo4j doesn't write null value properties to the graph, so the a property is ignored.

Guess I should add this to the list of blog post ideas.

terryfranklin82 · October 30, 2020, 9:09pm

Now I understand - I thought columnToIgnore was a control word, but you were using it as a placeholder for "the columns you want to ignore".

Please share a link to your blog, I'd be interested to check it out

mdfrenchman · November 2, 2020, 1:37pm

@terryfranklin82 yep, I get pretty short-hand sometimes and it's a known problem that it's not always clear to anyone else what I mean It's something I work on daily lol.

I don't actually have a blog yet, just a ton of post topics/ideas sitting in my notebook! I'll share it once I make one

lingvisa · November 2, 2020, 11:44pm

@mdfrenchman, @terryfranklin82 A follow-up question:

If I want to change the empty string to avoid load, how to do it? I want to do it roughly like this example:

LOAD CSV WITH HEADERS FROM 'file:///companies.csv' AS row
MERGE (c:Company {companyId: row.Id})
SET c.emailAddress = CASE trim(row.Email) WHEN "" THEN null ELSE row.Email END

So if I want to change empty string from any field, is that possible for the generic version of the:
SET d+= row { .*, a:null }

mdfrenchman · November 3, 2020, 1:44pm

This would work, potentially creating Companies without an email address.

LOAD CSV WITH HEADERS FROM 'file:///companies.csv' AS row
MERGE (c:Company {companyId: row.Id})
SET c += row { .*, emailAddress: CASE trim(row.Email) WHEN "" THEN null ELSE row.Email END }

If you want to avoid loading any part of the row if the email is blank. Then you would need to move the case above the merge.

LOAD CSV WITH HEADERS FROM 'file:///companies.csv' AS row
WITH row, CASE trim(row.Email) WHEN "" THEN null ELSE row.Email END AS emailAddress
WHERE emailAddress IS NOT NULL
MERGE (c:Company {companyId: row.Id})
SET c += row { .*, emailAddress: emailAddress }

lingvisa · November 5, 2020, 5:00pm

@mdfrenchman, For this last solution, is there a generic solution to omit all node properties that are empty strings? Each node type has different properties, and it would be hard to explicitly check the null/empty of each property. Not only email, any property except ID value could be absent. Currently, my code create empty string property value like this when loading from csv:
"foreign_name": ""

This is my code:

CALL apoc.load.csv('coach.csv') 
yield map as row 
MERGE (m:Coach {ID: row.ID}) ON CREATE SET m += row ON MATCH SET m += row
RETURN count(m) as mcount

How to omit empty string property before this MERGE statement?

Given the example above:
SET c += row { .*, emailAddress: CASE trim(row.Email) WHEN "" THEN null ELSE row.Email END }
I need a way to somehow create a new row which sets empty strings as null, not only for an explicit property, i.e. row.Email. I tried adding a WITH clause before the MERGE, but it reports an error:
WITH = "\nWITH [(key, value) in row WHERE value = '' | row.key=NULL] as row"

lingvisa · November 5, 2020, 6:05pm

I will create a separate question for this follow-up related question.

Topic		Replies	Views
Load data from CSV NO SQL - column name may vary ETL-Tool cypher , import	7	555	September 24, 2020
Load CSV with Column Headers as property and values Cypher	1	1343	February 25, 2019
Loading CSV with Cypher with certain column names as properties Cypher cypher	2	1039	November 26, 2019
Are these code obsolete in the documentation for data loading? Neo4j Graph Platform	5	508	April 19, 2020
Neo4j LOAD CSV..when CSV columns contains a list of properties Neo4j Graph Platform migrated	7	248	November 30, 2022

Is there a way to load all columns without explicit specification?

Related topics