Recently, I have had to import some data from a poorly designed schema. I had a CSV file where the rows were duplicated EXCEPT one column (Mixed
) was alternating between strings and floats.
With Cypher, I had a conditional so that if the string in Mixed
started with a digit, then do a MERGE on the row data with a separate property name aFloat
for a numeric value and using toFloat()
to convert the string to a numeric. If the string didn't start with a digit, then do a MERGE, but with a different property name for the string.
In another case, I had a CSV file, but one column was an Action. The actions were add
, remove
, and change
. So, when the action was add
, I created a new Node with the data from the rest of the row. When the action was remove
, I matched and then removed the match. When it was change
, I changed some of the properties of an existing node (after doing a match.)
As I was a newbie, I did it in three passes instead of doing a conditional.
The other thing I do, is make a dry run where I assemble the data and RETURN the data to make sure I got it right before actually running it. So, I'd do this first:
LOAD CSV WITH HEADERS FROM "file:///data.csv" AS row FIELDTERMINATOR ','
WITH row.Action AS action, row.Category AS catName
WHERE action = 'Remove'
MATCH (c:Category {Name:catName})
// DETACH DELETE (c)
RETURN action, c.Name, c.otherProperty // return info about all the nodes to be deleted.
LOAD CSV WITH HEADERS FROM "file:///data.csv" AS row FIELDTERMINATOR ','
WITH row.Action AS action, row.Category AS catName
WHERE action = 'Remove'
MATCH (c:Category {Name:catName})
DETACH DELETE (c) // do the actual DELETE
// RETURN action, c.Name, c.otherProperty
If you're situation is really complicated, you could always use Python (or any other language that there is a driver for) to massage the data before calling executing the Cypher code you want.
I hope that helps.