cancel
Showing results for 
Search instead for 
Did you mean: 

Join the community at Nodes 2022, our free virtual event on November 16 - 17.

Is it possible split column values in apoc.load.csv function

lingvisa
Graph Fellow

For example, this is my load query:

CALL apoc.load.csv('test.csv', {nullValues:['','na','NAN',false], sep:'	'}) 
yield map as row
MERGE (m:Test {nid: row.nid})
ON CREATE SET m += row 
ON MATCH SET m += row 
RETURN count(m) as mcount

My data format could be like this:

nid  tag         uid         date
001   c|python|java       1003252452  20210929

The 'tag' column has 3 values and they should be split by '|'. In apoc.load.csv, is it possible to automatically convert and load three records:

nid  tag         uid         date
001   c       1003252452  20210929
002   python       1003252452  20210929
003   java       1003252452  20210929

In my csv, if I first expand them into 3 lines, my CSV becomes very large. Is that possible?

3 REPLIES 3

koji
Ninja
Ninja

Hi @lingvisa

This is my data.

nid tag uid date
001 c|python|java 1003252452 20210929
002 go|c++|javascript 1234567890 20211001

I wasn't sure if "nid" was simply a sequence, so I created the nodes anyway.
6 nodes have been generated from 2 records.

CALL apoc.load.csv('test.csv', {nullValues:['','na','NAN',false], sep:' '}) 
yield map as row
WITH row, split(row.tag, '|') AS tags
UNWIND tags AS onetag
CREATE (m:Test {nid: row.nid})
SET m.tag = onetag
SET m.uid = row.uid
SET m.date = row.date

My case is a little more complicated than the example I presented. I will try your approach. And I need to compare the overall speed impact between:

  1. Pre-split in csv file before it calls apoc
  2. Split in apoc statement.

The 2nd approach can reduce csv files size a lot, and it may also boost loading speed, but makes the loading code more complicated.

Hi @lingvisa

How about "LOAD CSV" instead of "apoc.load.csv"?

LOAD CSV WITH HEADERS FROM 'file:///test.csv' AS row
FIELDTERMINATOR ' '
UNWIND split(row.tag, '|') AS onetag
CREATE (m:Test {nid: row.nid})
SET m.tag = onetag
SET m.uid = row.uid
SET m.date = row.date