Splitting multi-valued cells when importing CSV files

mcentellesv · January 28, 2020, 11:32am

Hi,

I want to create a database from a CSV file using LOAD CSV. A column called IDFILM conatins a unique value in each cell. So I'm going to generate a node from each one of the values, belonging to the p:Pellicula label. Two other columns -DIRECCIO and INTERPRETS- may have a unique value or multiple values in each of their cells (and, sometimes, are empty). So, I want to split cells containing multiple values in order to generate separated nodes. Nodes from the column DIRECCIO will belong to the label d:Director, and nodes from the INTERPRETS column will belong to the label i:Interpret.

And then, I want to relate these isolated nodes from the columns DIRECCIO and INTERPRETS with the nodes from the IDFILM column through two different relationships:

(p:Pellicula)-[:ES_DIRIGIDA_PER]->(d:Director)

(p:Pellicula)-[:ES_INTERPRETADA_PER]->(i)

I've prepared the following query to load and transform CSV data into the Neo4j database, in the way I have explained.

LOAD CSV WITH HEADERS FROM 'file:///CinemaEnCatalaDef/pellicules.csv' AS row
WITH row.IDIOMA_x0020_ORIGINAL AS idiomaOriginal, row.VERSIO AS versioPellicula, row.SINOPSI AS sinopsi, row.ORIGINAL AS titolOriginal, row.QUALIFICACIO AS qualificacio, datetime(row.ANY) AS anyProduccio, row.IDFILM AS pelliculaId, row.CARTELL AS cartell, row.ESTRENA AS dataEstrena, row.TRAILER AS trailer, row.TITOL AS titolCatala, SPLIT(row.DIRECCIO, ",") AS nomDirector, SPLIT(row.INTERPRETS, ",") AS nomInterpret
CREATE (p:Pellicula {pelliculaId: pelliculaId})
SET p.idiomaOriginal = idiomaOriginal, p.versioPellicula = versioPellicula, p.sinopsi = sinopsi, p.titolOriginal = titolOriginal, p.qualificacio = qualificacio, p.anyProduccio = anyProduccio, p.cartell = cartell, p.dataEstrena = dataEstrena, p.trailer = trailer, p.titolCatala = titolCatala
MERGE (d:Director {nomDirector: nomDirector})
MERGE (i:Interpret {nomInterpret: nomInterpret})
CREATE (p)-[:ES_DIRIGIDA_PER]->(d)
CREATE (p)-[:ES_INTERPRETADA_PER]->(i)
;

The point is that orders like "SPLIT(row.DIRECCIO, ",") AS nomDirector" just transform multivalued cells into lists of values, but not into separated nodes. In addition, I have anothe doubt. In some cells, values can be separated by two different separators. For exemple: "a, b, c and d" where delimiters are "," and "and".

Thank you in advance.

Miquel Centelles

pilar.saez · January 28, 2020, 1:29pm

Hi Miquel,

one suggestion would be using something like this:
UNWIND nomDirectorList AS nomDirector
before MERGE

mcentellesv · January 28, 2020, 3:15pm

Thank you very much, Pilar.

I've been working around your suggestion of using the clause UNWIND to expand the list of values into a sequence of rows. In Importing CSV Data into Neo4j Importing CSV Data into Neo4j - Developer Guides, there is an example of Cypher query to do it:

//split string of employee skills into separate nodes
LOAD CSV FROM 'file:///data.csv' AS row
MERGE (e:Employee {employeeId: row.Id})
UNWIND split(row.skills, ',') AS skill
MERGE (s:Skill {name: skill})
MERGE (e)-[r:HAS_EXPERIENCE]->(s);

The point is that I don't find out how to insert this clause in my query. It always returns errors like "WITH is required between SET and UNWIND".

Miquel Centelles

pilar.saez · January 28, 2020, 3:34pm

Would you try runing this sentence:

LOAD CSV WITH HEADERS FROM 'file:///CinemaEnCatalaDef/pellicules.csv' AS row
WITH row.IDIOMA_x0020_ORIGINAL AS idiomaOriginal, row.VERSIO AS versioPellicula, row.SINOPSI AS sinopsi, row.ORIGINAL AS titolOriginal, row.QUALIFICACIO AS qualificacio, datetime(row.ANY) AS anyProduccio, row.IDFILM AS pelliculaId, row.CARTELL AS cartell, row.ESTRENA AS dataEstrena, row.TRAILER AS trailer, row.TITOL AS titolCatala
CREATE (p:Pellicula {pelliculaId: pelliculaId})
SET p.idiomaOriginal = idiomaOriginal, p.versioPellicula = versioPellicula, p.sinopsi = sinopsi, p.titolOriginal = titolOriginal, p.qualificacio = qualificacio, p.anyProduccio = anyProduccio, p.cartell = cartell, p.dataEstrena = dataEstrena, p.trailer = trailer, p.titolCatala = titolCatala
UNWIND SPLIT(row.DIRECCIO, ",") AS nomDirector
MERGE (d:Director {nomDirector: nomDirector})
UNWIND SPLIT(row.INTERPRETS, ",") AS nomInterpret
MERGE (i:Interpret {nomInterpret: nomInterpret})
CREATE (p)-[:ES_DIRIGIDA_PER]->(d)
CREATE (p)-[:ES_INTERPRETADA_PER]->(i)
;

mcentellesv · January 28, 2020, 3:48pm

Many thanks. It returns the error message:

Error. Neo.ClientError.Statement.SyntaxError

WITH is required between SET and UNWIND (line 5, column 1 (offset: 749))
"UNWIND SPLIT(row.DIRECCIO, ",") AS nomDirector"

pilar.saez · January 28, 2020, 4:16pm

May you can suppress SET statment and put all the attributes inside the CREATE clause?

mcentellesv · January 28, 2020, 8:53pm

It keeps on error:

WITH is required between CREATE and UNWIND (line 4, column 1 (offset: 156))
"UNWIND SPLIT(row.DIRECCIO, ",") AS nomDirector"
^

ameyasoft · January 28, 2020, 11:43pm

Try this:

LOAD CSV WITH HEADERS FROM 'file:///CinemaEnCatalaDef/pellicules.csv' AS row
WITH
row.IDIOMA_x0020_ORIGINAL AS idiomaOriginal,
row.VERSIO AS versioPellicula,
row.SINOPSI AS sinopsi,
row.ORIGINAL AS titolOriginal,
row.QUALIFICACIO AS qualificacio,
datetime(row.ANY) AS anyProduccio,
row.IDFILM AS pelliculaId,
row.CARTELL AS cartell,
row.ESTRENA AS dataEstrena,
row.TRAILER AS trailer,
row.TITOL AS titolCatala,
SPLIT(row.DIRECCIO, ",") AS nomDirector,
SPLIT(row.INTERPRETS, ",") AS nomInterpret

CREATE (p:Pellicula {pelliculaId: pelliculaId})
SET p.idiomaOriginal = idiomaOriginal, p.versioPellicula = versioPellicula, p.sinopsi = sinopsi, p.titolOriginal = titolOriginal, p.qualificacio = qualificacio, p.anyProduccio = anyProduccio, p.cartell = cartell, p.dataEstrena = dataEstrena, p.trailer = trailer, p.titolCatala = titolCatala

with p, nomDirector, nomInterpret

UNWIND RANGE(0,SIZE(nomDirector)-1) as i
MERGE (d:Director {normDirector: nomDirector[i]})
CREATE (p)-[:ES_DIRIGIDA_PER]->(d)

UNWIND RANGE(0,SIZE(nomInterpret)-1) as j
MERGE (i:Interpret {nomInterpret: nomInterpret[j]})
CREATE (p)-[:ES_INTERPRETADA_PER]->(i)

;

mcentellesv · January 29, 2020, 9:32am

Thanks. I returns the following error:

Neo.ClientError.Statement.SyntaxError

WITH is required between CREATE and UNWIND (line 26, column 1 (offset: 998))
"UNWIND RANGE(0,SIZE(nomInterpret)-1) as j"

mcentellesv · January 29, 2020, 1:58pm

Finally, the Cypher query proposed by ameyasoft works, with just two chenges.

LOAD CSV WITH HEADERS FROM 'file:///CinemaEnCatalaCom/pellicules.csv' AS row FIELDTERMINATOR ';'

WITH

row.IDIOMA_x0020_ORIGINAL AS idiomaOriginal,

row.VERSIO AS versioPellicula,

row.SINOPSI AS sinopsi,

row.ORIGINAL AS titolOriginal,

row.QUALIFICACIO AS qualificacio,

datetime(row.ANY) AS anyProduccio,

row.IDFILM AS pelliculaId,

row.CARTELL AS cartell,

row.ESTRENA AS dataEstrena,

row.TRAILER AS trailer,

row.TITOL AS titolCatala,

split(row.DIRECCIO, ",") AS nomDirector,

split(row.INTERPRETS, ",") AS nomInterpret

CREATE (p:Pellicula {pelliculaId: pelliculaId})

SET p.idiomaOriginal = idiomaOriginal, p.versioPellicula = versioPellicula, p.sinopsi = sinopsi, p.titolOriginal = titolOriginal, p.qualificacio = qualificacio, p.anyProduccio = anyProduccio, p.cartell = cartell, p.dataEstrena = dataEstrena, p.trailer = trailer, p.titolCatala = titolCatala

WITH p, nomDirector, nomInterpret

UNWIND RANGE(0,SIZE(nomDirector)-1) AS i

MERGE (d:Director {nomDirector: nomDirector[i]})

CREATE (p)-[:ES_DIRIGIDA_PER]->(d)

WITH p, nomDirector, nomInterpret

UNWIND RANGE(0,SIZE(nomInterpret)-1) AS j

MERGE (i:Interpret {nomInterpret: nomInterpret[j]})

CREATE (p)-[:ES_INTERPRETADA_PER]->(i)

;

The changes are:

I've repeated the clause WITH p, nomDirector, nomInterpret before the second UNWIND.
There were a typing mistake in clause MERGE (d:Director {nomDirector: nomDirector[i]}): normDirector in place of nomDirector

Thank you very much to you and Pilar Sáez!

Miquel Centelles

Topic		Replies	Views
How to Split a Row While importing CSV with Cypher Neo4j Graph Platform migrated	27	305	August 10, 2022
Trouble with SPLIT and UNWIND syntax Cypher	5	1177	October 29, 2020
"How to Split a Row While importing CSV with Cypher" was flagged as spam Cypher	1	172	March 13, 2023
Split multi value csv colomn into nodes and create relationship of those values Cypher cypher	9	73	August 19, 2024
How do I split the data in the column while loading in neo4j using neo4j-admin import? parsing issue Import / Export import , neo4j-import	2	1093	December 30, 2020

July Summer Fun!

Splitting multi-valued cells when importing CSV files

Related topics