Splitting multi-valued cells when importing CSV files

Hi,

I want to create a database from a CSV file using LOAD CSV. A column called IDFILM conatins a unique value in each cell. So I'm going to generate a node from each one of the values, belonging to the p:Pellicula label. Two other columns -DIRECCIO and INTERPRETS- may have a unique value or multiple values in each of their cells (and, sometimes, are empty). So, I want to split cells containing multiple values in order to generate separated nodes. Nodes from the column DIRECCIO will belong to the label d:Director, and nodes from the INTERPRETS column will belong to the label i:Interpret.

And then, I want to relate these isolated nodes from the columns DIRECCIO and INTERPRETS with the nodes from the IDFILM column through two different relationships:

(p:Pellicula)-[:ES_DIRIGIDA_PER]->(d:Director)

(p:Pellicula)-[:ES_INTERPRETADA_PER]->(i)

I've prepared the following query to load and transform CSV data into the Neo4j database, in the way I have explained.

LOAD CSV WITH HEADERS FROM 'file:///CinemaEnCatalaDef/pellicules.csv' AS row
WITH row.IDIOMA_x0020_ORIGINAL AS idiomaOriginal, row.VERSIO AS versioPellicula, row.SINOPSI AS sinopsi, row.ORIGINAL AS titolOriginal, row.QUALIFICACIO AS qualificacio, datetime(row.ANY) AS anyProduccio, row.IDFILM AS pelliculaId, row.CARTELL AS cartell, row.ESTRENA AS dataEstrena, row.TRAILER AS trailer, row.TITOL AS titolCatala, SPLIT(row.DIRECCIO, ",") AS nomDirector, SPLIT(row.INTERPRETS, ",") AS nomInterpret
CREATE (p:Pellicula {pelliculaId: pelliculaId})
SET p.idiomaOriginal = idiomaOriginal, p.versioPellicula = versioPellicula, p.sinopsi = sinopsi, p.titolOriginal = titolOriginal, p.qualificacio = qualificacio, p.anyProduccio = anyProduccio, p.cartell = cartell, p.dataEstrena = dataEstrena, p.trailer = trailer, p.titolCatala = titolCatala
MERGE (d:Director {nomDirector: nomDirector})
MERGE (i:Interpret {nomInterpret: nomInterpret})
CREATE (p)-[:ES_DIRIGIDA_PER]->(d)
CREATE (p)-[:ES_INTERPRETADA_PER]->(i)
;

The point is that orders like "SPLIT(row.DIRECCIO, ",") AS nomDirector" just transform multivalued cells into lists of values, but not into separated nodes. In addition, I have anothe doubt. In some cells, values can be separated by two different separators. For exemple: "a, b, c and d" where delimiters are "," and "and".

Thank you in advance.

Miquel Centelles

Hi Miquel,

one suggestion would be using something like this:
UNWIND nomDirectorList AS nomDirector
before MERGE

Thank you very much, Pilar.

I've been working around your suggestion of using the clause UNWIND to expand the list of values into a sequence of rows. In Importing CSV Data into Neo4j https://neo4j.com/developer/guide-import-csv/#import-load-csv, there is an example of Cypher query to do it:

//split string of employee skills into separate nodes
LOAD CSV FROM 'file:///data.csv' AS row
MERGE (e:Employee {employeeId: row.Id})
UNWIND split(row.skills, ',') AS skill
MERGE (s:Skill {name: skill})
MERGE (e)-[r:HAS_EXPERIENCE]->(s);

The point is that I don't find out how to insert this clause in my query. It always returns errors like "WITH is required between SET and UNWIND".

Miquel Centelles

Would you try runing this sentence:

LOAD CSV WITH HEADERS FROM 'file:///CinemaEnCatalaDef/pellicules.csv' AS row
WITH row.IDIOMA_x0020_ORIGINAL AS idiomaOriginal, row.VERSIO AS versioPellicula, row.SINOPSI AS sinopsi, row.ORIGINAL AS titolOriginal, row.QUALIFICACIO AS qualificacio, datetime(row.ANY) AS anyProduccio, row.IDFILM AS pelliculaId, row.CARTELL AS cartell, row.ESTRENA AS dataEstrena, row.TRAILER AS trailer, row.TITOL AS titolCatala
CREATE (p:Pellicula {pelliculaId: pelliculaId})
SET p.idiomaOriginal = idiomaOriginal, p.versioPellicula = versioPellicula, p.sinopsi = sinopsi, p.titolOriginal = titolOriginal, p.qualificacio = qualificacio, p.anyProduccio = anyProduccio, p.cartell = cartell, p.dataEstrena = dataEstrena, p.trailer = trailer, p.titolCatala = titolCatala
UNWIND SPLIT(row.DIRECCIO, ",") AS nomDirector
MERGE (d:Director {nomDirector: nomDirector})
UNWIND SPLIT(row.INTERPRETS, ",") AS nomInterpret
MERGE (i:Interpret {nomInterpret: nomInterpret})
CREATE (p)-[:ES_DIRIGIDA_PER]->(d)
CREATE (p)-[:ES_INTERPRETADA_PER]->(i)
;

Many thanks. It returns the error message:

Error. Neo.ClientError.Statement.SyntaxError

WITH is required between SET and UNWIND (line 5, column 1 (offset: 749))
"UNWIND SPLIT(row.DIRECCIO, ",") AS nomDirector"

May you can suppress SET statment and put all the attributes inside the CREATE clause?

It keeps on error:

WITH is required between CREATE and UNWIND (line 4, column 1 (offset: 156))
"UNWIND SPLIT(row.DIRECCIO, ",") AS nomDirector"
^

Try this:

LOAD CSV WITH HEADERS FROM 'file:///CinemaEnCatalaDef/pellicules.csv' AS row
WITH
row.IDIOMA_x0020_ORIGINAL AS idiomaOriginal,
row.VERSIO AS versioPellicula,
row.SINOPSI AS sinopsi,
row.ORIGINAL AS titolOriginal,
row.QUALIFICACIO AS qualificacio,
datetime(row.ANY) AS anyProduccio,
row.IDFILM AS pelliculaId,
row.CARTELL AS cartell,
row.ESTRENA AS dataEstrena,
row.TRAILER AS trailer,
row.TITOL AS titolCatala,
SPLIT(row.DIRECCIO, ",") AS nomDirector,
SPLIT(row.INTERPRETS, ",") AS nomInterpret

CREATE (p:Pellicula {pelliculaId: pelliculaId})
SET p.idiomaOriginal = idiomaOriginal, p.versioPellicula = versioPellicula, p.sinopsi = sinopsi, p.titolOriginal = titolOriginal, p.qualificacio = qualificacio, p.anyProduccio = anyProduccio, p.cartell = cartell, p.dataEstrena = dataEstrena, p.trailer = trailer, p.titolCatala = titolCatala

with p, nomDirector, nomInterpret

UNWIND RANGE(0,SIZE(nomDirector)-1) as i
MERGE (d:Director {normDirector: nomDirector[i]})
CREATE (p)-[:ES_DIRIGIDA_PER]->(d)

UNWIND RANGE(0,SIZE(nomInterpret)-1) as j
MERGE (i:Interpret {nomInterpret: nomInterpret[j]})
CREATE (p)-[:ES_INTERPRETADA_PER]->(i)

;

Thanks. I returns the following error:

Neo.ClientError.Statement.SyntaxError

WITH is required between CREATE and UNWIND (line 26, column 1 (offset: 998))
"UNWIND RANGE(0,SIZE(nomInterpret)-1) as j"

Finally, the Cypher query proposed by ameyasoft works, with just two chenges.

LOAD CSV WITH HEADERS FROM 'file:///CinemaEnCatalaCom/pellicules.csv' AS row FIELDTERMINATOR ';'

WITH

row.IDIOMA_x0020_ORIGINAL AS idiomaOriginal,

row.VERSIO AS versioPellicula,

row.SINOPSI AS sinopsi,

row.ORIGINAL AS titolOriginal,

row.QUALIFICACIO AS qualificacio,

datetime(row.ANY) AS anyProduccio,

row.IDFILM AS pelliculaId,

row.CARTELL AS cartell,

row.ESTRENA AS dataEstrena,

row.TRAILER AS trailer,

row.TITOL AS titolCatala,

split(row.DIRECCIO, ",") AS nomDirector,

split(row.INTERPRETS, ",") AS nomInterpret

CREATE (p:Pellicula {pelliculaId: pelliculaId})

SET p.idiomaOriginal = idiomaOriginal, p.versioPellicula = versioPellicula, p.sinopsi = sinopsi, p.titolOriginal = titolOriginal, p.qualificacio = qualificacio, p.anyProduccio = anyProduccio, p.cartell = cartell, p.dataEstrena = dataEstrena, p.trailer = trailer, p.titolCatala = titolCatala

WITH p, nomDirector, nomInterpret

UNWIND RANGE(0,SIZE(nomDirector)-1) AS i

MERGE (d:Director {nomDirector: nomDirector[i]})

CREATE (p)-[:ES_DIRIGIDA_PER]->(d)

WITH p, nomDirector, nomInterpret

UNWIND RANGE(0,SIZE(nomInterpret)-1) AS j

MERGE (i:Interpret {nomInterpret: nomInterpret[j]})

CREATE (p)-[:ES_INTERPRETADA_PER]->(i)

;

The changes are:

  • I've repeated the clause WITH p, nomDirector, nomInterpret before the second UNWIND.
  • There were a typing mistake in clause MERGE (d:Director {nomDirector: nomDirector[i]}): normDirector in place of nomDirector

Thank you very much to you and Pilar Sáez!

Miquel Centelles

1 Like