Hi,
I want to create a database from a CSV file using LOAD CSV. A column called IDFILM conatins a unique value in each cell. So I'm going to generate a node from each one of the values, belonging to the p:Pellicula label. Two other columns -DIRECCIO and INTERPRETS- may have a unique value or multiple values in each of their cells (and, sometimes, are empty). So, I want to split cells containing multiple values in order to generate separated nodes. Nodes from the column DIRECCIO will belong to the label d:Director, and nodes from the INTERPRETS column will belong to the label i:Interpret.
And then, I want to relate these isolated nodes from the columns DIRECCIO and INTERPRETS with the nodes from the IDFILM column through two different relationships:
(p:Pellicula)-[:ES_DIRIGIDA_PER]->(d:Director)
(p:Pellicula)-[:ES_INTERPRETADA_PER]->(i)
I've prepared the following query to load and transform CSV data into the Neo4j database, in the way I have explained.
LOAD CSV WITH HEADERS FROM 'file:///CinemaEnCatalaDef/pellicules.csv' AS row
WITH row.IDIOMA_x0020_ORIGINAL AS idiomaOriginal, row.VERSIO AS versioPellicula, row.SINOPSI AS sinopsi, row.ORIGINAL AS titolOriginal, row.QUALIFICACIO AS qualificacio, datetime(row.ANY) AS anyProduccio, row.IDFILM AS pelliculaId, row.CARTELL AS cartell, row.ESTRENA AS dataEstrena, row.TRAILER AS trailer, row.TITOL AS titolCatala, SPLIT(row.DIRECCIO, ",") AS nomDirector, SPLIT(row.INTERPRETS, ",") AS nomInterpret
CREATE (p:Pellicula {pelliculaId: pelliculaId})
SET p.idiomaOriginal = idiomaOriginal, p.versioPellicula = versioPellicula, p.sinopsi = sinopsi, p.titolOriginal = titolOriginal, p.qualificacio = qualificacio, p.anyProduccio = anyProduccio, p.cartell = cartell, p.dataEstrena = dataEstrena, p.trailer = trailer, p.titolCatala = titolCatala
MERGE (d:Director {nomDirector: nomDirector})
MERGE (i:Interpret {nomInterpret: nomInterpret})
CREATE (p)-[:ES_DIRIGIDA_PER]->(d)
CREATE (p)-[:ES_INTERPRETADA_PER]->(i)
;
The point is that orders like "SPLIT(row.DIRECCIO, ",") AS nomDirector" just transform multivalued cells into lists of values, but not into separated nodes. In addition, I have anothe doubt. In some cells, values can be separated by two different separators. For exemple: "a, b, c and d" where delimiters are "," and "and".
Thank you in advance.
Miquel Centelles