Multiple Relationships in one csv with arrays in the relationship column

Hey there,

First off, this is my first graph database project, and I haven't found any questions covering this specific case.

I have created a CSV that is modelled after my ontology and looks like this:
ID | class | Name | ... | has_Algorithm | has_UseCase | has_Dataset | ...
ID2 | Paper | xy | ... | ID5, ID6 | ID10, ID11 | ID7, ID8, ID9 | ...
ID5 | Algorithm | yz | ... | - | ID10 | - | ...
...

Now I want to create these relationships in my graph database. The solutions I have found always deal with merging relationships between different CSVs via one extra relationship CSV. I haven't found any solution for arrays in the relationship columns.

Is there a way I can do that, or do I have to create a separate CSV for each relationship type, with one line per relationship ID?

Appreciate any help

Hello Beamfreak! Welcome to the world of Neo4j :smiley:

You can use the string function split() to turn your comma-separated string value into a list.

For example:

LOAD CSV WITH HEADERS FROM 'file:///path_to_your_csv.csv' AS row FIELDTERMINATOR '|'
CREATE (n:Class {id: row.ID, name: row.Name})
WITH n, row, split(row.has_Algorithm, ',') AS algorithmIDs
FOREACH (id IN algorithmIDs |
  MERGE (a:Algorithm {id: trim(id)})  // trim() removes the space after each comma
  MERGE (n)-[:HAS_ALGORITHM]->(a)
)
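One wrinkle, going by the "-" placeholders in your sample rows: split("-", ',') returns ["-"], so the FOREACH above would merge a bogus Algorithm node with id "-". A hedged sketch that filters out placeholders and blanks first (column and label names are taken from your sample; adjust them to your real headers):

```cypher
// Hedged sketch: drop '-' placeholders and empty entries before MERGEing
LOAD CSV WITH HEADERS FROM 'file:///path_to_your_csv.csv' AS row FIELDTERMINATOR '|'
MATCH (n:Class {id: row.ID})
WITH n, [x IN split(coalesce(row.has_Algorithm, ''), ',')
         WHERE trim(x) <> '' AND trim(x) <> '-' | trim(x)] AS algorithmIDs
FOREACH (algoId IN algorithmIDs |
  MERGE (a:Algorithm {id: algoId})
  MERGE (n)-[:HAS_ALGORITHM]->(a)
)
```

The list comprehension does the trimming and filtering in one pass, so the FOREACH only ever sees real IDs.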


Max's example is a straightforward, Occam's-razor solution, based on the assumption that you want each class to be a node label and that they are all known beforehand.

I am unsure whether this applies to you, but there are ways to dynamically create labels if the class/label types are unknown. I took an example I already had and filled out the example data to match your scenario as best I could. The example includes other loading tidbits, such as APOC procedures, :param settings, and CALL {} IN TRANSACTIONS to help manage transaction memory usage.

// Using input data to create node labels and relationships dynamically
// Example using Neo4j 5.x 

// Example data for file data.txt
// ID|class|Name|has_Algorithm|has_UseCase|has_Dataset
// ID2|Paper|xy|ID5, ID6|ID10, ID11|ID7, ID8, ID9
// ID5|Algorithm|yz|-|ID10|-
// ID6|Algorithm|abc|-|ID11|-
// ID7|Dataset|dataset7|-|ID10|-
// ID8|Dataset|dataset8|-|ID10|-
// ID9|Dataset|dataset9|-|ID11|-
// ID10|Usecase|usecase10|-|-|-
// ID11|Usecase|usecase11|-|-|-


// General Usage Notes
// NOTE: you NEED to use the :auto directive in front of LOAD CSV commands
// IF running the code in the Neo4j Browser. The exception is the create-constraint
// apoc command, but :auto doesn't hurt to have there.
// REMOVE the :auto directive if running in cypher-shell.
//
// Good background reading for understanding how nodes and relationships are
// created without duplicates:
//   https://neo4j.com/docs/cypher-manual/current/clauses/merge/
//   https://neo4j.com/developer/kb/understanding-how-merge-works/
//
// A semicolon terminates a Cypher statement/transaction, which breaks variable scope.
// Unlike SQL, later clauses in a single Cypher query can reference values from earlier
// clauses. The WITH clause is like a pipe in linux/windows: it sends variables/data from
// one query part to the next when needed, for example on a write operation that has
// follow-on Cypher processing.
//
// APOC procedures (https://neo4j.com/docs/apoc/current/) are used to dynamically create
// labels and relationships from the input data.

// Wipe out database and start new.  
// WARNING: this will delete everything, data, indexes, constraints, and saved Bloom perspectives and scenes.
CREATE OR REPLACE DATABASE neo4j WAIT
;
:use neo4j
;

// strung together :param's work in cypher-shell
// :param dataDirectory => replace("/Users/danflavin/Documents/Neo4j Resources/Cypher Examples/Loading Dynamic Labels And Relationships/", " ", "%20")
// ;
// :param dataFile => "data.txt"
// ;
// :param dataLoadPath => 'file://' + $dataDirectory + $dataFile
// ;

// one :param for use when running in the Neo4j Browser
:param dataLoadPath => 'file://' + replace("/Users/danflavin/Documents/Neo4j Resources/Cypher Examples/Loading Dynamic Labels And Relationships/", " ", "%20") + 'data.txt'
;
// CREATE CONSTRAINTs
// Mitigates all node scans for fast lookups when creating relationships and enforces uniqueness. 
// Use a plain index instead if uniqueness is not required
// Dynamically create unique constraints from input data using apoc.cypher.runSchema()
LOAD CSV WITH HEADERS FROM $dataLoadPath AS row FIELDTERMINATOR '|'
WITH DISTINCT row.class AS class
CALL apoc.cypher.runSchema('CREATE CONSTRAINT dynamicCreate_' + class + ' IF NOT EXISTS FOR (node:' + class + ') REQUIRE node.ID IS UNIQUE' ,{}) YIELD value AS v
RETURN v
;

// Can hard code constraints if you know the constraints ahead of time, or want to be explicit 
// CREATE CONSTRAINT IF NOT EXISTS FOR (node:Paper) REQUIRE node.ID IS UNIQUE;
// CREATE CONSTRAINT IF NOT EXISTS FOR (node:Algorithm) REQUIRE node.ID IS UNIQUE;
// CREATE CONSTRAINT IF NOT EXISTS FOR (node:Dataset) REQUIRE node.ID IS UNIQUE;
// CREATE CONSTRAINT IF NOT EXISTS FOR (node:Usecase) REQUIRE node.ID IS UNIQUE;

// CREATE class nodes first
// Labels in CREATE statements must be hard-coded, so we use apoc.create.node() to create
// the node label from the input data. This makes loading easier and dynamic.
// 
// Using CALL {} IN TRANSACTIONS... helps manage memory usage on large loads.
//  https://neo4j.com/docs/cypher-manual/current/clauses/call-subquery/

:auto LOAD CSV WITH HEADERS FROM $dataLoadPath AS row FIELDTERMINATOR '|'
CALL {
	WITH row
  CALL apoc.create.node([row.class], 
	       {name: row.Name, ID: row.ID}
	    ) YIELD node AS node
  RETURN 1 AS nodeCount
} IN TRANSACTIONS OF 1000 ROWS
RETURN count(nodeCount) AS `# class nodes created`
;

// Create relationships from the referenced IDs in each list column.
// e.g., a has_Algorithm value of "ID5, ID6" will first be converted to a
// list using the split() function.
//    split(replace(row.has_Algorithm, ' ', ''), ',') AS algoIDs
//
// Each of these IDs in the resulting list will be processed using the UNWIND statement. 
//    UNWIND algoIDs AS algoID
//
:auto LOAD CSV WITH HEADERS FROM $dataLoadPath AS row FIELDTERMINATOR '|'
WITH row
WHERE row.class = 'Paper' 
CALL { 
	WITH row
  WITH row.ID AS rowID,
     split(replace(row.has_Algorithm, ' ', ''), ',') AS algoIDs, // makes a list [ ID5, ID6 ]
     split(replace(row.has_UseCase,   ' ', ''), ',') AS usecaseIDs,
     split(replace(row.has_Dataset,   ' ', ''), ',') AS datasetIDs
  MATCH (paper:Paper) WHERE paper.ID = rowID // get paper that the algo's, etc. are tied to
  WITH paper, algoIDs, usecaseIDs, datasetIDs
     UNWIND algoIDs AS algoID // walk through each ID and create relationships
       MATCH (algo:Algorithm) WHERE algo.ID = trim(algoID)
       MERGE (paper)-[:HAS_ALGORITHM]->(algo)
  WITH paper, usecaseIDs, datasetIDs
     UNWIND usecaseIDs AS usecaseID 
       MATCH (usecase:Usecase) WHERE usecase.ID = trim(usecaseID)
       MERGE (paper)-[:HAS_USECASE]->(usecase)
  WITH paper, datasetIDs
     UNWIND datasetIDs AS datasetID  
       MATCH (dataset:Dataset) WHERE dataset.ID = trim(datasetID)
       MERGE (paper)-[:HAS_DATASET]->(dataset)
} IN TRANSACTIONS OF 1000 ROWS
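Note that the inner MATCH here means a lookup that finds nothing (for example, an ID list that only contains the "-" placeholder) drops the whole row, so the later UNWIND/MERGE steps never run for it. A hedged alternative sketch that filters placeholders first and handles one relationship type per query, so a miss in one list cannot suppress the others (names as in the example above; repeat the block per relationship type):

```cypher
// Sketch: filter '-' placeholders so a failed lookup cannot drop the row
:auto LOAD CSV WITH HEADERS FROM $dataLoadPath AS row FIELDTERMINATOR '|'
WITH row WHERE row.class = 'Paper'
CALL {
  WITH row
  MATCH (paper:Paper {ID: row.ID})
  UNWIND [x IN split(coalesce(row.has_UseCase, ''), ',')
          WHERE trim(x) <> '' AND trim(x) <> '-' | trim(x)] AS usecaseID
  MATCH (usecase:Usecase {ID: usecaseID})
  MERGE (paper)-[:HAS_USECASE]->(usecase)
} IN TRANSACTIONS OF 1000 ROWS
```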

First of all, thanks for the detailed explanation of your solution.

However, when I run it, the database only creates the HAS_ALGORITHM relationships and not the other ones.
I verified the column names, and the IDs are correct in the database.
Do you have any idea why that is the case?

Thanks for sharing your real data; now I know why the example did not load. The example data I created has a different format from the data you're loading. The adjusted code is below. A couple of things to note:

  1. Your data uses commas both inside field values and as the list delimiter in the has_* fields, e.g., "ID12, ID13, ID14". That causes problems with LOAD CSV, which would treat each comma-delimited value as a separate field. To address this, I changed the input file to use "|" as the field delimiter for LOAD CSV, set with the FIELDTERMINATOR '|' option.
  2. A quick way to get up to speed with Cypher, including LOAD CSV, is the free, self-paced, hands-on online training available through Neo4j GraphAcademy. The modules are short and include one dedicated to loading data.

Good luck and happy graphing!
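Before running the full script, assuming the pipe-delimited file and the $dataLoadPath param set in the script below, a quick sanity check shows that with FIELDTERMINATOR '|' a quoted field like "ID12, ID13, ID14" arrives as one string that split() can then turn into a list:

```cypher
// Sanity check: inspect how the pipe-delimited rows are parsed before loading
LOAD CSV WITH HEADERS FROM $dataLoadPath AS row FIELDTERMINATOR '|'
RETURN row.ID, row.has_Algorithm,
       split(replace(coalesce(row.has_Algorithm, ''), ' ', ''), ',') AS algoIDs
LIMIT 5
```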

// Using input data to create node labels and relationships dynamically
// Example using Neo4j 5.x 

// Example data for file newDataPipeDelimited.txt
// ID|upperClass|class|title|has_Algorithm|has_Dataset|has_EvaluationMetric
// ID1|Paper|Paper|Selective network discovery via deep reinforcement learning on embedded spaces|"ID4, ID5, ID6, ID7, ID8, ID9, ID10, ID11"|"ID12, ID13, ID14"|
// ID2|Paper|Experiment|Selective network discovery via deep reinforcement learning on embedded spaces|"ID4, ID5, ID6, ID7, ID8, ID9, ID10, ID11"|"ID12, ID13, ID14"|ID3
// ID3|Paper|EvaluationMetric|Selective network discovery via deep reinforcement learning on embedded spaces|"ID4, ID5, ID6, ID7, ID8, ID9, ID10, ID11"|"ID12, ID13, ID14"|
// ID4|Algorithm|AnomalyDetection|NAC|||
// ID5|Algorithm|SupervisedLearning|?|||
// ID6|Algorithm|ReinforcementLearning|NAC|||
// ID7|Algorithm|NeuralNetworks|NAC|||
// ID8|Algorithm|Algorithm|NAC|||
// ID9|Algorithm|Algorithm|MOD|||
// ID10|Algorithm|Algorithm|NOL|||
// ID11|Algorithm|Algorithm|D3TS|||
// ID12|Dataset|GraphDataset|Selective network discovery via deep reinforcement learning on embedded spaces|||
// ID13|Dataset|GraphDataset|GEMSEC: Graph Embedding with Self Clustering|||
// ID14|Dataset|GraphDataset|Selective harvesting over networks|||
// ID15|UseCase|Regression||"ID4, ID5, ID6, ID7, ID8"|"ID12, ID13, ID14"|
// ID16|UseCase|Graphs||"ID4, ID5, ID6, ID7, ID8"|"ID12, ID13, ID14"|
// ID17|Paper|Paper|Outlier Resistant Unsupervised Deep Architecture for Attributed Network Embedding|"ID20, ID21, ID22, ID23, ID24, ID25"|ID26|
// ID18|Paper|Experiment|Outlier Resistant Unsupervised Deep Architecture for Attributed Network Embedding|"ID20, ID21, ID22, ID23, ID24, ID25"|ID26|ID19
// ID19|Paper|EvaluationMetric|Outlier Resistant Unsupervised Deep Architecture for Attributed Network Embedding|"ID20, ID21, ID22, ID23, ID24, ID25"|ID26|
// ID20|Algorithm|AnomalyDetection|"DONE, AdONE"|||
// ID21|Algorithm|UnsupervisedLearning|"DONE, AdONE"|||
// ID22|Algorithm|AutoEncoder|"DONE, AdONE"|||
// ID23|Algorithm|Algorithm|DONE|||
// ID24|Algorithm|Algorithm|AdONE|||
// ID25|Algorithm|Algorithm|"Node2vec, LINE, SDNE, GraphSage, DGI, SEANO, ONE, DOMINANT"|||
// ID26|Dataset|GraphDataset|"WebKB, Cora, Citeseer, Pubmed"|||
// ID27|UseCase|Regression||"ID20, ID21, ID22, ID23, ID24, ID25"|ID26|
// ID28|UseCase|Graphs||"ID20, ID21, ID22, ID23, ID24, ID25"|ID26|


// General Usage Notes
// NOTE: you NEED to use the :auto directive in front of LOAD CSV commands
// IF running the code in the Neo4j Browser. The exception is the create-constraint
// apoc command, but :auto doesn't hurt to have there.
// REMOVE the :auto directive if running in cypher-shell.
//
// Good background reading for understanding how nodes and relationships are
// created without duplicates:
//   https://neo4j.com/docs/cypher-manual/current/clauses/merge/
//   https://neo4j.com/developer/kb/understanding-how-merge-works/
//
// A semicolon terminates a Cypher statement/transaction, which breaks variable scope.
// Unlike SQL, later clauses in a single Cypher query can reference values from earlier
// clauses. The WITH clause is like a pipe in linux/windows: it sends variables/data from
// one query part to the next when needed, for example on a write operation that has
// follow-on Cypher processing.
//
// APOC procedures (https://neo4j.com/docs/apoc/current/) are used to dynamically create
// labels and relationships from the input data.

// Wipe out database and start new.  
// WARNING: this will delete everything, data, indexes, constraints, and saved Bloom perspectives and scenes.
CREATE OR REPLACE DATABASE neo4j WAIT
;
:use neo4j
;

// strung together :param's work in cypher-shell
// :param dataDirectory => replace("/Users/data/Loading Dynamic Labels And Relationships/", " ", "%20")
// ;
// :param dataFile => "data.txt"
// ;
// :param dataLoadPath => 'file://' + $dataDirectory + $dataFile
// ;

// one :param for use when running in the Neo4j Browser
:param dataLoadPath => 'file://' + replace("/Users/danflavin/Documents/Neo4j Resources/Cypher Examples/Loading Dynamic Labels And Relationships/", " ", "%20") + 'newDataPipeDelimited.txt'
;

// CREATE CONSTRAINTs
// Mitigates all node scans for fast lookups when creating relationships and enforces uniqueness. 
// Use a plain index instead if uniqueness is not required
// Dynamically create unique constraints from input data using apoc.cypher.runSchema()
LOAD CSV WITH HEADERS FROM $dataLoadPath AS row FIELDTERMINATOR '|'
WITH DISTINCT row.upperClass AS upperClass
CALL apoc.cypher.runSchema('CREATE CONSTRAINT dynamicCreate_' + upperClass + ' IF NOT EXISTS FOR (node:' + upperClass + ') REQUIRE node.ID IS UNIQUE' ,{}) YIELD value AS v
RETURN v
;

// Can hard code constraints if you know the constraints ahead of time, or want to be explicit 
// CREATE CONSTRAINT IF NOT EXISTS FOR (node:Paper) REQUIRE node.ID IS UNIQUE;
// CREATE CONSTRAINT IF NOT EXISTS FOR (node:Algorithm) REQUIRE node.ID IS UNIQUE;
// CREATE CONSTRAINT IF NOT EXISTS FOR (node:Dataset) REQUIRE node.ID IS UNIQUE;
// CREATE CONSTRAINT IF NOT EXISTS FOR (node:Usecase) REQUIRE node.ID IS UNIQUE;

// CREATE class nodes first
// Labels in CREATE statements must be hard-coded, so we use apoc.create.node() to create
// the node label from the input data. This makes loading easier and more dynamic.
// 
// Using CALL {} IN TRANSACTIONS... helps manage memory usage on large loads.
//  https://neo4j.com/docs/cypher-manual/current/clauses/call-subquery/

// Use :auto directive when running the code in the Neo4j 
// Browser. Remove :auto for executing with cypher-shell
:auto LOAD CSV WITH HEADERS FROM $dataLoadPath AS row FIELDTERMINATOR '|'
// LOAD CSV WITH HEADERS FROM $dataLoadPath AS row FIELDTERMINATOR '|'
CALL {
	WITH row
  CALL apoc.create.node([row.upperClass], 
	       {name: row.title, ID: row.ID, class: row.class} // data file has 'title', not 'Name'
	    ) YIELD node AS node
  RETURN 1 AS nodeCount
} IN TRANSACTIONS OF 1000 ROWS
RETURN count(nodeCount) AS `# class nodes created`
;

// Create relationships from the referenced IDs in each list column.
// e.g., a has_Algorithm value of "ID5, ID6" will first be converted to a
// list using the split() function.
//    split(replace(row.has_Algorithm, ' ', ''), ',') AS algoIDs
//
// Each of these IDs in the resulting list will be processed using the UNWIND statement. 
//    UNWIND algoIDs AS algoID
//
:auto LOAD CSV WITH HEADERS FROM $dataLoadPath AS row FIELDTERMINATOR '|'
// LOAD CSV WITH HEADERS FROM $dataLoadPath AS row FIELDTERMINATOR '|'
WITH row
CALL { 
  WITH row
   // has_Algorithm|has_Dataset|has_EvaluationMetric
  WITH row.ID AS rowID, 
     split(replace(row.has_Algorithm, ' ', ''), ',') AS algoIDs, // makes a list [ ID5, ID6 ]
     split(replace(row.has_Dataset,   ' ', ''), ',') AS datasetIDs,
     split(replace(row.has_EvaluationMetric,   ' ', ''), ',') AS evalMetricIDs
  
  // THIS CHANGED!! 
  // MATCH (paper:Paper) WHERE paper.ID = rowID // get paper that the algo's, etc. are tied to
  // Now are creating anchor node to attach whatever HAS_* relationship to
  MATCH (node1) WHERE node1.ID = rowID 

  WITH node1, algoIDs, datasetIDs, evalMetricIDs
    UNWIND algoIDs AS algoID // walk through each ID and create relationships
       MATCH (node2) WHERE node2.ID = algoID
       MERGE (node1)-[:HAS_ALGORITHM]->(node2)
  WITH node1, datasetIDs, evalMetricIDs
     UNWIND datasetIDs AS datasetID 
       MATCH (node2) WHERE node2.ID = datasetID
       MERGE (node1)-[:HAS_DATASET]->(node2)
  WITH node1, evalMetricIDs
     UNWIND evalMetricIDs AS evalMetricID  
       MATCH (node2) WHERE node2.ID = evalMetricID
       MERGE (node1)-[:HAS_EVAL_METRIC]->(node2)
} IN TRANSACTIONS OF 1000 ROWS
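Once the load finishes, a quick count by relationship type confirms that all three HAS_* types were actually created (a simple verification query, not part of the script above):

```cypher
// Verify the load: count relationships by type
MATCH ()-[r]->()
RETURN type(r) AS relType, count(r) AS relCount
ORDER BY relType
```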

Thank you very much.

The graph is now properly connected.
I did already have a look at that, but I will give it another, more detailed look.