Multiple Relationships in one csv with arrays in the relationship column

Hey there,

First off, this is my first graph database project, and I haven't found any questions covering this specific case.

I have created a CSV that is modelled after my ontology and looks like this:
ID | class | Name | ... | has_Algorithm | has_UseCase | has_Dataset | ...
ID2 | Paper | xy | ... | ID5, ID6 | ID10, ID11 | ID7, ID8, ID9 | ...
ID5 | Algorithm | yz | ... | - | ID10 | - | ...
...

Now I want to create these relationships in my graph database. The solutions I have found always deal with merging relationships between different CSVs via one extra relationship CSV. I haven't found any solution for arrays in the relationship columns.

Is there a way I can do that, or do I have to create a separate CSV for each relationship type, with one line per relationship ID?

Appreciate any help

Hello Beamfreak! Welcome to the world of Neo4j :smiley:

You can use the string function split() to turn your comma-separated string value into a list.

For example:

LOAD CSV WITH HEADERS FROM 'file:///path_to_your_csv.csv' AS row FIELDTERMINATOR '|'
CREATE (n:Class {id: row.ID, name: row.Name})
WITH n, row, split(row.has_Algorithm, ',') AS algorithmIDs
FOREACH (id IN algorithmIDs |
  MERGE (a:Algorithm {id: trim(id)})  // trim() removes the space after each comma
  MERGE (n)-[:HAS_ALGORITHM]->(a)
)
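One wrinkle, going by the "-" placeholders in your sample rows: split("-", ',') returns ["-"], so the FOREACH above would merge a bogus Algorithm node with id "-". A hedged sketch that filters out placeholders and blanks first (column and label names are taken from your sample; adjust them to your real headers):

```cypher
// Hedged sketch: drop '-' placeholders and empty entries before MERGEing
LOAD CSV WITH HEADERS FROM 'file:///path_to_your_csv.csv' AS row FIELDTERMINATOR '|'
MATCH (n:Class {id: row.ID})
WITH n, [x IN split(coalesce(row.has_Algorithm, ''), ',')
         WHERE trim(x) <> '' AND trim(x) <> '-' | trim(x)] AS algorithmIDs
FOREACH (algoId IN algorithmIDs |
  MERGE (a:Algorithm {id: algoId})
  MERGE (n)-[:HAS_ALGORITHM]->(a)
)
```

The list comprehension does the trimming and filtering in one pass, so the FOREACH only ever sees real IDs.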


Max's example is a straightforward, Occam's-razor solution, based on the assumption that you want each class to be a node label and that they are all known beforehand.

I am unsure whether this applies to you, but there are ways to dynamically create labels if the class/label types are unknown. I took an example I already had and filled out the example data to match your scenario as best I could. The example includes other loading tidbits, such as APOC procedures, :param settings, and CALL {} IN TRANSACTIONS to help manage transaction memory usage.

// Using input data to create node labels and relationships dynamically
// Example using Neo4j 5.x 

// Example data for file data.txt
// ID|class|Name|has_Algorithm|has_UseCase|has_Dataset
// ID2|Paper|xy|ID5, ID6|ID10, ID11|ID7, ID8, ID9
// ID5|Algorithm|yz|-|ID10|-
// ID6|Algorithm|abc|-|ID11|-
// ID7|Dataset|dataset7|-|ID10|-
// ID8|Dataset|dataset8|-|ID10|-
// ID9|Dataset|dataset9|-|ID11|-
// ID10|Usecase|usecase10|-|-|-
// ID11|Usecase|usecase11|-|-|-


// General Usage Notes
// NOTE: you NEED to use the :auto directive in front of LOAD CSV commands
// IF running the code in the Neo4j Browser. The exception is the create-constraint
// apoc command, but :auto doesn't hurt to have there.
// REMOVE the :auto directive if running in cypher-shell.
//
// Good background reading for understanding how nodes and relationships are
// created without duplicates:
//   https://neo4j.com/docs/cypher-manual/current/clauses/merge/
//   https://neo4j.com/developer/kb/understanding-how-merge-works/
//
// A semicolon terminates a Cypher statement/transaction, which breaks variable scope.
// Unlike SQL, later clauses in a single Cypher query can reference values from earlier
// clauses. The WITH clause is like a pipe in linux/windows: it sends variables/data from
// one query part to the next when needed, for example on a write operation that has
// follow-on Cypher processing.
//
// APOC procedures (https://neo4j.com/docs/apoc/current/) are used to dynamically create
// labels and relationships from the input data.

// Wipe out database and start new.  
// WARNING: this will delete everything, data, indexes, constraints, and saved Bloom perspectives and scenes.
CREATE OR REPLACE DATABASE neo4j WAIT
;
:use neo4j
;

// strung together :param's work in cypher-shell
// :param dataDirectory => replace("/Users/danflavin/Documents/Neo4j Resources/Cypher Examples/Loading Dynamic Labels And Relationships/", " ", "%20")
// ;
// :param dataFile => "data.txt"
// ;
// :param dataLoadPath => 'file://' + $dataDirectory + $dataFile
// ;

// one :param for use when running in the Neo4j Browser
:param dataLoadPath => 'file://' + replace("/Users/danflavin/Documents/Neo4j Resources/Cypher Examples/Loading Dynamic Labels And Relationships/", " ", "%20") + 'data.txt'
;
// CREATE CONSTRAINTs
// Mitigates all node scans for fast lookups when creating relationships and enforces uniqueness. 
// Use a plain index instead if uniqueness is not required
// Dynamically create unique constraints from input data using apoc.cypher.runSchema()
LOAD CSV WITH HEADERS FROM $dataLoadPath AS row FIELDTERMINATOR '|'
WITH DISTINCT row.class AS class
CALL apoc.cypher.runSchema('CREATE CONSTRAINT dynamicCreate_' + class + ' IF NOT EXISTS FOR (node:' + class + ') REQUIRE node.ID IS UNIQUE' ,{}) YIELD value AS v
RETURN v
;

// Can hard code constraints if you know the constraints ahead of time, or want to be explicit 
// CREATE CONSTRAINT IF NOT EXISTS FOR (node:Paper) REQUIRE node.ID IS UNIQUE;
// CREATE CONSTRAINT IF NOT EXISTS FOR (node:Algorithm) REQUIRE node.ID IS UNIQUE;
// CREATE CONSTRAINT IF NOT EXISTS FOR (node:Dataset) REQUIRE node.ID IS UNIQUE;
// CREATE CONSTRAINT IF NOT EXISTS FOR (node:Usecase) REQUIRE node.ID IS UNIQUE;

// CREATE class nodes first
// Labels in CREATE statements must be hard-coded, so we use apoc.create.node() to create
// the node label from the input data. This makes loading easier and dynamic.
// 
// Using CALL {} IN TRANSACTIONS... helps manage memory usage on large loads.
//  https://neo4j.com/docs/cypher-manual/current/clauses/call-subquery/

:auto LOAD CSV WITH HEADERS FROM $dataLoadPath AS row FIELDTERMINATOR '|'
CALL {
	WITH row
  CALL apoc.create.node([row.class], 
	       {name: row.Name, ID: row.ID}
	    ) YIELD node AS node
  RETURN 1 AS nodeCount
} IN TRANSACTIONS OF 1000 ROWS
RETURN count(nodeCount) AS `# class nodes created`
;

// Create relationships from the referenced IDs in each list column.
// e.g., a has_Algorithm value of "ID5, ID6" will first be converted to a
// list using the split() function.
//    split(replace(row.has_Algorithm, ' ', ''), ',') AS algoIDs
//
// Each of these IDs in the resulting list will be processed using the UNWIND statement. 
//    UNWIND algoIDs AS algoID
//
:auto LOAD CSV WITH HEADERS FROM $dataLoadPath AS row FIELDTERMINATOR '|'
WITH row
WHERE row.class = 'Paper' 
CALL { 
	WITH row
  WITH row.ID AS rowID,
     split(replace(row.has_Algorithm, ' ', ''), ',') AS algoIDs, // makes a list [ ID5, ID6 ]
     split(replace(row.has_UseCase,   ' ', ''), ',') AS usecaseIDs,
     split(replace(row.has_Dataset,   ' ', ''), ',') AS datasetIDs
  MATCH (paper:Paper) WHERE paper.ID = rowID // get paper that the algo's, etc. are tied to
  WITH paper, algoIDs, usecaseIDs, datasetIDs
     UNWIND algoIDs AS algoID // walk through each ID and create relationships
       MATCH (algo:Algorithm) WHERE algo.ID = trim(algoID)
       MERGE (paper)-[:HAS_ALGORITHM]->(algo)
  WITH paper, usecaseIDs, datasetIDs
     UNWIND usecaseIDs AS usecaseID 
       MATCH (usecase:Usecase) WHERE usecase.ID = trim(usecaseID)
       MERGE (paper)-[:HAS_USECASE]->(usecase)
  WITH paper, datasetIDs
     UNWIND datasetIDs AS datasetID  
       MATCH (dataset:Dataset) WHERE dataset.ID = trim(datasetID)
       MERGE (paper)-[:HAS_DATASET]->(dataset)
} IN TRANSACTIONS OF 1000 ROWS
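Note that the inner MATCH here means a lookup that finds nothing (for example, an ID list that only contains the "-" placeholder) drops the whole row, so the later UNWIND/MERGE steps never run for it. A hedged alternative sketch that filters placeholders first and handles one relationship type per query, so a miss in one list cannot suppress the others (names as in the example above; repeat the block per relationship type):

```cypher
// Sketch: filter '-' placeholders so a failed lookup cannot drop the row
:auto LOAD CSV WITH HEADERS FROM $dataLoadPath AS row FIELDTERMINATOR '|'
WITH row WHERE row.class = 'Paper'
CALL {
  WITH row
  MATCH (paper:Paper {ID: row.ID})
  UNWIND [x IN split(coalesce(row.has_UseCase, ''), ',')
          WHERE trim(x) <> '' AND trim(x) <> '-' | trim(x)] AS usecaseID
  MATCH (usecase:Usecase {ID: usecaseID})
  MERGE (paper)-[:HAS_USECASE]->(usecase)
} IN TRANSACTIONS OF 1000 ROWS
```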

First of all, thanks for the detailed explanation of your solution.

However, when I run it, the database only creates the HAS_ALGORITHM relationships and not the other ones.
I verified the column names, and the IDs are correct in the database.
Do you have any idea why that is the case?

Thanks for sharing your real data; now I know why the example did not load. The example data I created has a different format from the data you're loading. The adjusted code is below. A couple of things to note:

  1. Your data uses commas both inside field values and as the list delimiter in the has_* fields, e.g., "ID12, ID13, ID14". That causes problems with LOAD CSV, which would treat each comma-delimited value as a separate field. To address this, I changed the input file to use "|" as the field delimiter for LOAD CSV, set with the FIELDTERMINATOR '|' option.
  2. A quick way to get up to speed with Cypher, including LOAD CSV, is the free, self-paced, hands-on online training available through Neo4j GraphAcademy. The modules are short and include one dedicated to loading data.

Good luck and happy graphing!
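Before running the full script, assuming the pipe-delimited file and the $dataLoadPath param set in the script below, a quick sanity check shows that with FIELDTERMINATOR '|' a quoted field like "ID12, ID13, ID14" arrives as one string that split() can then turn into a list:

```cypher
// Sanity check: inspect how the pipe-delimited rows are parsed before loading
LOAD CSV WITH HEADERS FROM $dataLoadPath AS row FIELDTERMINATOR '|'
RETURN row.ID, row.has_Algorithm,
       split(replace(coalesce(row.has_Algorithm, ''), ' ', ''), ',') AS algoIDs
LIMIT 5
```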

// Using input data to create node labels and relationships dynamically
// Example using Neo4j 5.x 

// Example data for file newDataPipeDelimited.txt
// ID|upperClass|class|title|has_Algorithm|has_Dataset|has_EvaluationMetric
// ID1|Paper|Paper|Selective network discovery via deep reinforcement learning on embedded spaces|"ID4, ID5, ID6, ID7, ID8, ID9, ID10, ID11"|"ID12, ID13, ID14"|
// ID2|Paper|Experiment|Selective network discovery via deep reinforcement learning on embedded spaces|"ID4, ID5, ID6, ID7, ID8, ID9, ID10, ID11"|"ID12, ID13, ID14"|ID3
// ID3|Paper|EvaluationMetric|Selective network discovery via deep reinforcement learning on embedded spaces|"ID4, ID5, ID6, ID7, ID8, ID9, ID10, ID11"|"ID12, ID13, ID14"|
// ID4|Algorithm|AnomalyDetection|NAC|||
// ID5|Algorithm|SupervisedLearning|?|||
// ID6|Algorithm|ReinforcementLearning|NAC|||
// ID7|Algorithm|NeuralNetworks|NAC|||
// ID8|Algorithm|Algorithm|NAC|||
// ID9|Algorithm|Algorithm|MOD|||
// ID10|Algorithm|Algorithm|NOL|||
// ID11|Algorithm|Algorithm|D3TS|||
// ID12|Dataset|GraphDataset|Selective network discovery via deep reinforcement learning on embedded spaces|||
// ID13|Dataset|GraphDataset|GEMSEC: Graph Embedding with Self Clustering|||
// ID14|Dataset|GraphDataset|Selective harvesting over networks|||
// ID15|UseCase|Regression||"ID4, ID5, ID6, ID7, ID8"|"ID12, ID13, ID14"|
// ID16|UseCase|Graphs||"ID4, ID5, ID6, ID7, ID8"|"ID12, ID13, ID14"|
// ID17|Paper|Paper|Outlier Resistant Unsupervised Deep Architecture for Attributed Network Embedding|"ID20, ID21, ID22, ID23, ID24, ID25"|ID26|
// ID18|Paper|Experiment|Outlier Resistant Unsupervised Deep Architecture for Attributed Network Embedding|"ID20, ID21, ID22, ID23, ID24, ID25"|ID26|ID19
// ID19|Paper|EvaluationMetric|Outlier Resistant Unsupervised Deep Architecture for Attributed Network Embedding|"ID20, ID21, ID22, ID23, ID24, ID25"|ID26|
// ID20|Algorithm|AnomalyDetection|"DONE, AdONE"|||
// ID21|Algorithm|UnsupervisedLearning|"DONE, AdONE"|||
// ID22|Algorithm|AutoEncoder|"DONE, AdONE"|||
// ID23|Algorithm|Algorithm|DONE|||
// ID24|Algorithm|Algorithm|AdONE|||
// ID25|Algorithm|Algorithm|"Node2vec, LINE, SDNE, GraphSage, DGI, SEANO, ONE, DOMINANT"|||
// ID26|Dataset|GraphDataset|"WebKB, Cora, Citeseer, Pubmed"|||
// ID27|UseCase|Regression||"ID20, ID21, ID22, ID23, ID24, ID25"|ID26|
// ID28|UseCase|Graphs||"ID20, ID21, ID22, ID23, ID24, ID25"|ID26|


// General Usage Notes
// NOTE: you NEED to use the :auto directive in front of LOAD CSV commands
// IF running the code in the Neo4j Browser. The exception is the create-constraint
// apoc command, but :auto doesn't hurt to have there.
// REMOVE the :auto directive if running in cypher-shell.
//
// Good background reading for understanding how nodes and relationships are
// created without duplicates:
//   https://neo4j.com/docs/cypher-manual/current/clauses/merge/
//   https://neo4j.com/developer/kb/understanding-how-merge-works/
//
// A semicolon terminates a Cypher statement/transaction, which breaks variable scope.
// Unlike SQL, later clauses in a single Cypher query can reference values from earlier
// clauses. The WITH clause is like a pipe in linux/windows: it sends variables/data from
// one query part to the next when needed, for example on a write operation that has
// follow-on Cypher processing.
//
// APOC procedures (https://neo4j.com/docs/apoc/current/) are used to dynamically create
// labels and relationships from the input data.

// Wipe out database and start new.  
// WARNING: this will delete everything, data, indexes, constraints, and saved Bloom perspectives and scenes.
CREATE OR REPLACE DATABASE neo4j WAIT
;
:use neo4j
;

// strung together :param's work in cypher-shell
// :param dataDirectory => replace("/Users/data/Loading Dynamic Labels And Relationships/", " ", "%20")
// ;
// :param dataFile => "data.txt"
// ;
// :param dataLoadPath => 'file://' + $dataDirectory + $dataFile
// ;

// one :param for use when running in the Neo4j Browser
:param dataLoadPath => 'file://' + replace("/Users/danflavin/Documents/Neo4j Resources/Cypher Examples/Loading Dynamic Labels And Relationships/", " ", "%20") + 'newDataPipeDelimited.txt'
;

// CREATE CONSTRAINTs
// Mitigates all node scans for fast lookups when creating relationships and enforces uniqueness. 
// Use a plain index instead if uniqueness is not required
// Dynamically create unique constraints from input data using apoc.cypher.runSchema()
LOAD CSV WITH HEADERS FROM $dataLoadPath AS row FIELDTERMINATOR '|'
WITH DISTINCT row.upperClass AS upperClass
CALL apoc.cypher.runSchema('CREATE CONSTRAINT dynamicCreate_' + upperClass + ' IF NOT EXISTS FOR (node:' + upperClass + ') REQUIRE node.ID IS UNIQUE' ,{}) YIELD value AS v
RETURN v
;

// Can hard code constraints if you know the constraints ahead of time, or want to be explicit 
// CREATE CONSTRAINT IF NOT EXISTS FOR (node:Paper) REQUIRE node.ID IS UNIQUE;
// CREATE CONSTRAINT IF NOT EXISTS FOR (node:Algorithm) REQUIRE node.ID IS UNIQUE;
// CREATE CONSTRAINT IF NOT EXISTS FOR (node:Dataset) REQUIRE node.ID IS UNIQUE;
// CREATE CONSTRAINT IF NOT EXISTS FOR (node:Usecase) REQUIRE node.ID IS UNIQUE;

// CREATE class nodes first
// Labels in CREATE statements must be hard-coded, so we use apoc.create.node() to create
// the node label from the input data. This makes loading easier and more dynamic.
// 
// Using CALL {} IN TRANSACTIONS... helps manage memory usage on large loads.
//  https://neo4j.com/docs/cypher-manual/current/clauses/call-subquery/

// Use :auto directive when running the code in the Neo4j 
// Browser. Remove :auto for executing with cypher-shell
:auto LOAD CSV WITH HEADERS FROM $dataLoadPath AS row FIELDTERMINATOR '|'
// LOAD CSV WITH HEADERS FROM $dataLoadPath AS row FIELDTERMINATOR '|'
CALL {
	WITH row
  CALL apoc.create.node([row.upperClass], 
	       {name: row.title, ID: row.ID, class: row.class} // data file has 'title', not 'Name'
	    ) YIELD node AS node
  RETURN 1 AS nodeCount
} IN TRANSACTIONS OF 1000 ROWS
RETURN count(nodeCount) AS `# class nodes created`
;

// Create relationships from the referenced IDs in each list column.
// e.g., a has_Algorithm value of "ID5, ID6" will first be converted to a
// list using the split() function.
//    split(replace(row.has_Algorithm, ' ', ''), ',') AS algoIDs
//
// Each of these IDs in the resulting list will be processed using the UNWIND statement. 
//    UNWIND algoIDs AS algoID
//
:auto LOAD CSV WITH HEADERS FROM $dataLoadPath AS row FIELDTERMINATOR '|'
// LOAD CSV WITH HEADERS FROM $dataLoadPath AS row FIELDTERMINATOR '|'
WITH row
CALL { 
  WITH row
   // has_Algorithm|has_Dataset|has_EvaluationMetric
  WITH row.ID AS rowID, 
     split(replace(row.has_Algorithm, ' ', ''), ',') AS algoIDs, // makes a list [ ID5, ID6 ]
     split(replace(row.has_Dataset,   ' ', ''), ',') AS datasetIDs,
     split(replace(row.has_EvaluationMetric,   ' ', ''), ',') AS evalMetricIDs
  
  // THIS CHANGED!! 
  // MATCH (paper:Paper) WHERE paper.ID = rowID // get paper that the algo's, etc. are tied to
  // Now are creating anchor node to attach whatever HAS_* relationship to
  MATCH (node1) WHERE node1.ID = rowID 

  WITH node1, algoIDs, datasetIDs, evalMetricIDs
    UNWIND algoIDs AS algoID // walk through each ID and create relationships
       MATCH (node2) WHERE node2.ID = algoID
       MERGE (node1)-[:HAS_ALGORITHM]->(node2)
  WITH node1, datasetIDs, evalMetricIDs
     UNWIND datasetIDs AS datasetID 
       MATCH (node2) WHERE node2.ID = datasetID
       MERGE (node1)-[:HAS_DATASET]->(node2)
  WITH node1, evalMetricIDs
     UNWIND evalMetricIDs AS evalMetricID  
       MATCH (node2) WHERE node2.ID = evalMetricID
       MERGE (node1)-[:HAS_EVAL_METRIC]->(node2)
} IN TRANSACTIONS OF 1000 ROWS
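Once the load finishes, a quick count by relationship type confirms that all three HAS_* types were actually created (a simple verification query, not part of the script above):

```cypher
// Verify the load: count relationships by type
MATCH ()-[r]->()
RETURN type(r) AS relType, count(r) AS relCount
ORDER BY relType
```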

Thank you very much.

The graph is now properly connected.
I did already have a look at that, but I will give it another, more detailed look.