Much appreciate getting your real data. I do know why the example did not load. The example data I created has a different format from the data you're loading. The adjusted code is below. A couple of things to note:
- Your data has commas in the data itself and as a field delimiter for the has_* fields, e.g.,
"ID12, ID13, ID14"
. That will cause problems with LOAD CSV
by treating each comma delimited as a separate data item to load. To address this, I changed the input file to use "|
" as the delimiter for LOAD CSV
, set by using the FIELDTERMINATOR '|'
option.
- A quick way to get up to speed with using Cypher, including
LOAD CSV
is to look at the free, self-paced, hands-on online training available through the Neo4j graphacadamy. The modules are short, and include one dedicated to loading data.
Good luck and happy graphing!
// Using input data to create node labels and relationships dynamically
// Example using Neo4j 5.x
// Example data for file newDataPipeDelimited.txt
// ID|upperClass|class|title|has_Algorithm|has_Dataset|has_EvaluationMetric
// ID1|Paper|Paper|Selective network discovery via deep reinforcement learning on embedded spaces|"ID4, ID5, ID6, ID7, ID8, ID9, ID10, ID11"|"ID12, ID13, ID14"|
// ID2|Paper|Experiment|Selective network discovery via deep reinforcement learning on embedded spaces|"ID4, ID5, ID6, ID7, ID8, ID9, ID10, ID11"|"ID12, ID13, ID14"|ID3
// ID3|Paper|EvaluationMetric|Selective network discovery via deep reinforcement learning on embedded spaces|"ID4, ID5, ID6, ID7, ID8, ID9, ID10, ID11"|"ID12, ID13, ID14"|
// ID4|Algorithm|AnomalyDetection|NAC|||
// ID5|Algorithm|SupervisedLearning|?|||
// ID6|Algorithm|ReinforcementLearning|NAC|||
// ID7|Algorithm|NeuralNetworks|NAC|||
// ID8|Algorithm|Algorithm|NAC|||
// ID9|Algorithm|Algorithm|MOD|||
// ID10|Algorithm|Algorithm|NOL|||
// ID11|Algorithm|Algorithm|D3TS|||
// ID12|Dataset|GraphDataset|Selective network discovery via deep reinforcement learning on embedded spaces|||
// ID13|Dataset|GraphDataset|GEMSEC: Graph Embedding with Self Clustering|||
// ID14|Dataset|GraphDataset|Selective harvesting over networks|||
// ID15|UseCase|Regression||"ID4, ID5, ID6, ID7, ID8"|"ID12, ID13, ID14"|
// ID16|UseCase|Graphs||"ID4, ID5, ID6, ID7, ID8"|"ID12, ID13, ID14"|
// ID17|Paper|Paper|Outlier Resistant Unsupervised Deep Architecture for Attributed Network Embedding|"ID20, ID21, ID22, ID23, ID24, ID25"|ID26|
// ID18|Paper|Experiment|Outlier Resistant Unsupervised Deep Architecture for Attributed Network Embedding|"ID20, ID21, ID22, ID23, ID24, ID25"|ID26|ID19
// ID19|Paper|EvaluationMetric|Outlier Resistant Unsupervised Deep Architecture for Attributed Network Embedding|"ID20, ID21, ID22, ID23, ID24, ID25"|ID26|
// ID20|Algorithm|AnomalyDetection|"DONE, AdONE"|||
// ID21|Algorithm|UnsupervisedLearning|"DONE, AdONE"|||
// ID22|Algorithm|AutoEncoder|"DONE, AdONE"|||
// ID23|Algorithm|Algorithm|DONE|||
// ID24|Algorithm|Algorithm|AdONE|||
// ID25|Algorithm|Algorithm|"Node2vec, LINE, SDNE, GraphSage, DGI, SEANO, ONE, DOMINANT"|||
// ID26|Dataset|GraphDataset|"WebKB, Cora, Citeseer, Pubmed"|||
// ID27|UseCase|Regression||"ID20, ID21, ID22, ID23, ID24, ID25"|ID26|
// ID28|UseCase|Graphs||"ID20, ID21, ID22, ID23, ID24, ID25"|ID26|
// General Usage Notes
// NOTE: NEED to use the :auto directive in front of LOAD CSV commands
// IF running the code in the Neo4j browser. Exception is the create constraint
// apoc command, but it doesn't hurt to have it.
// REMOVE the :auto directive if running in cypher-shell
//
// Good background reading for understanding how nodes are relationships are
// created without duplicates:
// https://neo4j.com/docs/cypher-manual/current/clauses/merge/
// https://neo4j.com/developer/kb/understanding-how-merge-works/
//
// A semi-colon terminates a Cypher stream/transactions which break variable scope.
// Unlike SQL, independent Cypher queries can reference previous query values.
// The WITH statement is like a pipe in linux/windows to // send variables/data from
// one query to the next when needed. For example. on a write operation that will have
// follow-on Cypher processing.
//
// apoc's (https://neo4j.com/docs/apoc/current/) are used to dynamically create labels and relationships from
// the input data.
// Wipe out database and start new.
// WARNING: this will delete everything, data, indexes, constraints, and saved Bloom perspectives and scenes.
CREATE OR REPLACE DATABASE neo4j WAIT
;
:use neo4j
;
// strung together :param's work in cypher-shell
// :param dataDirectory => replace("/Users/data/Loading Dynamic Labels And Relationships/", " ", "%20")
// ;
// :param dataFile => "data.txt"
// ;
// :param dataLoadPath => 'file://' + $dataDirectory + $dataFile
// ;
// one :param for use when running in the Neo4j Browser
:param dataLoadPath => 'file://' + replace("/Users/danflavin/Documents/Neo4j Resources/Cypher Examples/Loading Dynamic Labels And Relationships/", " ", "%20") + 'newDataPipeDelimited.txt'
;
// CREATE CONSTRAINTs
// Mitigates all node scans for fast lookups when creating relationships and enforces uniqueness.
// Use an index if uniqueness is not valid
// Dynamically create unique constraints from input data using apoc.cypher.runSchema()
LOAD CSV WITH HEADERS FROM $dataLoadPath AS row FIELDTERMINATOR '|'
WITH DISTINCT row.upperClass AS upperClass
CALL apoc.cypher.runSchema('CREATE CONSTRAINT dynamicCreate_' + upperClass + ' IF NOT EXISTS FOR (node:' + upperClass + ') REQUIRE node.ID IS UNIQUE' ,{}) YIELD value AS v
RETURN v
;
// Can hard code constraints if you know the constraints ahead of time, or want to be explicit
// CREATE CONSTRAINT IF NOT EXISTS FOR (node:Paper) REQUIRE node.ID IS UNIQUE;
// CREATE CONSTRAINT IF NOT EXISTS FOR (node:Algorithm) REQUIRE node.ID IS UNIQUE;
// CREATE CONSTRAINT IF NOT EXISTS FOR (node:Dataset) REQUIRE node.ID IS UNIQUE;
// CREATE CONSTRAINT IF NOT EXISTS FOR (node:Usecase) REQUIRE node.ID IS UNIQUE;
// CREATE class nodes first
// Labels in CREATE statements need to be hard coded. We can use Using apoc.create.node() to create the node
// label from the input data. This makes loading easier and more dynamic
//
// Using CALL {} IN TRANSACTIONS... helps manage memory usage on large loads.
// https://neo4j.com/docs/cypher-manual/current/clauses/call-subquery/
// Use :auto directive when running the code in the Neo4j
// Browser. Remove :auto for executing with cypher-shell
:auto LOAD CSV WITH HEADERS FROM $dataLoadPath AS row FIELDTERMINATOR '|'
// LOAD CSV WITH HEADERS FROM $dataLoadPath AS row FIELDTERMINATOR '|'
CALL {
WITH row
CALL apoc.create.node([row.upperClass],
{name: row.Name, ID: row.ID, class: row.class}
) YIELD node AS node
RETURN 1 AS nodeCount
} IN TRANSACTIONS OF 1000 ROWS
RETURN count(nodeCount) AS `# class nodes created`
;
// Create relationships from referenced ID's in a list.
// e.g. input has_Algorithm input of "ID5, ID6" will first be converted to a
// list using the split() function.
// split(replace(row.has_Algorithm, ' ', ''), ',') AS algoIDs
//
// Each of these IDs in the resulting list will be processed using the UNWIND statement.
// UNWIND algoIDs AS algoID
//
:auto LOAD CSV WITH HEADERS FROM $dataLoadPath AS row FIELDTERMINATOR '|'
// LOAD CSV WITH HEADERS FROM $dataLoadPath AS row FIELDTERMINATOR '|'
WITH row AS row
CALL {
WITH row AS row
// has_Algorithm|has_Dataset|has_EvaluationMetric
WITH row.ID AS rowID,
split(replace(row.has_Algorithm, ' ', ''), ',') AS algoIDs, // makes a list [ ID5, ID6 ]
split(replace(row.has_Dataset, ' ', ''), ',') AS datasetIDs,
split(replace(row.has_EvaluationMetric, ' ', ''), ',') AS evalMetricIDs
// THIS CHANGED!!
// MATCH (paper:Paper) WHERE paper.ID = rowID // get paper that the algo's, etc. are tied to
// Now are creating anchor node to attach whatever HAS_* relationship to
MATCH (node1) WHERE node1.ID = rowID
WITH node1, algoIDs, datasetIDs, evalMetricIDs
UNWIND algoIDs AS algoID // walk through each ID and create relationships
MATCH (node2) WHERE node2.ID = algoID
MERGE (node1)-[:HAS_ALGORITHM]->(node2)
WITH node1, datasetIDs, evalMetricIDs
UNWIND datasetIDs AS datasetID
MATCH (node2) WHERE node2.ID = datasetID
MERGE (node1)-[:HAS_DATASET]->(node2)
WITH node1, evalMetricIDs
UNWIND evalMetricIDs AS evalMetricID
MATCH (node2) WHERE node2.ID = evalMetricID
MERGE (node1)-[:HAS_EVAL_METRIC]->(node2)
} IN TRANSACTIONS OF 1000 ROWS