The CSV shown above is the result of a manual manipulation from the original csv.
I then proceed to enter the following cypher.
LOAD CSV WITH HEADERS FROM 'file:///Data_3.csv' AS row
MERGE (m:Module {ID: row.ID, RefBy1:COALESCE(row.RefBy1,"NA") , RefBy2: COALESCE(row.RefBy2,"NA")})
return m
Followed by
MATCH (child: Module)
MATCH (parent: Module) where parent.ID = child.RefBy1
MERGE (child)-[:RefBy]->(parent)
RETURN child, parent
Followed by
MATCH (child: Module)
MATCH (parent: Module) where parent.ID = child.RefBy2
MERGE (child)-[:RefBy]->(parent)
RETURN child, parent
The original CSV file is
Do you have any suggestion how i can create a single script to create the same model i showed above. I have tried using UNWIND, as shown below but nothing gets created
LOAD CSV WITH HEADERS FROM 'file:///Data_1.csv' AS row
WITH row.ID AS impID, SPLIT(row.RefBy, ",") AS multiRef
UNWIND multiRef as impRef
MATCH (LI:LineItem {LIName: impID})
MATCH (RB:ReferenceBy {RBName:TRIM(impRef)})
MERGE (LI)-[rel:RefBy]->(RB)
return LI, RB
Try this. It creates the nodes first, then processes the file again to create the relationships.
load CSV with headers FROM 'file:///Data_1.csv' AS row
merge(n:Module{ID:row.ID})
UNION ALL
load CSV with headers FROM 'file:///Data_1.csv' AS row
with row
where row.RefBy is not null
match(n:Module{ID:row.ID})
with n, split(row.RefBy, ",") as references
unwind references as reference
match(m:Module{ID:reference})
merge(n)-[:RefBy]->(m)
Your solution works. I do have a follow up question, i.e. why is UNION ALL required? Is there a way where both row and reference are created from a single LOAD?
load CSV with headers FROM 'file:///Data_1.csv' AS row
merge(n:Module{ID:row.ID})
with n, row
call {
with n, row
with n, split(row.RefBy, ",") as references
unwind references as reference
merge(m:Module{ID:reference})
merge(n)-[:RefBy]->(m)
}
To make things even more complicated, how would you utilize the 'Value' column, such that the script will return the path with the largest aggregated value (should be A > C > D, because A = 2, C = 10, D = 3 for a total of 15)
First update the query to consume the value attribute. Store it as an attribute of the node.
load CSV with headers FROM 'file:///Data_1.csv' AS row
merge(n:Module{ID:row.ID, value:toInteger(row.value)})
with n, row
call {
with n, row
with n, split(row.RefBy, ",") as references
unwind references as reference
merge(m:Module{ID:reference})
merge(n)-[:RefBy]->(m)
}
You can then sum up the values along each path using the 'reduce' operation on the collection of nodes along each path, calculating the path's rank metric.
match p=(n:Module)-[:RefBy*]->(m:Module)
with nodes(p) as nodes
return [i in nodes | i.ID] as pathNodes, reduce(s=0, i in nodes | s + i.value) as rank
order by rank desc
limit 1
Note, the query does find intermediate paths, i.e. A->C. You can add a constraint to remove these if they are not needed and you only want full paths. I did not do so, since they would never have the largest rank since your values are all positive. Let me know if you may have negative values and want to eliminate these as a potential result.
You can run the queries separately or concatenate them to achieve the result in one operation.