Multiple Nodes in 1 column as target

Hi,

image
The CSV shown above is the result of a manual manipulation from the original csv.

I then proceed to enter the following cypher.
LOAD CSV WITH HEADERS FROM 'file:///Data_3.csv' AS row
MERGE (m:Module {ID: row.ID, RefBy1:COALESCE(row.RefBy1,"NA") , RefBy2: COALESCE(row.RefBy2,"NA")})
return m

Followed by
MATCH (child: Module)
MATCH (parent: Module) where parent.ID = child.RefBy1
MERGE (child)-[:RefBy]->(parent)
RETURN child, parent

Followed by
MATCH (child: Module)
MATCH (parent: Module) where parent.ID = child.RefBy2
MERGE (child)-[:RefBy]->(parent)
RETURN child, parent

The original CSV file is
image

Do you have any suggestion how i can create a single script to create the same model i showed above. I have tried using UNWIND, as shown below but nothing gets created
LOAD CSV WITH HEADERS FROM 'file:///Data_1.csv' AS row
WITH row.ID AS impID, SPLIT(row.RefBy, ",") AS multiRef
UNWIND multiRef as impRef
MATCH (LI:LineItem {LIName: impID})
MATCH (RB:ReferenceBy {RBName:TRIM(impRef)})
MERGE (LI)-[rel:RefBy]->(RB)
return LI, RB

Any suggestions? Thanks in advance.

Cheers,
LC

The model should look like this:

Try this. It creates the nodes first, then processes the file again to create the relationships.

load CSV with headers FROM 'file:///Data_1.csv' AS row
merge(n:Module{ID:row.ID})
UNION ALL
load CSV with headers FROM 'file:///Data_1.csv' AS row
with row
where row.RefBy is not null
match(n:Module{ID:row.ID})
with n, split(row.RefBy, ",") as references
unwind references as reference
match(m:Module{ID:reference})
merge(n)-[:RefBy]->(m)

Test Data:
Screen Shot 2022-02-08 at 2.16.26 PM

Result:

Hi @glilienfield

Your solution works. I do have a follow up question, i.e. why is UNION ALL required? Is there a way where both row and reference are created from a single LOAD?

Thanks

This seems to work too.

load CSV with headers FROM 'file:///Data_1.csv' AS row
merge(n:Module{ID:row.ID})
with n, row
call {
with n, row
with n, split(row.RefBy, ",") as references
unwind references as reference
merge(m:Module{ID:reference})
merge(n)-[:RefBy]->(m)
}

1 Like

Thanks @glilienfield
I'll try this too

hi @glilienfield

To make things even more complicated, how would you utilize the 'Value' column, such that the script will return the path with the largest aggregated value (should be A > C > D, because A = 2, C = 10, D = 3 for a total of 15)

image

Thanks,
LC

First update the query to consume the value attribute. Store it as an attribute of the node.

load CSV with headers FROM 'file:///Data_1.csv' AS row
merge(n:Module{ID:row.ID, value:toInteger(row.value)})
with n, row
call {
with n, row
with n, split(row.RefBy, ",") as references
unwind references as reference
merge(m:Module{ID:reference})
merge(n)-[:RefBy]->(m)
}

You can then sum up the values along each path using the 'reduce' operation on the collection of nodes along each path, calculating the path's rank metric.

match p=(n:Module)-[:RefBy*]->(m:Module)
with nodes(p) as nodes
return [i in nodes | i.ID] as pathNodes, reduce(s=0, i in nodes | s + i.value) as rank
order by rank desc
limit 1

Screen Shot 2022-03-01 at 9.01.25 AM

Note, the query does find intermediate paths, i.e. A->C. You can add a constraint to remove these if they are not needed and you only want full paths. I did not do so, since they would never have the largest rank since your values are all positive. Let me know if you may have negative values and want to eliminate these as a potential result.

You can run the queries separately or concatenate them to achieve the result in one operation.

1 Like

Thanks for your very fast response @glilienfield , i'll try it out.