I'm interested in creating relationships between nodes of the same label based on certain conditions. The data is loaded from a CSV with sample values below:
activity process processID linked_processID
submit budget doc review aa100 bb100
determine funding budget bb100 NULL
conduct survey survey cc100 NULL
assess eligibility review dd100 aa100
I've used the following statement to merge the process nodes:
LOAD CSV...
WITH line
MERGE (p:Process {name: line.process, id:line.processID, link:line.linked_processID})
Now for those process nodes that are linked to another process, I want to create a relationship between them. Cypher doesn't really support the following syntax, but I'm sharing it to approximate my logic:
WHERE p.linked_processID IS NOT NULL
WITH LINE
MERGE (p id:line.processID) - [:linked_to] -> (p id:line.linked_processID)
That is, is there a way in Cypher to iterate through the CSV and conditionally create relationships between p nodes, where the id of the first node equals one value (processID) and the id of the second node equals another (linked_processID)?
Yes, you can do this, and your logic is fairly close.
You may want to make two passes over your CSV. The first pass just MERGEs the :Process nodes, but you should MERGE only on the property that uniquely identifies the node (you should have a unique constraint on :Process(id)), then use ON CREATE SET to set the remaining properties:
LOAD CSV...
MERGE (p:Process {id:line.processID})
ON CREATE SET p.name = line.process
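For reference, here is a sketch of the unique constraint mentioned above. The constraint name process_id is just an illustrative choice, and the exact syntax depends on your Neo4j version, so check the docs for the release you're running:

```cypher
// Neo4j 4.4 and later: named constraint, skipped if it already exists.
// "process_id" is an arbitrary name chosen for this example.
CREATE CONSTRAINT process_id IF NOT EXISTS
FOR (p:Process) REQUIRE p.id IS UNIQUE;

// Older releases (3.x and early 4.x) use this form instead:
// CREATE CONSTRAINT ON (p:Process) ASSERT p.id IS UNIQUE;
```

Create the constraint before running the load. A uniqueness constraint is backed by an index, so it also speeds up the MERGE lookups on id.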
Your second pass matches the :Process nodes with the given processID and linked_processID and creates the relationship between them:
LOAD CSV...
MATCH (p1:Process {id:line.processID}), (p2:Process {id:line.linked_processID})
MERGE (p1)-[:linked_to]->(p2)
There's no need to write out the linked_processID as a property, as that's only needed for relationship creation and the second pass over the CSV will handle that.
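Putting the two passes together might look roughly like the sketch below. The file name processes.csv is a placeholder, and the WHERE filter is my assumption about how the missing links show up in the file (either as empty fields, which LOAD CSV WITH HEADERS reads as null, or as the literal text NULL):

```cypher
// 'file:///processes.csv' is a placeholder; point this at your own file.

// Pass 1: one node per processID.
LOAD CSV WITH HEADERS FROM 'file:///processes.csv' AS line
MERGE (p:Process {id: line.processID})
ON CREATE SET p.name = line.process;

// Pass 2: link only the rows that reference another process.
LOAD CSV WITH HEADERS FROM 'file:///processes.csv' AS line
WITH line
WHERE line.linked_processID IS NOT NULL
  AND line.linked_processID <> 'NULL'   // assumes the file may contain the literal text NULL
MATCH (p1:Process {id: line.processID})
MATCH (p2:Process {id: line.linked_processID})
MERGE (p1)-[:linked_to]->(p2);
```

The filter is optional: a row whose linked_processID matches no node makes the MATCH return nothing, so no relationship is created for it either way. It mainly makes the intent explicit and skips pointless lookups.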
Thank you - I tried the code you recommended on a miniature dataset and it worked! Much appreciated.
Another question came to mind that I didn't think of previously. What if some processes link multiple times to another process? Is there a way to set a property on the [:linked_to] relationship equal to the number of times that a process links to another process?
Thanks.
Yes, you can use ON CREATE SET and ON MATCH SET after a MERGE to set properties conditionally, depending on whether the MERGE created a new relationship or matched an existing one.
So you could have:
...
MERGE (p1)-[r:linked_to]->(p2)
ON CREATE SET r.times = 1
ON MATCH SET r.times = coalesce(r.times, 1) + 1
The coalesce() here covers existing relationships that don't have this property. It gives you a default of 1 in that case for the purpose of the calculation (though you won't need it if this is the first time these relationships are being created).
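To check the result, a quick query like this sketch lists each link with its counter (the aliases source, target, and times are just names I picked):

```cypher
// List every linked_to relationship with its counter, highest first.
MATCH (p1:Process)-[r:linked_to]->(p2:Process)
RETURN p1.id AS source, p2.id AS target, r.times AS times
ORDER BY times DESC;
```

One thing to watch: ON MATCH SET fires for every row that hits an existing relationship, so re-running the same import bumps the counters again. If you need to reload from scratch, delete the linked_to relationships first.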
Thanks for sharing that, I added that to my code and got it to work.
One last question for this thread: How would I factor the r.times into a query so that I can get downstream impacts of that value in the query results? For instance, if Process A is linked two times to Process B, how would I query so that I can get, say, a count of all child nodes in Process A and B (knowing B's child nodes should count twice)?
Thank you again!
You could do something like:
...
WITH child, sum(coalesce(r.times, 1)) as count
...
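As a fuller sketch of that idea, and only under assumptions that aren't in your post: suppose child nodes hang off a process through a HAS_CHILD relationship (a made-up name, substitute whatever your model uses) and Process A has id 'aa100'. Then A's own children count once and each linked process's children count r.times times:

```cypher
// HAS_CHILD and the id 'aa100' are hypothetical; adapt them to your model.
MATCH (a:Process {id: 'aa100'})

// A's own children count once.
OPTIONAL MATCH (a)-[:HAS_CHILD]->(ownChild)
WITH a, count(ownChild) AS ownCount

// Children of linked processes are weighted by the link's times value.
OPTIONAL MATCH (a)-[r:linked_to]->(:Process)-[:HAS_CHILD]->(child)
WITH a, ownCount,
     sum(CASE WHEN child IS NULL THEN 0 ELSE coalesce(r.times, 1) END) AS linkedCount
RETURN a.id AS process, ownCount + linkedCount AS weightedChildCount;
```

The CASE is there because OPTIONAL MATCH yields a single all-null row when nothing matches, and coalesce(r.times, 1) would otherwise count that row as 1.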
Hi,
I am new to Neo4j. I'm trying to load a CSV and create relationships between nodes of the same label.
I'm using the queries mentioned in this post:
LOAD CSV...
MERGE (p:Process {id:line.processID})
ON CREATE SET p.name = line.process // first pass creates the nodes successfully
LOAD CSV...
MATCH (p1:Process {id:line.processID}), (p2:Process {id:line.linked_processID})
MERGE (p1)-[:linked_to]->(p2) // second pass returns "no changes, no records"
Please advise. I used the same dataset from this post for my CSV.
The queries look fine, but make sure you've created an index or unique constraint on :Process(id) to support quick lookups of the nodes in the MATCH and MERGE clauses.
As for why nothing is changing, that's likely due to unexpected whitespace when parsing your CSV.
Try:
LOAD CSV...
RETURN line
LIMIT 5
And check for leading or trailing whitespace, especially in the processID and linked_processID fields.
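Whitespace is hard to spot by eye, so a sketch like this compares raw and trimmed lengths instead. The file name is a placeholder, and I'm assuming your file has a header row with these exact column names:

```cypher
// Placeholder file name; substitute your own.
LOAD CSV WITH HEADERS FROM 'file:///processes.csv' AS line
RETURN line.processID AS rawId,
       size(line.processID) AS rawLength,
       size(trim(line.processID)) AS trimmedLength
LIMIT 5;
```

If the two lengths differ, trimming inside the lookup should fix the second pass:

```cypher
LOAD CSV WITH HEADERS FROM 'file:///processes.csv' AS line
MATCH (p1:Process {id: trim(line.processID)})
MATCH (p2:Process {id: trim(line.linked_processID)})
MERGE (p1)-[:linked_to]->(p2);
```

This only helps if the nodes were created with trimmed ids too, so you may need to apply trim() in the first pass and reload. Also check the header row itself: header names are case-sensitive, and a header with stray whitespace means line.processID comes back as null.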