Avoiding duplicate Links

trevor.miles · January 4, 2020, 2:37am

I have a bunch of connected activities (nodes) in CSV format, one file containing the activities, and another the connections. I have no problem creating the nodes, but I just cannot get the links created without duplicates.

Activities/Nodes

ResourceName	Min	Mode	Max
Case_Start	0	0.3	9.8
Create_Delivery	0	0	0
Create_Quotation	0	0	0
Create_Sales_Order_Item	0	0	0

Cypher to create the nodes:

LOAD CSV WITH HEADERS FROM "file:///ACTIVITIES_O2C.csv" AS row
CREATE (a:Activity {Name:row.ResourceName, Min: toFloat(row.Min), Mode: toFloat(row.Mode), Max: toFloat(row.Max), Cost: toFloat(row.CostRate)})

Connections/Links

ConnectorName	StartingActivity	EndingActivity	LinkProbability	Min	Mode	Max
Case_Start::Create_Sales_Order_Item	Case_Start	Create_Sales_Order_Item	70.38	0.00	0.00	0.00
Case_Start::Create_Delivery	Case_Start	Create_Delivery	24.77	0.00	0.00	0.00
Case_Start::Create_Quotation	Case_Start	Create_Quotation	4.84	0.00	0.00	0.00

Cypher to create the links:

LOAD CSV WITH HEADERS FROM "file:///CONNECTIONS_O2C.csv" AS row
MATCH (lft { Name: row.StartingActivity })
MATCH (rgt { Name: row.EndingActivity })
MERGE (lft)-[:FEEDS { Likelihood: toFloat(row.LinkProbability), Min: toFloat(row.Min), Mode: toFloat(row.Mode), Max: toFloat(row.Max) }]->(rgt)

The issue is that I get links created between all the nodes and even some circular links.

I know I'm doing something incorrectly. Just need help with the correct Cypher.

mike_r_black · January 4, 2020, 3:33am

Try this for creating the links:

LOAD CSV WITH HEADERS FROM "file:///CONNECTIONS_O2C.csv" AS row
MATCH (lft { Name: row.StartingActivity })
MATCH (rgt { Name: row.EndingActivity })
MERGE (lft)-[r:FEEDS]->(rgt)
SET r.Likelihood = toFloat(row.LinkProbability), 
r.Min = toFloat(row.Min), 
r.Mode = toFloat(row.Mode),
r.Max = toFloat(row.Max)

MERGE matches on the entire pattern it is given, including any properties written inside the braces. I suspect that as your data loads, there are multiple entries in your connection file that map the same start node to the same end node; if their property values differ, your original MERGE finds no exact match and creates another relationship each time. The Cypher I wrote merges only on the FEEDS relationship between the two nodes: if one already exists, it updates that relationship instead of creating a second one.
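To illustrate the difference, here is a minimal sketch using two throwaway nodes. The `Demo` label and the `A`/`B` names are hypothetical and not part of the O2C data; the statements are separated by semicolons, which assumes cypher-shell or a Browser with multi-statement mode enabled.

```cypher
// Hypothetical scratch nodes, not part of the O2C data
CREATE (:Demo {Name: 'A'}), (:Demo {Name: 'B'});

// Properties inside the MERGE pattern are part of the match.
MATCH (a:Demo {Name: 'A'}), (b:Demo {Name: 'B'})
MERGE (a)-[:FEEDS {Likelihood: 10.0}]->(b);

// No FEEDS with Likelihood 20.0 exists yet, so this creates a second one.
MATCH (a:Demo {Name: 'A'}), (b:Demo {Name: 'B'})
MERGE (a)-[:FEEDS {Likelihood: 20.0}]->(b);

// Counts the FEEDS relationships from A to B (two at this point).
MATCH (:Demo {Name: 'A'})-[r:FEEDS]->(:Demo {Name: 'B'})
RETURN count(r) AS feeds_count;

// Clean up the scratch nodes and their relationships.
MATCH (d:Demo) DETACH DELETE d;
```

Merging on `(a)-[r:FEEDS]->(b)` alone and assigning the properties with SET would leave a single relationship whose `Likelihood` holds the last value written.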

As for the circular paths, I would validate the source data again. Given that you're seeing both multiple relationships between nodes and circular paths, I would double-check how the CSVs are being generated.
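One way to check the file itself before loading anything is sketched below. It assumes the same file name and column headers shown in the question; `src`, `dst` and `n` are just illustrative aliases.

```cypher
// Start/end pairs that appear on more than one row of the connections file
LOAD CSV WITH HEADERS FROM "file:///CONNECTIONS_O2C.csv" AS row
WITH row.StartingActivity AS src, row.EndingActivity AS dst, count(*) AS n
WHERE n > 1
RETURN src, dst, n;

// Rows that link an activity to itself (self-loops)
LOAD CSV WITH HEADERS FROM "file:///CONNECTIONS_O2C.csv" AS row
WITH row
WHERE row.StartingActivity = row.EndingActivity
RETURN row.ConnectorName AS connector, row.StartingActivity AS activity;
```

If both queries come back empty, the file has no repeated pairs and no self-loops, so any cycles in the graph would have to come from longer chains of legitimate links.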

trevor.miles · January 4, 2020, 4:19am

Thanks Mike, but that doesn't solve the problem. And, no, I do not have duplicates in the data file. If after loading the data I run the Cypher statement

MATCH p=(:Activity {Name: 'Case_Start'})-[r:FEEDS]->() RETURN p

I get the following result

trevor.miles · January 4, 2020, 4:32am

Actually, turns out I was wrong all this time. The graph created is perfectly correct. Thanks for the challenge Mike.
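For anyone wanting to confirm the same thing on their own load, a sketch of a check against the graph rather than the file, assuming the `Activity` label and `FEEDS` relationship type used above:

```cypher
// Node pairs connected by more than one FEEDS relationship in the same direction
MATCH (a:Activity)-[r:FEEDS]->(b:Activity)
WITH a.Name AS src, b.Name AS dst, count(r) AS n
WHERE n > 1
RETURN src, dst, n
ORDER BY n DESC;
```

An empty result means no pair of activities has duplicate links.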
