Relationship between nodes of same Label

Hi,

I am completely new to graph databases and just getting my feet wet. My understanding so far is that a graph database has no defined schema and therefore cannot have schema-level relationships; if we want a relationship between nodes, we must create it explicitly. A graph database also eliminates the need for primary key/foreign key constraints and for JOINs in queries, and because it stores the relationships of each node directly, it is fast (I assume so).

I have a scenario where I need relationships between different nodes of the same label that share the same property value (e.g. the same IP address). In this case, I assume we need to first create the nodes and then explicitly create relationships between the nodes that have the same IP address. So, if a node n1 (that is being created) has the IP 1.1.1.1 and there are already around 50 nodes with the same IP (1.1.1.1) under the same label, then my understanding is that we need to explicitly create relationships between n1 and all 50 nodes. This would be the way to maintain the relationships whenever a new node is created in an OLTP system (maybe the MERGE command would be the better option, as it skips relationships that already exist and creates new ones only for the new record).

It would be great if anyone can clarify / confirm my above understanding with some sample (if any).

Hi Raja,

welcome to the Community! I hope you enjoy playing around with Neo4j.

Concerning your problem:
I think the question is: "where do you want to go with your relationships?"

Say your new node n1 and the 50 existing nodes all have the same IP. Do you really want to create 51*(51-1)/2 = 1275 relationships so that every node is connected to every other node?

I think what you should do instead is create a node for the IP address and connect all the nodes that have this address to the IP address node.

Let's make an example. Say you create 3 nodes like
CREATE (:NodeLabel {name: "1", ipaddress: "1.1.1.1"}), (:NodeLabel {name: "2", ipaddress: "1.1.1.1"}), (:NodeLabel {name: "3", ipaddress: "1.1.1.2"})

You can then create IP-Address nodes by saying
MATCH (a:NodeLabel)
WITH DISTINCT a.ipaddress as differentIPS
CREATE (:IpAddress {address: differentIPS})

Now let's connect the nodes:
MATCH (a:NodeLabel), (b:IpAddress) WHERE a.ipaddress = b.address
CREATE (a)-[:HAS_IP_ADDRESS]->(b)

You end up with:
image
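For the OLTP case from the original question, where nodes arrive one at a time, the same model could be maintained with MERGE. The following is only a sketch: the node name "4" and its address are made-up sample values, and the labels and property names are the ones from the example above.

```cypher
// Hypothetical new node; "4" and "1.1.1.1" are sample values.
CREATE (n:NodeLabel {name: "4", ipaddress: "1.1.1.1"})
// Reuse the IpAddress node if it already exists, otherwise create it.
MERGE (ip:IpAddress {address: n.ipaddress})
// One relationship per node, instead of one per pair of nodes.
MERGE (n)-[:HAS_IP_ADDRESS]->(ip)
```

Nodes that share an address can then be found by going through the IpAddress node:

```cypher
MATCH (a:NodeLabel)-[:HAS_IP_ADDRESS]->(ip:IpAddress)<-[:HAS_IP_ADDRESS]-(b:NodeLabel)
WHERE a.name < b.name   // report each pair only once
RETURN ip.address, a.name, b.name
```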

Does that help you? Let me know. I am happy to provide more input.
Regards,
Elena

Hi Elena,

Many thanks for your detailed response. It definitely helps. Your suggestion makes a lot of sense.

Really appreciate your support.

Regards,

Raja PV

Hi Elena and Neo4j Community,

I'm new to this forum, so I hope I'm following appropriate etiquette here...just wanted to ask a follow-up question on the same topic.

Unlike the IP Address scenario, I am working on a graph that will display software applications in a large ecosystem (each app will be a node of the same label with a different property 'name') and I wish to show integrations between apps where they exist. I'd prefer not to create a separate node for each integration itself, but would rather simply show an 'IS INTEGRATED WITH' relationship between the given applications. I know I could probably add an 'integration ID' in each line of the CSV along with its primary 'app id' to show how one app links to another; however, in some cases, a given application will be integrated to four others...other times, only a single integration would exist. Is it possible to account for this in Neo4j? If so, could you please suggest a strategy to accomplish it?

Thank you for the help.

Jim

Hi Jim,

welcome to the community!

I agree with you. If you really want to show relationships between the nodes, then it is better not to have a separate node for that.

I had a similar problem that something could be "no link", "one link" or "several links" and had a hard time modeling it. Here is how I solved it:

In my csv file I added a column, say 'integration_ID' (the column headers should not have spaces). If it had "no link", I inserted an 'o' (or any other sign signaling that there is nothing). If it had one or more links, I would simply separate the different ids by a semicolon. This allows you to put in as many links as you want.
When reading in the property, you can read the column in as an array using the split function:

LOAD CSV WITH HEADERS FROM 'file:///example.csv' AS row
CREATE (n:Node {integration_ID: CASE WHEN NOT row.integration_ID = 'o' THEN split(row.integration_ID, ';') END})

When making the connections you can then simply unwind (the function is actually called "UNWIND"):

MATCH (n:Node) WHERE exists(n.integration_ID)
UNWIND n.integration_ID AS ID
MATCH (m:Node {id: ID})
CREATE (n)-[:IS_INTEGRATED_WITH]->(m)

or something similar. I hope you get the idea.
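In case split and UNWIND are new, here is a small sketch that can be run on its own, without any data loaded. The literal 'A03;A04' stands in for one cell of the integration_ID column and is just a sample value.

```cypher
// split() turns the semicolon-separated string into the list ['A03', 'A04'].
WITH split('A03;A04', ';') AS ids
// UNWIND turns that list into one row per element.
UNWIND ids AS id
RETURN id
```

This returns two rows, one for 'A03' and one for 'A04', which is why a MATCH and CREATE placed after an UNWIND run once per linked ID.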

Regards,
Elena

Hi Elena,

Thanks so much for your response. Being fairly new to Neo4j, I was not yet familiar with 'SPLIT' and 'UNWIND'. So, I get the concept, but I'm struggling with one bit of detail. In the 'connections' segment, you refer to a Node n and a Node m. In my case, I only have one node 'a' (a:App). I'm just wondering if you can explain the two nodes a bit more. Is one just sort of a temporary placeholder? When I run it on my end, I get a 'no changes, no records' response. I'm wondering if the single node type essentially needs to create a relationship with itself. However, I assume I'm still missing a bit of understanding here.

Thanks again for your help.

Jim

Hi Jim,

I think it would be much easier for me to help if you could provide a small example csv. However in general, what do you want to connect to? As I understood it, you want to connect a certain node n which has several entries in its "connections" attribute to OTHER nodes m1,m2,... that the node n is integrated with.

As I said, if you provide a small example, we will have it fixed very quickly ;-).

Regards,
Elena

image

Hi Elena,

Yes, of course, I have attached an image sample of a file I am using: AppDetail.csv. In this, I am listing individual applications with IDs such as A01, A02, etc. App names are in the 'AppName' column. (ignore the PID, FLID, and SOID columns for now as they are not relevant to this topic). Per your suggestion, I added an 'Integration_ID' column and have populated it with application IDs from other rows. For example, app A01 is integrated to app A02. App A02 is integrated with apps A03 and A04. So, to summarize, I only have one node type that I am using (a:App). So, if I've gotten this format right, it seems that I need to essentially have an app node create a relationship with other app nodes. This is why I was confused when you referenced two different node designations above (n and m). I hope this makes sense and thanks for your continued patience and support.

Jim

Hi Jim,

the csv looks good! It is exactly what I expected.

The "m" and "n" in my code do not mean that you need to have two different node labels. It is fine to have only one. But if you want to connect, e.g., the first node A01 to the second node A02, you are connecting two nodes and need a variable for each, hence "n" and "m" in my example. So with your csv the following should work to load in and connect the App nodes:

LOAD CSV WITH HEADERS FROM 'file:///AppDetail.csv' as row
CREATE (n:App {AID: row.AID, AppName: row.AppName, Integration_ID: CASE WHEN NOT row.Integration_ID= 'o' THEN split(row.Integration_ID,';') END})

MATCH (n:App) WHERE exists(n.Integration_ID)
UNWIND n.Integration_ID AS ID
MATCH (m:App {AID: ID})
CREATE (n)-[:IS_INTEGRATED_WITH]->(m)

To explain more: the first line here finds the nodes (n) which have the attribute "Integration_ID", hence A01 and A02. Then for every node, the list in Integration_ID is unwound. But then the node (m) that has that particular ID needs to be found in order to connect to. So, for the node A01 (n) the node A02 (m) is found and the relationship is created. And for the node A02 (n), first the node A03 (m) is found and connected to and then the node A04 (m) is found and connected to. The unwind gives you more or less a for loop here.

After the two commands you obtain:
image
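To check the result without the graph view, a query along these lines should list every integration as a pair of IDs. It is a sketch that assumes the AID property and the IS_INTEGRATED_WITH type from the two commands above.

```cypher
// List each integration as source ID and target ID.
MATCH (n:App)-[:IS_INTEGRATED_WITH]->(m:App)
RETURN n.AID AS source, m.AID AS target
ORDER BY source, target
```

For the rows described earlier (A01 integrated with A02, A02 integrated with A03 and A04), that would be three rows.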

Cheers,
Elena

Hi Elena,

Thanks for that detailed explanation! This makes perfect sense to me now and the code works great. You've taught me a lot in a short period of time, thanks so much for your support.

Cheers.

Jim

Hi again Elena,

I'm wondering if I can bother you one more time. I am quite close to the solution I am striving for, but am stuck on (hopefully) one last thing. Since our last interaction, I have leveraged the APOC plug-in to dynamically pass in 'Data_Passed' in a given integration. Updated source file is shown below:
Updated AppDetail

So, with the addition of the Data_Passed column, I have created a second array similar to what you suggested previously. The new upload script I am using is below:

CREATE (n:App {AID: csvLine.AID, AppName: csvLine.AppName, pid: csvLine.PID,
flid: csvLine.FLID, soid: csvLine.SOID,
Integration_ID: CASE WHEN NOT csvLine.Integration_ID = 'o' THEN split(csvLine.Integration_ID, ';') END,
Data_Passed: CASE WHEN NOT csvLine.Data_Passed = 'o' THEN split(csvLine.Data_Passed, ';') END})

The data loads fine. However, when I use the code shown below to unwind both of the arrays...

MATCH (n:App) WHERE exists(n.Integration_ID)
UNWIND n.Integration_ID AS ID
UNWIND n.Data_Passed AS DP
MATCH (m:App {AID: ID}) WHERE DP<>'o'
CALL apoc.create.relationship(n, DP, {}, m) YIELD rel
RETURN n,m

...I get a graph that looks like the following:

I think my use of UNWIND causes a Cartesian product and gives me unwanted results. The integration on the right is fine. However, on the left, I just want ChemReg to D360 to pass 'Data1' and ChemReg to AssayReg to pass 'Data2'. I tried using FOREACH, but it didn't like the Call to APOC inside that block. Is there a better way to use UNWIND or another method?

Thanks again for the help. I'm almost there. :slight_smile:

Jim

Hi Jim,

yes, your two unwind statements work like a nested for loop where the statement within is then executed four times in total.

What you really want is to loop ONCE over BOTH the Integration_ID and the Data_Passed array for every node that has these attributes.
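The nested-loop effect can be reproduced without any nodes. In this sketch the two lists are sample values standing in for the Integration_ID and Data_Passed arrays of a single node:

```cypher
WITH ['A03', 'A04'] AS ids, ['Data1', 'Data2'] AS data
UNWIND ids AS id
UNWIND data AS dp
// Every id is paired with every dp: 2 x 2 = 4 rows.
RETURN id, dp
```

Only two of the four rows, ('A03', 'Data1') and ('A04', 'Data2'), are pairs that are actually wanted.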

If you know for sure that both arrays have the same length (which I assume, or which you should make sure of), then you can simply loop over the indices of the entries for every node (e.g. A01 has only one entry and A02 has two entries). The following will help you:

MATCH (n:App) WHERE exists(n.Integration_ID)
UNWIND range(0, size(n.Integration_ID) - 1) AS element
MATCH (m:App {AID: n.Integration_ID[element]})
CREATE (n)-[:IS_INTEGRATED_WITH {dataPassed:n.Data_Passed[element]}]->(m)

This yields:
image

Why are you using APOC for creating relationships? I saw that it creates variable relationship types. Do you really want that? I would prefer a single type with a property that varies. On my first try, APOC did not work with my code, though you could probably make it work by putting a "WITH" clause in.

Cheers,
Elena

Hi Elena,

Thank you for that. Interestingly, I was just exploring the range() function as a possibility. This is a really elegant solution.

I leveraged apoc because I'd read in a couple of posts that it wasn't possible to create dynamic (parameterized) relationships with basic (non-apoc) Neo4j Cypher. Glad to know that is not correct!

Thanks again for the help, much appreciated.

Jim

Well, the statement that parameterized relationship types can only be produced with APOC is probably correct. The question is whether you would want to do that or not. For me, it just doesn't feel right to have a relationship type for every distinct entry of that second array...
You could open a new topic in the forum to discuss this. I would also be interested in other people's opinions.
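For reference, if variable relationship types are wanted after all, the index-based loop can presumably be combined with the APOC call from Jim's script. This is a sketch that assumes the APOC plug-in is installed and that Integration_ID and Data_Passed always have the same length; if Data_Passed is shorter, the relationship type would be null and the call would be expected to fail.

```cypher
MATCH (n:App)
WHERE n.Integration_ID IS NOT NULL AND n.Data_Passed IS NOT NULL
// Walk both lists with one shared index instead of two nested UNWINDs.
UNWIND range(0, size(n.Integration_ID) - 1) AS i
MATCH (m:App {AID: n.Integration_ID[i]})
// The relationship type is taken from the matching Data_Passed entry.
CALL apoc.create.relationship(n, n.Data_Passed[i], {}, m) YIELD rel
RETURN n, rel, m
```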

Thanks for that added context, Elena. I think in my relative newbie-ness with Neo4j/Cypher, I didn't understand the difference between a relationship type and one that just varies by attribute. I'll look to open a new topic shortly to get folks' thoughts on this.

Thanks again for all the help.

Jim

P.S. The APOC code did/does work for me on my machine; however, I'll modify my code with your solution as it's better. :)

Hi Elena,
I am building a graph to show the dependency relationships between different artifacts in the Maven Central repository.
First, I loaded a CSV like the one below and used the label Artifact:


The first Cypher statement is:

load csv with headers from 'file:///test_header.csv' as art
create (:Artifact {gav: art.gav, groupId: art.groupId, artifactId: art.artifactId,
                   version: substring(art.version,2), packaging: art.packaging})

Then I want to load a second CSV to create relationships between nodes with the same label.
depend_test

the cypher is like this:

load csv with headers from 'file:///test_depend.csv' as art
MATCH (a:Artifact {gav: art.from}) , (b:Artifact {gav: art.to}) create (a)-[r:DEPEND_ON]->(b)

But it returned a warning, and no relationships were created.

This query builds a cartesian product between disconnected patterns.
If a part of a query contains multiple disconnected patterns, this will build a cartesian product between all those parts. This may produce a large amount of data and slow down query processing. While occasionally intended, it may often be possible to reformulate the query that avoids the use of this cross product, perhaps by adding a relationship between the different parts or by using OPTIONAL MATCH (identifier is: (b))

I also tried this one, again with no result.

LOAD CSV WITH HEADERS FROM "file:///test_depend.csv" AS row
MATCH (f:Artifact {gav: row.from})
MATCH (t:Artifact {gav: row.to})
MERGE (f)-[:DEPEND_ON]->(t);

So, my question is: how can I load the CSV correctly and build the relationships between these nodes with the same label?

Hi jayxu688,

your queries look good. The only problem is - I guess - the size of your dataset. That is also why I couldn't reproduce your error / warning, since your queries work on a small dataset.

Maybe this might help you: https://stackoverflow.com/questions/33352673/why-does-neo4j-warn-this-query-builds-a-cartesian-product-between-disconnected

If it does not help you, please consider creating a completely new forum entry. A new entry will get more attention from other people who might have had similar issues before.

Regards,
Elena

The most likely reason is that the data differs in some way between the two CSVs. The simplest error would be a space somewhere in the names, either in the first CSV or in the second. Check if that is the case.

If either of the MATCH statements doesn't return a node it will not create a relationship.
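One way to find such rows might be a query like the following. It is a sketch that reuses the file name, label and column names from the question; the brackets around the values are only there to make leading or trailing spaces visible.

```cypher
LOAD CSV WITH HEADERS FROM 'file:///test_depend.csv' AS row
// OPTIONAL MATCH keeps the row even when no node is found.
OPTIONAL MATCH (f:Artifact {gav: row.from})
OPTIONAL MATCH (t:Artifact {gav: row.to})
WITH row, f, t
WHERE f IS NULL OR t IS NULL
// Show only rows where at least one side has no matching node.
RETURN '[' + row.from + ']' AS fromGav, '[' + row.to + ']' AS toGav,
       f IS NULL AS fromMissing, t IS NULL AS toMissing
LIMIT 20
```

If this returns rows, those are the lines of the CSV for which no relationship can be created.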

Yes, you are right. I found that the white space was the problem.
After I deleted the space, created an index on the "gav" property, and added "USING PERIODIC COMMIT" before the Cypher statement, the relationships were created quite fast.
Thanks for your reply.

Regards,
Jay

Hi Elena,
Thanks for your reply.
It turns out there was white space in the cells of the "gav" column of the CSV file.
After I deleted the space, created an index on the "gav" property, and added "USING PERIODIC COMMIT" before the Cypher statement, the relationships were created quite fast.
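For anyone landing here with the same symptom, those three steps might look roughly like this. It is a sketch, not the exact script that was run: the index name artifact_gav and the batch size of 1000 are made up, the syntax assumes Neo4j 4.x (3.x uses CREATE INDEX ON :Artifact(gav), and 5.x replaces USING PERIODIC COMMIT with CALL { ... } IN TRANSACTIONS), and trim() is used instead of editing the CSV by hand.

```cypher
// Run once, as its own statement: lets the MATCH on gav use an index lookup.
CREATE INDEX artifact_gav FOR (a:Artifact) ON (a.gav);

// Run separately; Neo4j Browser 4.x may require the :auto prefix here.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///test_depend.csv' AS row
// trim() strips leading and trailing spaces from the CSV values.
// This only helps if the gav values stored on the nodes are clean as well.
MATCH (f:Artifact {gav: trim(row.from)})
MATCH (t:Artifact {gav: trim(row.to)})
MERGE (f)-[:DEPEND_ON]->(t);
```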

Regards,
Jay