How to find the similarity between common nodes of multiple type nodes?

vishnuvardhans1698 · February 4, 2020, 9:21am

So I'm using the "PIMA INDIANS DIABETES DATASET".
And I made these node types:

1. Person {age, id}
2. BMI {bmi level}
3. OUTCOME {outcome}
4. Blood Pressure {blood pressure level}, and so on.

I want to find the similarity between all the persons whose age is between 21-25 and who have been diagnosed with diabetes.
I want my answer to look something like this:
BMI similarity: 0.82
BP similarity: 0.67
I have looked at all the graph algorithms but I didn't find anything relevant.
Can we achieve this using Neo4j?

P.S. All the examples I have seen compute similarity over a single type of relationship.

nsmith_piano · February 4, 2020, 7:05pm

Welcome to the community. Can you describe more of what you mean by similarity or give a link to an example you have considered?

vishnuvardhans1698 · February 5, 2020, 5:06am

OK, so I have a dataset which contains the following columns:

1. ID
2. AGE
3. BMI
4. BP
5. INSULIN
6. OUTCOME

I took ID as a node and added age as its property.
Then I made separate nodes for all the other columns, like BMI, BP, INSULIN, etc.
I created relationships so that each "ID" node is connected to its "BMI", "BP", "INSULIN", etc. values.

Now my query is this:
"Find the mean of BMI of all the persons whose age is <=25?"

Is creating a dedicated node set for each column the most efficient way?

alicia.frame · February 6, 2020, 3:38am

Our node similarity algorithm calculates the similarity of nodes based on their neighboring nodes (think of a (:Person)-[:LIKES]->(:Instrument) graph -- we measure how similar Person nodes are based on the number of the same Instruments they like vs. the number of different ones).

If you wanted to use that algorithm, you would need to turn the things you want to measure similarity on (e.g. outcomes) into nodes. If you have a schema where Person is a node label with age and id attributes, and Outcome is a node label with a description attribute, you could use nodeSimilarity in this way:

CALL algo.nodeSimilarity.stream(
     'MATCH (p:Person) WHERE p.age < 25 RETURN id(p) as id', 
     'MATCH (p:Person)-[:HAS_OUTCOME]->(o:Outcome) RETURN id(p) as source, id(o) as target',
{graph:'cypher'})
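
For intuition about what such a call scores: Node Similarity compares two nodes by the Jaccard index of their neighbor sets, i.e. shared neighbors divided by all distinct neighbors. Below is a minimal Python sketch of that calculation outside Neo4j. The people, instruments, and helper names are made up for illustration, and this is not the library's implementation.

```python
from itertools import combinations

# Hypothetical (:Person)-[:LIKES]->(:Instrument) edges, for illustration only.
likes = [
    ("Alice", "Guitar"), ("Alice", "Piano"),
    ("Bob", "Guitar"), ("Bob", "Keyboard"),
    ("Carol", "Guitar"), ("Carol", "Piano"),
]


def neighbor_sets(edges):
    """Group edge targets by their source node."""
    neighbors = {}
    for source, target in edges:
        neighbors.setdefault(source, set()).add(target)
    return neighbors


def jaccard(a, b):
    """Shared neighbors divided by all distinct neighbors."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


neighbors = neighbor_sets(likes)

# Score every unordered pair of people.
scores = {
    (p1, p2): jaccard(neighbors[p1], neighbors[p2])
    for p1, p2 in combinations(sorted(neighbors), 2)
}

for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

In this toy data Alice and Carol score 1.0 because their instrument sets are identical, while Alice and Bob score 1/3: one shared instrument out of three distinct ones.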

In your reply to @nsmith_piano, you're asking about a mean value. Check out our documentation on aggregating functions here: Aggregating functions - Cypher Manual.

vishnuvardhans1698 · February 6, 2020, 9:36am

Thanks for the info. As you mentioned, node similarity calculates similarity for only one type of relationship, "LIKES" in your example. For instance, "A likes guitar and piano" and "B likes keyboard and guitar", so they are 50% similar. What I want is: "A likes guitar and lives in London" and "B likes guitar and lives in Mumbai", so A and B are 50% similar, as they like the same instrument but live in different places. I know we can do this by measuring similarity on the "LIKES" relation once, and then on "LIVES" once. But what if I want to compare using two relations at the same time? Btw, sorry if I framed the question wrong. I was just confused.

alicia.frame · February 6, 2020, 3:21pm

You can combine multiple node and relationship types for the purpose of running an algorithm -- either by pre-loading a named graph (see section 2.3.4 loading multiple relationship types and node labels), or by using a cypher projection that references the nodes and relationships you want to consider.

For the musical instrument example, if we add in a Place node and a LIVES_IN relationship, you could use a Cypher projection like this:

CALL algo.nodeSimilarity.stream(
     'MATCH(n) WHERE n:Person or n:Instrument or n:Place RETURN id(n) as id', 
     'MATCH (s:Person)-[]->(t) RETURN id(s) as source, id(t) as target',
{graph:'cypher', direction:'outgoing'})
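
To see what projecting several relationship types together does to the score, here is a small Python sketch under the assumption that the score is the Jaccard index over each node's combined set of targets. The edge data and the `neighbors_of` / `similarity` helpers are hypothetical and only mirror the instrument-and-place example.

```python
# Hypothetical typed edges: (source, relationship type, target).
edges = [
    ("A", "LIKES", "Guitar"), ("A", "LIVES_IN", "London"),
    ("B", "LIKES", "Guitar"), ("B", "LIVES_IN", "Mumbai"),
]


def neighbors_of(node, rel_types):
    """Targets reachable from `node` over any of the given relationship types."""
    return {t for s, r, t in edges if s == node and r in rel_types}


def jaccard(a, b):
    """Shared neighbors divided by all distinct neighbors."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(n1, n2, rel_types):
    """Jaccard score of two nodes, restricted to the given relationship types."""
    return jaccard(neighbors_of(n1, rel_types), neighbors_of(n2, rel_types))


likes_only = similarity("A", "B", {"LIKES"})            # same instrument
lives_only = similarity("A", "B", {"LIVES_IN"})         # different cities
combined = similarity("A", "B", {"LIKES", "LIVES_IN"})  # both types merged

print(likes_only, lives_only, round(combined, 2))
```

Note that the combined score here is 1/3 (one shared target out of three distinct ones), not the 0.5 you would get by averaging the two per-relationship scores. If an average is what you want, running the algorithm once per relationship type and averaging afterwards may be the closer fit.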

vishnuvardhans1698 · February 7, 2020, 6:39am

Solved my issue. Thanks a lot!

pooja.bumb · February 10, 2020, 4:16pm

Hey Alicia, great solution!
How can we return the node label instead of node id?

alicia.frame · February 10, 2020, 5:10pm

You can use the asNode function -- in the YIELD statement, return the nodeId, and then you can use algo.asNode to access labels and attributes. For example:

CALL algo.nodeSimilarity.stream('Person | Instrument', 'LIKES', {
  direction: 'OUTGOING'
})
YIELD node1, node2, similarity
RETURN algo.asNode(node1).name AS Person1, algo.asNode(node2).name AS Person2, similarity
ORDER BY similarity DESCENDING, Person1, Person2

khshah_m18 · April 14, 2020, 4:42pm

I have a similar question.
How can we apply node similarity based on an edge property value?
I have a graph in which stock names are nodes.
Dates are nodes too.
And a price relationship links each stock node with the date nodes.
So how can I apply node similarity to different stocks?

mangesh.karangutkar · May 29, 2020, 11:35am

Hi Alicia

I think nodeSimilarity is now deprecated. I tried to run this Cypher projection with Jaccard similarity, but I get an error: "Procedure call does not provide the required number of arguments: got 3 expected 2."

alicia.frame · June 10, 2020, 4:15pm

@mangesh.karangutkar Node Similarity has not been deprecated: Node Similarity - Neo4j Graph Data Science

The error message you received from jaccard indicates that you've provided more inputs than it expects. The jaccard function expects a pair of inputs (the two nodes being compared); perhaps that's the issue. I would look to the docs for more information on the syntax: Similarity functions - Neo4j Graph Data Science

santhoshiindrakanti · February 3, 2021, 2:46pm

Could you please help me? How can I compute node similarity between nodes that have no relationships?

Thank you .

sam_gijare · February 3, 2021, 3:38pm

Nodes can then just be considered as classes, the way we treat them in OOP. You can write your own algorithms either for finding or comparing similarities between two classes/nodes.

But then that's your design, and you need to tailor the algorithm to your needs. If you need more help, you need to be more specific about what exactly you want.

Thanks
Sameer

luiseduardo · April 6, 2021, 6:52pm

Hi Alicia! I have done this in a similar structure, but the algorithm takes too long. I'm using four labels of nodes (Client and data from them: range of income, age, business line, etc.) and three types of relationships in a named graph (using gds), what could be happening?

alicia_frame1 · April 6, 2021, 7:35pm

If you don't have any relationships, you'd need to use node properties to calculate similarity on - check out KNN or Cosine Similarity. Those can create relationships between nodes that have similar properties, but no relationships.
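
As a rough illustration of property-based similarity (not the GDS implementation): cosine similarity compares property vectors directly, and a k-nearest-neighbors pass keeps the k most similar nodes for each node. Here is a minimal Python sketch, with made-up property values and a hypothetical `top_k` helper.

```python
import math

# Hypothetical numeric properties per node: [bmi, blood_pressure, insulin].
properties = {
    "p1": [30.0, 70.0, 100.0],
    "p2": [31.0, 72.0, 105.0],
    "p3": [22.0, 60.0, 400.0],
}


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def top_k(node, k):
    """The k nodes most similar to `node`, best first, as (name, score) pairs."""
    scored = [
        (other, cosine(properties[node], vector))
        for other, vector in properties.items()
        if other != node
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


for name in properties:
    print(name, top_k(name, 1))
```

Each (node, neighbor, score) triple from such a pass is what would become a similarity relationship in the graph. Because cosine similarity is sensitive to the scale of each property, you may want to normalize the columns first.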

alicia_frame1 · April 6, 2021, 7:36pm

Can you share your code? And how many nodes / relationships are in your graph?

luiseduardo · April 6, 2021, 7:43pm

Of course! Thank you!

Here I create the named graph and execute the algorithm:

// Client job graph
CALL gds.graph.create("client-job-graph", ["Client", "BusinessLine", "EconomicActivity","MonthlyIncome"],
["HAS_BUSINESS_LINE", "HAS_ACTIVITY", "HAS_MONTHLY_INCOME"]) YIELD nodeCount, relationshipCount;

CALL gds.nodeSimilarity.write("client-job-graph", {
    writeRelationshipType: "SIMILAR_J",
    writeProperty: "score_j",
    degreeCutoff: 3,
    topK: 5
})

In the named graph there are 528,739 nodes and 1,586,139 relationships. Almost all of the nodes are Client nodes, since the other ones are sort of categories.

alicia_frame1 · April 9, 2021, 7:08pm

Does it execute too slowly? Or not at all? Usually the first thing I recommend is adding a degree cutoff and setting topK, but you've done that already.

You can take a peek at the debug logs to check on progress - as NodeSimilarity runs, it will print the percentage of each stage that's complete.

One thing you can try is to first run WCC on your client-job-graph and then run node similarity on individual components - this breaks the problem up and makes it much faster.
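
The reasoning behind the component split can be sketched in plain Python: two nodes in different weakly connected components cannot share a neighbor, so their similarity is always 0 and those pairs never need to be compared. The edge data and helper names below are hypothetical, and the union-find is only a stand-in for WCC, not the GDS implementation.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical client -> category edges forming two disconnected groups.
edges = [
    ("c1", "retail"), ("c2", "retail"),
    ("c2", "income_low"), ("c3", "income_low"),
    ("c4", "mining"), ("c5", "mining"),
]

parent = {}


def find(x):
    """Return the representative of x's component (with path halving)."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(a, b):
    """Merge the components containing a and b."""
    parent[find(a)] = find(b)


# Ignoring direction, every edge joins its two endpoints into one component.
for source, target in edges:
    union(source, target)

clients = sorted({source for source, _ in edges})
components = defaultdict(list)
for client in clients:
    components[find(client)].append(client)

# Candidate pairs without and with the component split.
all_pairs = len(list(combinations(clients, 2)))
within_pairs = sum(
    len(list(combinations(members, 2))) for members in components.values()
)
print(all_pairs, within_pairs)
```

In this toy graph the split cuts the candidate pairs from 10 to 4. How much it helps in practice depends on the data: if a few category nodes connect most clients into one large component, the saving will be small.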

luiseduardo · April 12, 2021, 8:31pm

It executes slowly; it does finish, but only after an hour or an hour and a half. I will try your suggestions and comment on the results. Thanks a lot, Alicia!
