How to find the similarity between common nodes of multiple type nodes?

vishnuvardhans1
Node Clone

I'm using the Pima Indians Diabetes dataset, and I created these node types:

  1. Person {age, id}
  2. BMI {bmi level}
  3. OUTCOME {outcome}
  4. Blood Pressure {blood pressure level}, and so on.

I want to find the similarity between all the persons whose age is between 21 and 25 and who have been diagnosed with diabetes.
I would like the answer to look something like this:
BMI similarity: 0.82
BP similarity: 0.67
I have looked through all the graph algorithms but didn't find anything relevant.
Can we achieve this using Neo4j?

P.S. All the examples I have seen compute similarity over a single relationship type.

nsmith_piano
Graph Buddy

Welcome to the community. Can you describe more of what you mean by similarity or give a link to an example you have considered?

vishnuvardhans1
Node Clone

OK, so I have a dataset which contains the following columns:

1. ID
2. AGE
3. BMI
4. BP
5. INSULIN
6. OUTCOME

I took ID as a node and added age as its property.
Then I made separate nodes for each of the other columns (BMI, BP, INSULIN, etc.).
I created relationships so that each ID node is connected to the nodes holding its BMI, BP, INSULIN, etc. values.

Now my query is this:
"Find the mean BMI of all the persons whose age is <= 25."

Is creating a dedicated set of nodes for each column the most efficient way?

Our node similarity algorithm calculates the similarity of nodes based on their neighboring nodes. Think of a (:Person)-[:LIKES]->(:Instrument) graph: we measure how similar Person nodes are based on the number of Instruments they both like versus the number they don't share.
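
As a rough illustration of the idea above: by default, Node Similarity scores a pair of nodes with the Jaccard similarity of their neighbor sets, i.e. the number of shared neighbors divided by the number of distinct neighbors overall. The sketch below reproduces that arithmetic in plain Python. The people and instruments are made-up sample data, not part of the dataset discussed in this thread.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    # Convention chosen here: two empty neighbor sets score 0.0.
    return len(a & b) / len(union) if union else 0.0

# Hypothetical (:Person)-[:LIKES]->(:Instrument) neighborhoods.
likes = {
    "alice": {"guitar", "piano", "drums"},
    "bob": {"guitar", "piano", "violin"},
}

# Two shared instruments out of four distinct ones.
score = jaccard(likes["alice"], likes["bob"])
print(score)  # 0.5
```

Only the neighbor sets matter here: properties on the Person nodes play no role in the score.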

If you wanted to use that algorithm, you would need to turn the things you want to measure similarity on (e.g. outcomes) into nodes. If you have a schema where Person is a node label with age and id attributes, and Outcome is a node label with a description attribute, you could use nodeSimilarity in this way:

CALL algo.nodeSimilarity.stream(
     'MATCH (p:Person) WHERE p.age < 25 RETURN id(p) as id',
     'MATCH (p:Person)-[:HAS_OUTCOME]->(o:Outcome) RETURN id(p) as source, id(o) as target',
{graph:'cypher'})

In your reply to @nsmith_piano, you're asking about a mean value. Check out our documentation on aggregating functions here: https://neo4j.com/docs/cypher-manual/current/functions/aggregating/
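
For readers who want to see the arithmetic behind the mean-BMI question, here is a minimal plain-Python sketch of "filter by age, then average", which is the same logic a filter combined with Cypher's avg() aggregating function would express. The rows are invented sample values, not taken from the actual dataset.

```python
from statistics import mean

# Hypothetical rows: (person id, age, BMI). Values are invented for illustration.
rows = [
    (1, 22, 30.0),
    (2, 25, 26.0),
    (3, 31, 35.5),
    (4, 24, 28.0),
]

# Keep only persons aged 25 or younger, then average their BMI.
young_bmis = [bmi for _, age, bmi in rows if age <= 25]
mean_bmi = mean(young_bmis)
print(mean_bmi)  # 28.0
```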

Thanks for the info. As you mentioned, node similarity calculates similarity for only one type of relationship, LIKES in your example. For instance, "A likes guitar and piano" and "B likes keyboard and guitar", so they are 50% similar. What I want is: "A likes guitar and lives in London" and "B likes guitar and lives in Mumbai", so A and B are 50% similar, as they like the same instrument but live in different places. I know we can do this by measuring similarity on the LIKES relationship once and then on LIVES once. But what if I want to compare using two relationships at the same time? By the way, sorry if I framed the question wrong. I was just confused.

You can combine multiple node and relationship types for the purpose of running an algorithm -- either by pre-loading a named graph (see section 2.3.4, loading multiple relationship types and node labels), or by using a Cypher projection that references the nodes and relationships you want to consider.

For the musical instrument example, if we add in a Place node and a LIVES_IN relationship, you could use a Cypher projection like this:

CALL algo.nodeSimilarity.stream(
     'MATCH(n) WHERE n:Person or n:Instrument or n:Place RETURN id(n) as id', 
     'MATCH (s:Person)-[]->(t) RETURN id(s) as source, id(t) as target',
{graph:'cypher', direction:'outgoing'})
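
To make the effect of such a combined projection concrete, here is a small plain-Python sketch. It assumes two people who like the same instrument but live in different cities, and that the score is Jaccard over neighbor sets. It contrasts one score over the merged neighbors of every type with a separate score per relationship type, which is closer to the "BMI similarity / BP similarity" output asked for at the top of the thread. The data and function names are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical neighbors, keyed by relationship type.
neighbors = {
    "A": {"LIKES": {"guitar"}, "LIVES_IN": {"London"}},
    "B": {"LIKES": {"guitar"}, "LIVES_IN": {"Mumbai"}},
}

def combined_score(x: str, y: str) -> float:
    """One Jaccard score over the union of all neighbors, regardless of type."""
    all_x = set().union(*neighbors[x].values())
    all_y = set().union(*neighbors[y].values())
    return jaccard(all_x, all_y)

def per_type_scores(x: str, y: str) -> dict:
    """A separate Jaccard score for each relationship type."""
    return {
        rel: jaccard(neighbors[x][rel], neighbors[y][rel])
        for rel in neighbors[x]
    }

combined = combined_score("A", "B")  # 1 shared neighbor out of 3 distinct ones
by_type = per_type_scores("A", "B")  # LIKES: 1.0, LIVES_IN: 0.0
averaged = sum(by_type.values()) / len(by_type)
print(round(combined, 3), by_type, averaged)
```

The merged score (1/3) and the average of the per-type scores (0.5) differ, so it is worth deciding up front which of the two notions of "similar" is wanted.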

Solved my issue. Thanks a lot!

Hey Alicia, great solution!
How can we return the node label instead of node id?

Hi Alicia! I have done this with a similar structure, but the algorithm takes too long. I'm using four node labels (Client plus data about them: range of income, age, business line, etc.) and three types of relationships in a named graph (using GDS). What could be happening?

Can you share your code? And how many nodes / relationships are in your graph?

Of course! Thank you!

Here I create the named graph and execute the algorithm:

// Client job graph
CALL gds.graph.create("client-job-graph", ["Client", "BusinessLine", "EconomicActivity","MonthlyIncome"],
["HAS_BUSINESS_LINE", "HAS_ACTIVITY", "HAS_MONTHLY_INCOME"]) YIELD nodeCount, relationshipCount;

CALL gds.nodeSimilarity.write("client-job-graph", {
    writeRelationshipType: "SIMILAR_J",
    writeProperty: "score_j",
    degreeCutoff: 3,
    topK: 5
})

In the named graph there are 528,739 nodes and 1,586,139 relationships. Almost all of the nodes are Client nodes, since the other ones are sort of categories.

alicia_frame
Neo4j

You can use the asNode function -- in the YIELD statement, return the nodeId, and then you can use algo.asNode to access labels and attributes. For example:

CALL algo.nodeSimilarity.stream('Person | Instrument', 'LIKES', {
  direction: 'OUTGOING'
})
YIELD node1, node2, similarity
RETURN algo.asNode(node1).name AS Person1, algo.asNode(node2).name AS Person2, similarity
ORDER BY similarity DESCENDING, Person1, Person2

Hi Alicia

I think nodeSimilarity is now deprecated. I tried to run this Cypher projection with Jaccard similarity, but I get the error "Procedure call does not provide the required number of arguments: got 3 expected 2."

khshah_m18
Node

I have a similar question.
How can we apply node similarity based on an edge property value?
I have a graph in which stock names are nodes and dates are nodes, and a price relationship links each stock node to its dates.
How can I apply node similarity to different stocks?

alicia_frame
Neo4j

@mangesh.karangutkar Node Similarity has not been deprecated: https://neo4j.com/docs/graph-data-science/current/algorithms/node-similarity/

The error message you received from jaccard indicates that you've provided more inputs than it expects. The jaccard function expects a pair of inputs (the two nodes being compared); perhaps that's the issue. I would look to the docs for more information on the syntax: https://neo4j.com/docs/graph-data-science/current/alpha-algorithms/jaccard/

Could you please help me find out how to run a node similarity algorithm on nodes that have no relationships?

Thank you.

Nodes can then be treated like classes, the way we treat them in OOP. You can write your own algorithm for finding or comparing similarities between two classes/nodes.

But then that's your design, and you need to tailor the algorithm to your needs. If you need more help, you need to be more specific about what exactly you want.

Thanks
Sameer

If you don't have any relationships, you'd need to calculate similarity on node properties instead. Check out KNN or Cosine Similarity: they can create relationships between nodes that have similar properties but are not otherwise connected.
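
As a minimal sketch of what property-based similarity means, the snippet below computes cosine similarity between numeric property vectors in plain Python. The vector layout ([BMI, blood pressure, insulin]) and the values are assumptions made for illustration. This is not the GDS implementation, only the underlying formula.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)

# Hypothetical property vectors: [BMI, blood pressure, insulin].
person_a = [30.0, 72.0, 90.0]
person_b = [31.0, 70.0, 95.0]
person_c = [22.0, 60.0, 0.0]

print(cosine_similarity(person_a, person_b))
print(cosine_similarity(person_a, person_c))
```

Because cosine similarity only looks at direction, properties on very different scales may need normalizing first so that one large-valued property does not dominate the score.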

Does it execute too slowly? Or not at all? Usually the first thing I recommend is adding a degree cutoff and setting topK, but you've done that already.


You can take a peek at the debug logs to check on progress - as NodeSimilarity runs, it will print the percentage of each stage that's complete.


One thing you can try is to first run WCC on your client-job-graph and then run node similarity on individual components - this breaks the problem up and makes it much faster.
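
The reasoning behind the WCC suggestion is that two nodes in different weakly connected components share no neighbors, so their similarity is always zero and the pair never needs comparing. The sketch below shows that idea in plain Python with a small union-find. The node names are hypothetical and this is not how GDS implements either algorithm.

```python
from collections import defaultdict
from itertools import combinations

def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def components(edges):
    """Group all nodes touched by `edges` into weakly connected components."""
    parent = {}
    for s, t in edges:
        parent.setdefault(s, s)
        parent.setdefault(t, t)
        root_s, root_t = find(parent, s), find(parent, t)
        if root_s != root_t:
            parent[root_s] = root_t
    groups = defaultdict(set)
    for node in parent:
        groups[find(parent, node)].add(node)
    return list(groups.values())

def similarity_by_component(edges):
    """Jaccard scores for source-node pairs, compared only within a component."""
    neighbors = defaultdict(set)
    for s, t in edges:
        neighbors[s].add(t)
    scores = {}
    for comp in components(edges):
        sources = sorted(n for n in comp if n in neighbors)
        for a, b in combinations(sources, 2):
            shared = neighbors[a] & neighbors[b]
            if shared:
                scores[(a, b)] = len(shared) / len(neighbors[a] | neighbors[b])
    return scores

# Hypothetical client -> category edges.
edges = [
    ("c1", "retail"), ("c1", "income_low"),
    ("c2", "retail"), ("c2", "income_high"),
    ("c3", "mining"),
]
scores = similarity_by_component(edges)
print(scores)
```

How much this helps depends on the data: if a handful of category nodes connect almost every client, the graph may form one giant component and the split saves little.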

It executes slowly: it does finish, but only after an hour to an hour and a half. I will try your suggestions and report back on the results. Thanks a lot, Alicia!

dadokkio
Node

Hello, I have a strange use case that is probably related to this topic. If not, let me know if a new topic is needed.
I have some Item (orange), String (blue) and Condition (grey) objects that define my items.


The grey dots are the logical representation of the orange dots. They are generated separately for each item because some nodes are very generic (or, and) and their position in the tree (which can be nested as deeply as required) is important.
The requirement is to find the correlation between orange dots, considering the blue dots and the tree structure (and values) of the grey ones. In this case the blue ones are shared, so that's fine, while the grey ones have different ids but the same values.
Using Jaccard, the similarity between them is 0.875 (I imagine because only the root of the tree is considered), while it should be 1.0.
Example code I'm currently using:

// DEMO OBJ
MERGE(it1:Item {name: "AAA"})
MERGE(it2:Item {name: "BBB"})

MERGE(st1:String {value: "stringAAA"})
MERGE(st2:String {value: "stringBBB"})
MERGE(it1)-[:uses]->(st1)
MERGE(it2)-[:uses]->(st1)
MERGE(it1)-[:uses]->(st2)
MERGE(it2)-[:uses]->(st2)

MERGE(p1:Part {id: 1, type: "And", value: "-"})
MERGE(sp11:Part {id: 2, type: "All", value: "-"})
MERGE(sp12:Part {id: 3, type: "Set", value: "-"})

MERGE(p1)-[:then]->(sp11)
MERGE(p1)-[:then]->(sp12)

MERGE(p2:Part {id: 4, type: "And", value: "-"})
MERGE(sp21:Part {id: 5, type: "All", value: "-"})
MERGE(sp22:Part {id: 6, type: "Set", value: "-"})

MERGE(p2)-[:then]->(sp21)
MERGE(p2)-[:then]->(sp22)

MERGE(it1)-[:from]->(p1)
MERGE(it2)-[:from]->(p2)


// CREATE GRAPH
CALL gds.graph.create(
    'myGraph',
    ['Item', 'String', 'Part'],
    {
        uses: {
            type: 'uses'
        },
        from: {
            type: 'from'
        },
        then: {
            type: 'then'
        }
    }
);

// SHOW SIMILARITY
CALL gds.nodeSimilarity.stream('myGraph')
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).name AS Item1, gds.util.asNode(node2).name AS Item2, similarity
ORDER BY similarity DESCENDING, Item1, Item2

// WRITE SIMILARITY BACK
CALL gds.nodeSimilarity.write('myGraph', {
    writeRelationshipType: 'SIMILAR',
    writeProperty: 'score'
})
YIELD nodesCompared, relationshipsWritten

How would you suggest approaching this use case?

Thanks
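
One possible direction, offered only as a sketch: since the grey nodes have different ids but identical values, compare items on value-based signatures of their trees instead of on node identity. The plain-Python example below encodes each tree node as its root-to-node path of (type, value) pairs and runs Jaccard over those features plus the shared strings. The tuple layout and function names are hypothetical, and this is not a GDS feature.

```python
def tree_signatures(tree, prefix=()):
    """Return the set of root-to-node paths, using (type, value) instead of ids."""
    node_type, value, children = tree
    path = prefix + ((node_type, value),)
    signatures = {path}
    for child in children:
        signatures |= tree_signatures(child, path)
    return signatures

def item_features(strings, tree):
    """Combine an item's strings and tree path signatures into one feature set."""
    string_features = {("string", s) for s in strings}
    part_features = {("part", p) for p in tree_signatures(tree)}
    return string_features | part_features

def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Same shape and values as the demo data: And -> (All, Set), ids ignored.
tree1 = ("And", "-", [("All", "-", []), ("Set", "-", [])])
tree2 = ("And", "-", [("All", "-", []), ("Set", "-", [])])

item1 = item_features({"stringAAA", "stringBBB"}, tree1)
item2 = item_features({"stringAAA", "stringBBB"}, tree2)
score = jaccard(item1, item2)
print(score)  # 1.0
```

Because each signature includes the full path from the root, two nodes with the same type and value but different positions in the tree still count as different features.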