How to calculate similarity based on some properties in Neo4j?

HasAngi
Node

I have a Person node with properties such as age, address, school, and pet's name. Each Person has a relationship to Vacation nodes. I want to find the nodes most similar to a given Person based on only 3 things: the person's age, address, and count(Vacation). For example, if I select Person A, the query should return the 3 other persons most similar to A by age, address, and count(Vacation).

I'm looking for a Cypher query that will help me. I have searched the documentation and multiple examples but don't really know how to achieve this. Any recommendation will be very helpful. Thank you.

3 REPLIES

glilienfield
Ninja

Do you have a definition of what you consider similar? If so, we can try to write a query to find nodes that meet your similarity criteria. 

An alternative is the Neo4j Graph Data Science library. It has node similarity algorithms; one is K-Nearest Neighbors. It only supports numeric properties. Also, it is not a real-time analysis: you project your graph and run the algorithm on the projected graph to find the results.

https://neo4j.com/docs/graph-data-science/current/algorithms/knn/
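To make the project-then-run workflow concrete, here is a rough sketch, assuming GDS 2.x, a numeric `age` property, the `VAC_DETAILS` relationship and `PersonName` property from this thread, and a hypothetical `vacationCount` property that is written to each Person first. The graph name `persons` is also made up for illustration. Since only numeric properties are supported, `address` is left out here.

```cypher
// Step 1 (assumption): store the vacation count as a numeric property,
// because KNN compares node properties, not relationship counts.
MATCH (p:Person)
OPTIONAL MATCH (p)-[:VAC_DETAILS]->(v:Vacation)
WITH p, count(v) AS vacations
SET p.vacationCount = vacations;

// Step 2: project the Person nodes with the properties to compare.
// 'persons' is a hypothetical graph name.
CALL gds.graph.project(
  'persons',
  'Person',
  '*',
  { nodeProperties: ['age', 'vacationCount'] }
);

// Step 3: stream the 3 nearest neighbours of every Person and keep
// only the rows for the selected person.
CALL gds.knn.stream('persons', {
  nodeProperties: ['age', 'vacationCount'],
  topK: 3
})
YIELD node1, node2, similarity
WHERE gds.util.asNode(node1).PersonName = 'Vulsini'
RETURN gds.util.asNode(node2).PersonName AS person, similarity
ORDER BY similarity DESC;
```

The projection is a snapshot, so it would need to be dropped and recreated after the data changes, which is why this does not behave like a real-time query.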

Hi, thank you for your answer. Yes: nodes should be considered similar based on age, address, and count(Vacation). I have come up with a query which does not return the result yet, but I'll share it anyway:

MATCH (p:Person {PersonName: 'Vulsini'})-[:VAC_DETAILS]->(vacation)
WITH p, collect(id(vacation)) AS p1v1
MATCH (v2:vacation)-[:VAC_DETAILS]->(vacation2) WHERE v <> v2
WITH v, p1v1, v2, collect(id(vacation2)) AS p2v2
RETURN v.vacationPlace AS from, v2.vacationPlace AS to,
gds.similarity.jaccard(p1v1, p2v2) AS jaccard;

Thank you for sharing. A couple of comments:

1) Where is the variable 'v' used in your WHERE clause defined?
2) Aren't people related to vacations via the VAC_DETAILS relationship? Your second MATCH relates two vacation nodes via that relationship. Is that valid?
3) You have converted the list of vacations to a list of integers so you can compare nodes via the Jaccard measure, but that can't give you a useful metric, as the ids are assigned and don't convey any information about the vacation.
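One way to address these three points in plain Cypher is sketched below. It is only an illustration under assumptions: the `age` and `address` property names, the `Vacation` label, and the distance formula (including the arbitrary penalty of 10 for a different address) are not from the thread and would need to match your own definition of "similar". Every variable is defined before use, both MATCH patterns go from a Person to its vacations, and the vacation count is compared instead of node ids.

```cypher
// Selected person and their number of vacations (0 if they have none).
MATCH (p:Person {PersonName: 'Vulsini'})
OPTIONAL MATCH (p)-[:VAC_DETAILS]->(v:Vacation)
WITH p, count(v) AS pVacations

// Every other person and their number of vacations.
MATCH (other:Person)
WHERE other <> p
OPTIONAL MATCH (other)-[:VAC_DETAILS]->(ov:Vacation)
WITH p, pVacations, other, count(ov) AS otherVacations

// Smaller distance = more similar. The weights are assumptions:
// 1 per year of age difference, 1 per vacation of difference,
// and a flat 10 when the addresses differ.
WITH other,
     abs(p.age - other.age)
       + abs(pVacations - otherVacations)
       + CASE WHEN p.address = other.address THEN 0 ELSE 10 END AS distance
RETURN other.PersonName AS person, distance
ORDER BY distance ASC
LIMIT 3;
```

If a person has no `age` property, their distance evaluates to null and they sort last under ascending order, so they would only appear when fewer than three scored candidates exist.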