How to calculate similarity based on some properties in Neo4j?


I have a Person node with properties such as age, address, school and pet's name. Person has a relationship to another node, Vacation. I want to find the nodes most similar to a given Person based on only 3 properties: the person's age, address and count(Vacation). For example, if I select Person A, the 3 persons most similar to A by age, address and count(Vacation) should be returned.

I'm looking for a Cypher query that will help me. I have searched the documentation and multiple examples but don't really know how to achieve this. Any recommendation will be very helpful. Thank you.



Do you have a definition of what you consider similar? If so, we can try to write a query to find nodes that meet your similarity criteria. 

An alternative is the Neo4j Graph Data Science library, which has node similarity algorithms. One is K-Nearest Neighbors. It only supports numeric properties. Also, it is not a real-time analysis: you project your graph and run the algorithm on the projected graph to find the results.
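As a rough sketch of that project-then-run workflow, assuming GDS 2.x syntax (`gds.graph.project`; the 1.x equivalent was named differently), a numeric `age` on every Person, and vacations attached via `VAC_DETAILS`. The property name `vacationCount` and the graph name `personSimilarity` are made up for illustration, and `address` is left out because it is not numeric. Note that kNN is approximate by default, so treat the output as a starting point rather than an exact ranking.

```cypher
// Step 1: store the vacation count as a numeric property
// (`vacationCount` is a hypothetical name, not part of the original model).
MATCH (p:Person)
OPTIONAL MATCH (p)-[:VAC_DETAILS]->(v)
WITH p, count(v) AS vacationCount
SET p.vacationCount = vacationCount;

// Step 2: project Person nodes with the two numeric properties.
// Assumes every Person has `age`; otherwise a default value would be needed.
CALL gds.graph.project(
  'personSimilarity',
  { Person: { properties: ['age', 'vacationCount'] } },
  '*'
);

// Step 3: stream the 3 nearest neighbours per node and keep one person's rows.
CALL gds.knn.stream('personSimilarity', {
  nodeProperties: ['age', 'vacationCount'],
  topK: 3
})
YIELD node1, node2, similarity
WITH gds.util.asNode(node1) AS person, gds.util.asNode(node2) AS other, similarity
WHERE person.PersonName = 'Vulsini'
RETURN other.PersonName AS similarPerson, similarity
ORDER BY similarity DESC;
```

The projection is a snapshot, so it would have to be dropped and rebuilt after the underlying data changes.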

Hi, thank you for your answer. Yes, nodes should count as similar when they are close in age, address and count(Vacation). I have come up with a query that does not return a result yet, but I'll share it anyway:

MATCH (p:Person {PersonName: 'Vulsini'})-[:VAC_DETAILS]->(vacation)
WITH p, collect(id(vacation)) AS p1v1
MATCH (v2:vacation)-[:VAC_DETAILS]->(vacation2) WHERE v <> v2
WITH v, p1v1, v2, collect(id(vacation2)) AS p2v2
RETURN v.vacationPlace AS from, v2.vacationPlace AS to,
gds.similarity.jaccard(p1v1, p2v2) AS jaccard;

Thank you for sharing. A couple comments:

1) Where is the variable ‘v’ that you use in your WHERE clause defined?
2) Aren’t people related to vacations via the VAC_DETAILS relationship? Your second MATCH relates two vacation nodes via that relationship. Is that valid?
3) You have converted the list of vacations to a list of integers so you can compare nodes via the Jaccard measure, but that can’t give you a useful metric, as the ids are assigned by the database and don’t convey any information about the vacation.
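One way to sidestep the id problem is to score the three properties directly in plain Cypher. This is only a sketch under assumptions: `age` is numeric, `address` is compared as an exact string match, vacations are reached via `(:Person)-[:VAC_DETAILS]->()`, and the equal weighting of the three scores is an arbitrary illustrative choice, not a standard metric.

```cypher
// Count the vacations of the selected person.
MATCH (p:Person {PersonName: 'Vulsini'})
OPTIONAL MATCH (p)-[:VAC_DETAILS]->(pv)
WITH p, count(pv) AS pVacations

// Count the vacations of every other person that has an age.
MATCH (other:Person)
WHERE other <> p AND other.age IS NOT NULL
OPTIONAL MATCH (other)-[:VAC_DETAILS]->(ov)
WITH p, pVacations, other, count(ov) AS otherVacations

// Each score lies in (0, 1] (or is 0/1 for address); closer values score higher.
WITH other,
     1.0 / (1 + abs(p.age - other.age)) AS ageScore,
     CASE WHEN p.address = other.address THEN 1.0 ELSE 0.0 END AS addressScore,
     1.0 / (1 + abs(pVacations - otherVacations)) AS vacationScore
RETURN other.PersonName AS similarPerson,
       (ageScore + addressScore + vacationScore) / 3 AS similarity
ORDER BY similarity DESC
LIMIT 3;
```

This compares the selected person against every other Person, so it runs in real time but scales linearly with the number of Person nodes. If some properties matter more than others, the average in the RETURN clause can be replaced by a weighted sum.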