How to find similar clients?

Gosforth · February 22, 2022, 1:51pm

I wonder whether I can use the GDS library to find similar clients, i.e. clients who have bought similar products. Ideally, the similarity would be based on buying a similar variety of products, and not only on the product types but also on the quantity of each product.

My sample data:

CREATE
  (karol:Client {name: 'Karol'}),
  (mike:Client {name: 'Mike'}),
  (anna:Client {name: 'Anna'}),
  (shoes:Product {name: 'Shoes', product_no: 100}),
  (coat:Product {name: 'Coat', product_no: 101}),
  (pants:Product {name: 'Pants', product_no: 102}),
  (jacket:Product {name: 'Jacket', product_no: 103}),
  (skirt:Product {name: 'Skirt', product_no: 104}),

  (karol)-[:BOUGHT {date: '2022-03-01', quantity: 1}]->(shoes),
  (karol)-[:BOUGHT {date: '2022-03-02', quantity: 1}]->(coat),
  (karol)-[:BOUGHT {date: '2022-03-04', quantity: 1}]->(pants),
  (mike)-[:BOUGHT {date: '2022-03-01', quantity: 1}]->(shoes),
  (mike)-[:BOUGHT {date: '2022-03-02', quantity: 1}]->(coat),
  (mike)-[:BOUGHT {date: '2022-05-11', quantity: 2}]->(jacket),
  (anna)-[:BOUGHT {date: '2022-05-14', quantity: 3}]->(jacket),
  (anna)-[:BOUGHT {date: '2022-04-20', quantity: 1}]->(skirt);

So Karol and Mike should be close. But if Anna and Mike had both bought a lot of jackets, they should be more similar to each other than to anyone else.
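One metric that captures both criteria (which products and how many of each) is weighted Jaccard similarity: for two clients, sum the smaller quantity of each product over the sum of the larger quantity, across every product either of them bought. As far as I know, GDS Node Similarity computes this kind of weighted Jaccard score when a relationship weight property such as `quantity` is supplied, but check the docs for your GDS version. The sketch below is plain Python over the sample data, so the numbers can be checked by hand; the function names are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Purchases from the sample data: (client, product, quantity)
PURCHASES = [
    ("Karol", "Shoes", 1), ("Karol", "Coat", 1), ("Karol", "Pants", 1),
    ("Mike", "Shoes", 1), ("Mike", "Coat", 1), ("Mike", "Jacket", 2),
    ("Anna", "Jacket", 3), ("Anna", "Skirt", 1),
]


def quantity_vectors(purchases):
    """Sum the quantity each client bought of each product."""
    vectors = defaultdict(lambda: defaultdict(int))
    for client, product, qty in purchases:
        vectors[client][product] += qty
    return vectors


def weighted_jaccard(a, b):
    """sum(min(a_p, b_p)) / sum(max(a_p, b_p)) over all products p."""
    products = set(a) | set(b)
    num = sum(min(a.get(p, 0), b.get(p, 0)) for p in products)
    den = sum(max(a.get(p, 0), b.get(p, 0)) for p in products)
    return num / den if den else 0.0


def pairwise_similarity(purchases):
    """Weighted Jaccard score for every pair of clients (names sorted)."""
    vectors = quantity_vectors(purchases)
    return {
        (c1, c2): weighted_jaccard(vectors[c1], vectors[c2])
        for c1, c2 in combinations(sorted(vectors), 2)
    }


sims = pairwise_similarity(PURCHASES)
# Karol/Mike: 2 / 5 = 0.4, Anna/Mike: 2 / 6 ≈ 0.333, Anna/Karol: 0.0
```

With the data as given, Karol and Mike come out closest. If Mike's jacket quantity is raised to 5, Anna/Mike becomes 3 / 8 = 0.375 while Karol/Mike drops to 2 / 8 = 0.25, which is the behaviour described above.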

Could someone help me write such a query?

Regards

glilienfield · February 23, 2022, 4:43am

I have no idea whether this makes sense for your application or is an accurate way to measure similarity, but I'm throwing it out there as an idea. The query estimates the similarity between two customers by counting the number of times the two customers bought the same product.

If you have a classification for the products, you could alter the query to count the number of times they bought products in the same category.

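// For each client n, count how many times n bought each product p
MATCH (n:Client)-[pn:BOUGHT]->(p:Product)
WITH n, p, count(pn) AS cn
// Find every other client m who bought the same product;
// id(n) < id(m) keeps each pair only once (and already implies n <> m)
MATCH (m:Client)-[pm:BOUGHT]->(p)
WHERE id(n) < id(m)
WITH n, m, p, cn, count(pm) AS cm
// The overlap on product p is the smaller of the two purchase counts
WITH n, m, p, apoc.coll.min([cn, cm]) AS cnt
// Sum the overlaps over all shared products (count(cnt) would only count shared products)
RETURN n AS cust1, m AS cust2, sum(cnt) AS similarity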
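To check what the query returns on the sample data without a database, here is the same counting logic in plain Python: per client pair, the sum over shared products of the smaller purchase count. The `key` parameter is an addition for the category idea above, and `CATEGORY` is a hypothetical product classification that is not part of the original data.

```python
from collections import Counter
from itertools import combinations

# (client, product) purchase events from the sample data. Quantity is ignored,
# just like the Cypher query, which counts BOUGHT relationships.
PURCHASES = [
    ("Karol", "Shoes"), ("Karol", "Coat"), ("Karol", "Pants"),
    ("Mike", "Shoes"), ("Mike", "Coat"), ("Mike", "Jacket"),
    ("Anna", "Jacket"), ("Anna", "Skirt"),
]

# HYPOTHETICAL classification, only to illustrate comparing by category.
CATEGORY = {
    "Shoes": "footwear", "Coat": "outerwear", "Jacket": "outerwear",
    "Pants": "bottoms", "Skirt": "bottoms",
}


def overlap_scores(purchases, key=lambda product: product):
    """Sum of min(purchase counts) per shared key, for each client pair.

    `key` maps a product to what is compared: the product itself by default,
    or e.g. its category. Pairs with nothing in common are left out, as the
    Cypher query returns no row for them.
    """
    counts = {}
    for client, product in purchases:
        counts.setdefault(client, Counter())[key(product)] += 1
    scores = {}
    for c1, c2 in combinations(sorted(counts), 2):
        shared = counts[c1] & counts[c2]  # Counter '&' keeps the minimum counts
        if shared:
            scores[(c1, c2)] = sum(shared.values())
    return scores


by_product = overlap_scores(PURCHASES)
by_category = overlap_scores(PURCHASES, key=CATEGORY.__getitem__)
```

By product, Karol/Mike score 2 (shoes and coat) and Anna/Mike score 1 (jacket), while Anna and Karol share nothing. By the hypothetical categories, Anna and Karol also score 2 (outerwear and bottoms).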

Gosforth · February 23, 2022, 8:00am

Thank you Gary, this could be one approach. However, it only takes into account the number of common relationships; it does not consider the type of products.
I'll keep it in my notebook; it may be useful for other projects.
Thanks

glilienfield · February 23, 2022, 2:45pm

I agree with you. It is fairly simple. I think the metric makes sense, but it would be a lot more accurate with a fuzzier similarity criterion between products instead of the exact match used in the query.
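One established way to add such a fuzzy product criterion (a swapped-in technique, not something proposed in the thread) is soft cosine similarity: like cosine similarity over quantity vectors, but every pair of products contributes according to a product-to-product similarity matrix S, so buying a coat partly matches buying a jacket. The similarity values below are hypothetical; in practice they might come from categories or product attributes. S must be positive semi-definite for the norms to be well defined, which holds for these values.

```python
import math

# Quantity vectors per client, summed from the sample data.
CLIENTS = {
    "Karol": {"Shoes": 1, "Coat": 1, "Pants": 1},
    "Mike": {"Shoes": 1, "Coat": 1, "Jacket": 2},
    "Anna": {"Jacket": 3, "Skirt": 1},
}

# HYPOTHETICAL product-to-product similarities. Pairs not listed are 0,
# and every product is fully similar (1.0) to itself.
PRODUCT_SIM = {
    frozenset({"Coat", "Jacket"}): 0.8,
    frozenset({"Pants", "Skirt"}): 0.5,
}


def product_sim(p, q):
    """Entry S[p][q] of the (symmetric) product similarity matrix."""
    if p == q:
        return 1.0
    return PRODUCT_SIM.get(frozenset({p, q}), 0.0)


def soft_dot(a, b):
    """a^T S b: every pair of products contributes, weighted by product_sim."""
    return sum(qa * product_sim(p, q) * qb
               for p, qa in a.items() for q, qb in b.items())


def soft_cosine(a, b):
    """Soft cosine similarity of two quantity vectors (assumes S is PSD)."""
    denom = math.sqrt(soft_dot(a, a) * soft_dot(b, b))
    return soft_dot(a, b) / denom if denom else 0.0


anna_mike = soft_cosine(CLIENTS["Anna"], CLIENTS["Mike"])    # 8.4 / sqrt(92)
karol_mike = soft_cosine(CLIENTS["Karol"], CLIENTS["Mike"])  # 3.6 / sqrt(27.6)
anna_karol = soft_cosine(CLIENTS["Anna"], CLIENTS["Karol"])  # 2.9 / sqrt(30)
```

With these made-up product similarities, Anna/Mike (about 0.88) ranks above Karol/Mike (about 0.69), and Anna/Karol (about 0.53) is no longer zero even though they bought no identical product. The ranking depends entirely on the chosen S.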

Topic		Replies	Views	Activity
How is the SIMILARITY relatonship used in this code? Cypher	1	271	May 7, 2020
Finding similarity based on relationships and properties Cypher	3	229	January 28, 2024
Finding other people who bought the same products Newbie Questions	2	479	February 19, 2021
How to use Similarity ? GDS (2.6.5) in neo4j desktop (5.15.0) Graph Data Science / Graph Analytics cypher	2	257	April 29, 2024
Cypher: Match nodes that share the exact same relationships Neo4j Graph Platform migrated	2	151	January 5, 2023
