How to find all combinations of two nodes that both have a relation to more than a certain amount of nodes

(Thomas) #1

Hi everyone,

I'm currently starting to learn how Cypher works and I got stuck trying to figure out a (for a beginner) more complex Cypher.

The database has nodes for Customers, Purchases, Products and Stores, with the following relations:

There are a few millions purchases made by a few thousand people with about 20 thousand different products.

What I'm trying to do is find all combinations of two customers that have purchased at least 3 of the same products.
I have googled around a lot, but since I'm new to Cypher I can't find the right terms to search for to find similar queries that could help me out.

I did find this other topic that deals with the same kind of query, but the amount of nodes is a lot smaller and it has one less relationship. That query is about "Person -> Movie <- Person" with pairs, so the equivalent would be something like "Person -> Movie -> Genre <- Movie <- Person" and then finding the people that acted in at least 3 of the same genres.

I started with trying the following:

MATCH (c1:Customer)<-[:MADE_BY]-(p1:Purchase)-[:CONSISTS_OFF]->(product:Product)<-[:CONSISTS_OFF]-(p2:Purchase)-[:MADE_BY]->(c2:Customer)
WHERE id(c1) < id(c2)
WITH c1, c2, collect(product) as commonProducts
WHERE size(commonProducts) >= 3
RETURN c1, c2, commonProducts

but prepending it with EXPLAIN gives the following:

Which are a lot of rows (200 million+). I'm not seeing how I can optimize it and I couldn't find the right info to do so myself.

Could anyone help out or point me in the right direction so I can learn how to do this?

Thank you very much.