I'm currently starting to learn how Cypher works and I got stuck trying to figure out a (for a beginner) more complex Cypher.
The database has nodes for Customers, Purchases, Products and Stores, with the following relations:
There are a few millions purchases made by a few thousand people with about 20 thousand different products.
What I'm trying to do is find all combinations of two customers that have purchased at least 3 of the same products.
I have googled around a lot, but since I'm new to Cypher I can't find the right terms to search for to find similar queries that could help me out.
I did find this other topic that deals with the same kind of query, but the amount of nodes is a lot smaller and it has one less relationship. That query is about "Person -> Movie <- Person" with pairs, so the equivalent would be something like "Person -> Movie -> Genre <- Movie <- Person" and then finding the people that acted in at least 3 of the same genres.
I started with trying the following:
WHERE id(c1) < id(c2)
WITH c1, c2, collect(product) as commonProducts
WHERE size(commonProducts) >= 3
RETURN c1, c2, commonProducts
but prepending it with EXPLAIN gives the following:
Which are a lot of rows (200 million+). I'm not seeing how I can optimize it and I couldn't find the right info to do so myself.
Could anyone help out or point me in the right direction so I can learn how to do this?
Thank you very much.