I'm wondering how I can use Neo4j, with the help of the Graph Data Science library (GDS), to divide customers into clusters based on their purchases, and then see which products each cluster buys most often.
My test graph:
//Client:
CREATE (nMike:Client {name:'Mike'})
//Product:
CREATE (nMilk:Product {product_name:'Milk', product_id: 100})
//Relations:
CREATE (nMike)-[:BOUGHT {quantity: 2, purchase_date: '2023-12-09'}]->(nMilk)
CREATE (nMike)-[:BOUGHT {quantity: 5, purchase_date: '2023-11-12'}]->(nMilk)
So there are as many BOUGHT relationships from a Client to a Product as there are purchases.
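Because every purchase is stored as its own BOUGHT relationship, Mike and Milk are connected by two parallel relationships. The sketch below is plain Python, not Cypher or GDS. It shows the two obvious ways to collapse such parallel relationships into one weighted client-product edge: counting the purchases or summing `quantity`. The `purchases` list only mirrors the test graph, and `collapse` is a hypothetical helper. GDS projections have an `aggregation` setting (for example `SUM` or `COUNT`) for the same purpose; check the projection documentation for the exact syntax.

```python
from collections import defaultdict

# Each tuple mirrors one BOUGHT relationship from the test graph:
# (client, product_id, quantity, purchase_date)
purchases = [
    ("Mike", 100, 2, "2023-12-09"),
    ("Mike", 100, 5, "2023-11-12"),
]

def collapse(purchases, mode):
    """Collapse parallel client->product relationships into one weighted edge.

    mode="count": weight = number of purchases
    mode="sum":   weight = total quantity bought
    """
    weights = defaultdict(int)
    for client, product_id, quantity, _date in purchases:
        if mode == "count":
            weights[(client, product_id)] += 1
        elif mode == "sum":
            weights[(client, product_id)] += quantity
        else:
            raise ValueError(f"unknown mode: {mode}")
    return dict(weights)

print(collapse(purchases, "count"))  # {('Mike', 100): 2}
print(collapse(purchases, "sum"))    # {('Mike', 100): 7}
```

Which weight to use is a modelling choice. The count favours frequent buyers, while the sum favours large baskets.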
My guess is that the Louvain algorithm is appropriate for clustering customers with similar purchases. Is that right?
//Graph projection:
CALL gds.graph.project(
'graph',
['Client', 'Product'],
['BOUGHT'],
{nodeProperties: 'product_id'}
)
//Query:
CALL gds.louvain.stream('graph')
YIELD nodeId, communityId, intermediateCommunityIds
WHERE gds.util.asNode(nodeId).name IS NOT NULL
RETURN gds.util.asNode(nodeId).name AS name, communityId
ORDER BY name ASC
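Once every client has a `communityId`, the second half of the goal (which products each cluster buys most often) is a plain aggregation. The sketch below does it in Python. The `communities` mapping and the extra clients and products are hypothetical stand-ins for the output of the stream query above. In Neo4j itself, the same aggregation would more typically be done in Cypher after writing the community IDs back to the Client nodes, for example with `gds.louvain.write`.

```python
from collections import Counter, defaultdict

# Hypothetical output of the Louvain stream query: client name -> communityId.
communities = {"Mike": 0, "Anna": 0, "Tom": 1}

# Hypothetical purchases: (client, product_name, quantity).
purchases = [
    ("Mike", "Milk", 2),
    ("Mike", "Milk", 5),
    ("Anna", "Milk", 1),
    ("Anna", "Bread", 3),
    ("Tom", "Beer", 6),
]

def top_products(communities, purchases, n=3):
    """Return, per community, the n products bought most often (by number of purchases)."""
    per_community = defaultdict(Counter)
    for client, product, _quantity in purchases:
        per_community[communities[client]][product] += 1
    return {c: counter.most_common(n) for c, counter in per_community.items()}

print(top_products(communities, purchases))
# {0: [('Milk', 3), ('Bread', 1)], 1: [('Beer', 1)]}
```

Counting purchases is one option. Summing `_quantity` instead would rank products by volume.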
Do I understand correctly that the number of BOUGHT relationships between a client and a product will not be taken into account, and that the result (the cluster a client belongs to) will be influenced by which products ('product_id') those relationships point to?
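For reference, Louvain works by greedily maximising modularity. In the usual weighted formulation, Q = Σ_c [ L_c / m − (d_c / 2m)² ]. Here m is the total edge weight, L_c is the weight of the edges inside community c, and d_c is the summed weighted degree of the nodes in c. Only relationships and their weights appear in this formula. The plain-Python function below computes Q for a small, hypothetical client/product graph. It is an illustration of the formula, not the GDS implementation. In GDS, relationship weights are passed to Louvain through the `relationshipWeightProperty` configuration key.

```python
from collections import defaultdict

def modularity(edges, community):
    """Weighted modularity of an undirected graph.

    edges: list of (u, v, weight) tuples, each undirected edge listed once.
    community: dict node -> community id.
    Q = sum over communities c of  L_c / m - (d_c / (2m))**2
    """
    m = sum(w for _, _, w in edges)
    internal = defaultdict(float)  # L_c: weight of edges inside community c
    degree = defaultdict(float)    # d_c: summed weighted degree of nodes in c
    for u, v, w in edges:
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Hypothetical bipartite graph: clients A and B bought Milk, client C bought Beer.
edges = [("A", "Milk", 1), ("B", "Milk", 1), ("C", "Beer", 1)]
community = {"A": 0, "B": 0, "Milk": 0, "C": 1, "Beer": 1}
print(modularity(edges, community))     # approximately 0.444 (4/9)

# Same partition, but A bought a total quantity of 7 Milk.
weighted = [("A", "Milk", 7), ("B", "Milk", 1), ("C", "Beer", 1)]
print(modularity(weighted, community))  # approximately 0.198 (16/81)
```

The same partition scores differently once weights change. Louvain can therefore move nodes between communities depending on whether, and how, the weights are projected.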
I found this video, https://www.youtube.com/watch?v=ziG_oPnxB20&t=2183s, which is close to what I want to do. Why does Alicia first run nodeSimilarity before moving on to the Louvain algorithm when checking how many clusters the clients form? Is it about the cost (performance) of such a query?
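GDS Node Similarity compares nodes by their sets of neighbours, by default using the Jaccard coefficient |A ∩ B| / |A ∪ B|. Applied to Client nodes, this turns the bipartite client/product graph into a client-to-client graph. Each edge in that graph is weighted by how much two baskets overlap, and a community algorithm such as Louvain can then cluster it. The sketch below reproduces that idea in plain Python. The `baskets`, the helper names, and the cutoff of 0.5 are hypothetical choices for illustration, not GDS defaults.

```python
from itertools import combinations

# Hypothetical baskets: client -> set of product_ids bought (repeat purchases collapse into the set).
baskets = {
    "Mike": {100, 101},
    "Anna": {100, 101, 102},
    "Tom": {200},
}

def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similarity_edges(baskets, cutoff=0.5):
    """Client-client edges (u, v, score) for pairs whose Jaccard score is at least cutoff."""
    edges = []
    for u, v in combinations(sorted(baskets), 2):
        score = jaccard(baskets[u], baskets[v])
        if score >= cutoff:
            edges.append((u, v, score))
    return edges

print(similarity_edges(baskets))  # [('Anna', 'Mike', 0.6666666666666666)]
```

Tom shares no products with anyone, so he gets no similarity edges in this sketch. Note also that a set-based Jaccard score ignores how many times a product was bought.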
Regards,
G.