Which GDS algorithms to use to create customer clusters?

Gosforth · December 8, 2023, 1:15pm

I'm wondering how I can use Neo4j to divide customers into clusters depending on their purchases and then see later which clusters buy what most often (with the help of GDS).

My test graph:

//Client:
CREATE (nMike:Client {name:'Mike'})

//Product:
CREATE (nMilk:Product {product_name:'Milk', product_id: 100})

//Relations:
CREATE (nMike)-[:BOUGHT {quantity: 2, purchase_date: '2023-12-09'}]->(nMilk)
CREATE (nMike)-[:BOUGHT {quantity: 5, purchase_date: '2023-11-12'}]->(nMilk)

So there are as many relationships from Client to Product as there are purchases.
I'm guessing the Louvain algorithm will be appropriate for making clusters of customers with similar purchases?

//Graph:
CALL gds.graph.project(
  'graph',
  ['Client', 'Product'],
  ['BOUGHT'],
  {nodeProperties: 'product_id'}
)

//Query:
CALL gds.louvain.stream('graph')
YIELD nodeId, communityId, intermediateCommunityIds
WHERE gds.util.asNode(nodeId).name IS NOT NULL
RETURN gds.util.asNode(nodeId).name AS name, communityId
ORDER BY name ASC

Do I understand correctly that the number of relationships to a product will not be taken into account, and that the result (the cluster a client belongs to) will be influenced by which product these relationships point to ('product_id')?

I found this https://www.youtube.com/watch?v=ziG_oPnxB20&t=2183s which is close to what I want to do. I wonder why Alicia first runs nodeSimilarity before she goes on to the Louvain algorithm when checking how many clusters the clients form. Is this about the cost of such a query (performance)?

Regards,

G.

Gosforth · December 13, 2023, 12:24pm

Hmmm... no-one?
I see that the Louvain algorithm will take into account the number of relationships of the same type between nodes and gives different results if this number differs. It also ignores the properties of related nodes (it doesn't matter what product the customer is associated with). Therefore, it is completely unsuitable when there is more than one relationship to the same node (that is, when we have more than one BOUGHT relationship to the same product).
Is there any way to eliminate duplicate relationships of the same type when projecting the graph (gds.graph.project) and to take the properties of nodes into account (what matters is not only that the customer bought X times, but also WHAT they bought)?
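
For the duplicate relationships, one option that might work is the aggregation setting of the relationship projection in gds.graph.project. Below is a minimal sketch, assuming the test graph from my first post. The graph name 'purchases-agg' is made up for illustration, and I have not verified how Louvain behaves on the result. As far as I understand, parallel BOUGHT relationships between the same Client and Product are collapsed into a single relationship whose quantity is the sum of the individual purchases:

```cypher
// Sketch only: 'purchases-agg' is a hypothetical graph name.
// Parallel BOUGHT relationships between the same pair of nodes
// are merged into one; their 'quantity' values are summed.
CALL gds.graph.project(
  'purchases-agg',
  ['Client', 'Product'],
  {
    BOUGHT: {
      properties: {
        quantity: {
          property: 'quantity',
          aggregation: 'SUM'
        }
      }
    }
  }
)
YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount
```

With the two example purchases above (quantities 2 and 5), relationshipCount should then be 1 instead of 2, and the summed quantity could be passed to an algorithm as relationshipWeightProperty.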

Will the nodeSimilarity algorithm ignore duplicate relationships of the same type to the same product (that is, if there are several relationships to product X, count only one)? And most importantly: does it take into account the properties of nodes, or only the relationships themselves?
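
To make the question concrete, this is roughly the two-step flow from the video as I understand it, written as a sketch rather than a tested solution. The names 'purchases-single', SIMILAR and score are my own assumptions. The projection uses aggregation 'SINGLE' so that at most one BOUGHT relationship per Client-Product pair is kept, which means duplicates cannot affect the similarity regardless of how the algorithm treats them:

```cypher
// Step 1 (sketch): project with one BOUGHT relationship per Client-Product pair.
CALL gds.graph.project(
  'purchases-single',
  ['Client', 'Product'],
  { BOUGHT: { aggregation: 'SINGLE' } }
)
YIELD graphName, relationshipCount
RETURN graphName, relationshipCount;

// Step 2 (sketch): compare clients by the sets of products they bought
// and add the result to the in-memory graph as SIMILAR relationships.
CALL gds.nodeSimilarity.mutate('purchases-single', {
  mutateRelationshipType: 'SIMILAR',
  mutateProperty: 'score'
})
YIELD nodesCompared, relationshipsWritten
RETURN nodesCompared, relationshipsWritten;

// Step 3 (sketch): run Louvain only on clients and their SIMILAR
// relationships, weighted by the similarity score.
CALL gds.louvain.stream('purchases-single', {
  nodeLabels: ['Client'],
  relationshipTypes: ['SIMILAR'],
  relationshipWeightProperty: 'score'
})
YIELD nodeId, communityId
RETURN gds.util.asNode(nodeId).name AS name, communityId
ORDER BY communityId, name;
```

If this is right, Louvain in step 3 never sees the Product nodes at all: it clusters clients purely by how much their sets of purchased products overlap, which may be the reason nodeSimilarity is run first.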
