labelPropagation algorithm keeps running (infinite loop?): how do I find the details of the subgraph?


(Thrilok Nagamalla) #1

Hello All,

I am trying to build a clustering algorithm for the products sold in one of the data sets at my work. To begin with, I used the Northwind graph to test the code template.

The picture below gives an idea of the schema, size and complexity of the example database:


image

The code used is as follows:

CALL algo.labelPropagation.stream(
"MATCH (p:Product) RETURN id(p) AS id",
"MATCH (p1:Product)<-[:ORDERS|PURCHASED*]-()-[:PURCHASED|ORDERS*]->(p2:Product)
WHERE id(p1) < id(p2)
RETURN id(p1) AS source, id(p2) AS target, count(*) as weight",
{graph: "cypher",iterations:4})

YIELD nodeId, label
MATCH (p:Product) WHERE id(p) = nodeId
MERGE (sp:SuperCategory {name: "SuperCategory-" + label})
MERGE (p)-[:IN_SUPER_CATEGORY]->(sp)
RETURN nodeId,p.productName, label

As can be seen, I am trying to find clusters of similar products purchased by a customer in two or more different orders. The result is:

The products have been grouped into two super categories with labels 74 and 76.

I want to do the same on a database that has these numbers:

It is quite a complex database, and I used the following code:

MATCH (a:Account)-[:ENROLLED]->(:YearMonth{ym:201002})
CALL algo.labelPropagation.stream(
"MATCH (r:Regimen)<-[:ISIN|CONTAINS|ORDERED*]-(a) RETURN id(r) AS id",
"MATCH (r1:Regimen)<-[:ISIN|CONTAINS|ORDERED*]-(a)-[:ORDERED|CONTAINS|ISIN*]->(r2:Regimen)
WHERE id(r1) < id(r2)
RETURN id(r1) AS source, id(r2) AS target, count(*) as weight",
{graph: "cypher",iterations:2})
YIELD nodeId, label
WITH nodeId, label ORDER BY label
RETURN nodeId, label

The first MATCH statement projects a very small portion of the big graph for the algorithm, but I am not sure how to extract the size of that portion using Cypher. I have run the code for 3 hours and it is still running.
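A minimal sketch of one way the projection could be sized before it is handed to the algorithm, assuming the node and relationship statements are run on their own with the account filter repeated inside them. The `*1..3` hop cap is an assumption added here so that the counts themselves return quickly; it is not part of the query above, and the right bound depends on the schema. Run each statement separately.

```cypher
// Node side: how many accounts and regimens the projection would touch.
// Assumption: paths are capped at 3 hops so the count terminates quickly.
MATCH (a:Account)-[:ENROLLED]->(:YearMonth {ym: 201002})
MATCH (r:Regimen)<-[:ISIN|CONTAINS|ORDERED*1..3]-(a)
RETURN count(DISTINCT a) AS accounts, count(DISTINCT r) AS regimens;

// Relationship side: how many (source, target) pairs would be loaded,
// and the largest weight among them.
MATCH (a:Account)-[:ENROLLED]->(:YearMonth {ym: 201002})
MATCH (r1:Regimen)<-[:ISIN|CONTAINS|ORDERED*1..3]-(a)-[:ORDERED|CONTAINS|ISIN*1..3]->(r2:Regimen)
WHERE id(r1) < id(r2)
WITH r1, r2, count(*) AS weight
RETURN count(*) AS projectedRelationships, max(weight) AS maxWeight;
```

If even these capped counts are slow or very large, the time is probably going into building the projection rather than into the label propagation iterations.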

I have these questions:

  1. Is there a way to find information about the subgraph used by the algorithm, such as the number and labels of its nodes and relationships?
  2. What could be causing the code to run for so long? How do I check that it is not running in an infinite loop?
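For question 2, a sketch of how a long-running call might be inspected from a second session, assuming the built-in `dbms.listQueries` and `dbms.killQuery` procedures are available on this installation (to my understanding they are Enterprise Edition procedures in Neo4j 3.x, which Neo4j Desktop includes). The id `'query-123'` is a placeholder, not a real query id.

```cypher
// List the queries that are currently executing, longest-running first.
CALL dbms.listQueries()
YIELD queryId, username, query, elapsedTimeMillis
RETURN queryId, username, elapsedTimeMillis, query
ORDER BY elapsedTimeMillis DESC;

// Stop one of them by id ('query-123' is a placeholder).
CALL dbms.killQuery('query-123');
```

This only shows that the call is still active and for how long; on its own it does not distinguish a very large projection from a query that is genuinely stuck.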

database version: 3.4.9
graph algorithm version: 3.5.0.1
graph algorithm version: 3.4.8.0 (I had to use a bigger desktop to run the second database, which has an older version)

Thank you,
Thrilok