Super node performance issues when running community detection algorithms


(Mlo) #1

I run community detection algorithms (unionFind, Louvain) to partition my graph database.

However, I recently ran into performance problems caused by super nodes.

My linking structure looks like this:

(:User)-[:DEVICE]->(:Device)

plus other, similar structures.

Basically, we want to link users through the property nodes (such as device nodes) they share.

The majority (99%) of property nodes are linked to only ONE user.

However, some extreme ones are linked to 10K+ users, forming super nodes.
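To see why a handful of super nodes can dominate the cost: if users are linked pairwise through each property node they share (as a user-to-user projection does), a node shared by k users yields k(k-1)/2 user pairs. A minimal Python sketch of that arithmetic, using a hypothetical degree distribution (99 single-user nodes plus one 10,000-user node) chosen only for illustration:

```python
from typing import Iterable


def pairs_for_shared_node(k: int) -> int:
    """Distinct user pairs linked through one property node shared by k users."""
    return k * (k - 1) // 2


def total_pairs(degrees: Iterable[int]) -> int:
    """Sum of pairwise links over all property nodes, given each node's user count."""
    return sum(pairs_for_shared_node(k) for k in degrees)


# Hypothetical distribution: 99 property nodes with one user, one super node with 10,000.
degrees = [1] * 99 + [10_000]
print(pairs_for_shared_node(1))       # 0
print(pairs_for_shared_node(10_000))  # 49995000
print(total_pairs(degrees))           # 49995000
```

Single-user property nodes contribute nothing, so in a distribution like the one described, nearly all of the pairwise work comes from the few super nodes.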

Is there a recommended way to solve this super node performance issue?


(Michael Hunger) #2

Can you share your concrete model?
I'm not sure you always need property nodes.

Which procedures did you run? The ones in the graph algorithms library?

The ones in APOC are deprecated and shouldn't be used anymore.


(Mlo) #3

(:User)-[:DEVICE]->(:Device)
(:User)-[:PHONE]->(:Phone)
(:User)-[:EMAIL]->(:Email)
(:User)-[:SHIPPED]->(:Address)

Basically, I use these four models to link users.

I ran algo.unionFind and algo.louvain to partition the graph.

For example:

CALL algo.unionFind(
'MATCH (u1:User)
RETURN id(u1) as id',
'MATCH (u1:User)-[:DEVICE|:EMAIL|:PHONE|:SHIPPED]->(middle)<-[:DEVICE|:EMAIL|:PHONE|:SHIPPED]-(u2:User)
WHERE id(u1) < id(u2)
RETURN id(u1) as source, id(u2) as target',
{graph:'cypher', write: true, partitionProperty: 'group_label', concurrency: 16}
)
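For unionFind specifically, the pairwise expansion in the relationship query above may not be necessary: two users land in the same connected component of the user-to-user projection exactly when they are connected in the original User-to-property graph, because every path between users alternates user, property node, user. The Python sketch below is a plain union-find written for illustration (it is not the Neo4j procedure, and the user and property names are hypothetical). It unions each user with the property node it points to, so the work is one union per relationship: a 10,000-user device costs 10,000 unions rather than roughly 50 million pairs. Running the Neo4j procedure over User and property nodes together is an assumption about how you might apply this, not something verified here, and the equivalence holds only for connected components; Louvain's output depends on the graph structure and would generally differ.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple


class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}
        self.size: Dict[Hashable, int] = {}

    def find(self, x: Hashable) -> Hashable:
        # Register unseen nodes lazily as singleton sets.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: Hashable, b: Hashable) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:  # attach the smaller tree under the larger
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def user_components(user_to_property: Iterable[Tuple[str, str]]) -> List[frozenset]:
    """Group users connected through shared property nodes.

    Each (user, property) relationship is unioned once, so the cost is linear
    in the number of relationships, however many users share one property node.
    Keys are tagged ("user", ...) / ("prop", ...) so ids of different kinds never collide.
    """
    uf = UnionFind()
    users: Set[str] = set()
    for user, prop in user_to_property:
        users.add(user)
        uf.union(("user", user), ("prop", prop))
    groups: Dict[Hashable, Set[str]] = {}
    for user in users:
        groups.setdefault(uf.find(("user", user)), set()).add(user)
    return [frozenset(g) for g in groups.values()]


# Hypothetical toy data: alice and bob share a device, bob and carol share an email.
edges = [("alice", "device:1"), ("bob", "device:1"),
         ("bob", "email:9"), ("carol", "email:9"), ("dave", "phone:5")]
print(sorted(sorted(g) for g in user_components(edges)))
# [['alice', 'bob', 'carol'], ['dave']]
```

Users whose property nodes are not shared (like dave) still come out as their own singleton group, which matches what a node query returning every User would give.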


(Mark Needham) #4

Hi,

When you say performance issue, do you mean that the algorithm isn't returning a result, that it's taking longer than you expect, or something else? If it's slow, do you know where exactly the problem occurs?

Is it in running the algorithm or in the Cypher projection? How long does it take to run this query:

MATCH (u1:User)-[:DEVICE|:EMAIL|:PHONE|:SHIPPED]->(middle)<-[:DEVICE|:EMAIL|:PHONE|:SHIPPED]-(u2:User)
WHERE id(u1) < id(u2)
RETURN count(*)
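One detail worth keeping in mind when reading that count: the pattern produces one row per property node two users share, so two users who share both a device and an email appear twice, and count(*) counts rows rather than distinct user pairs. The small Python sketch below mimics that row semantics on hypothetical toy data, assuming at most one relationship per (user, property) pair:

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def projection_rows(edges: Iterable[Tuple[int, str]]) -> List[Tuple[int, int]]:
    """One (u1, u2) row per property node shared by two users, with u1 < u2
    as in `WHERE id(u1) < id(u2)`. Assumes one relationship per (user, property)."""
    users_by_prop: Dict[str, Set[int]] = defaultdict(set)
    for user_id, prop in edges:
        users_by_prop[prop].add(user_id)
    rows: List[Tuple[int, int]] = []
    for users in users_by_prop.values():
        # combinations() over sorted ids yields each pair once, smaller id first.
        rows.extend(combinations(sorted(users), 2))
    return rows


# Hypothetical data: users 1 and 2 share a device; users 1, 2 and 3 share an email.
edges = [(1, "device:A"), (2, "device:A"), (1, "email:X"), (2, "email:X"), (3, "email:X")]
rows = projection_rows(edges)
print(len(rows))       # 4, what count(*) would report
print(len(set(rows)))  # 3 distinct user pairs
```

Duplicate pairs don't change which connected components unionFind finds, but they still add to the size of the projection that has to be built.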