Super nodes performance issue while running community detection algorithms

mlo · October 22, 2018, 5:40pm

I run community detection algorithms (unionFind, Louvain) to partition my graph database.

However, I recently ran into a performance problem caused by super nodes.

My linking structure looks like this:

(:User)-[:DEVICE]->(:Device)

and other similar structures.

Basically, we want to link users through the property nodes (such as device nodes) that they share.

The majority (99%) of the property nodes are linked to only ONE user.

However, some extreme ones link to 10K+ users, which turns them into super nodes.

Is there any suggested way to solve the super node performance issue?
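To see why a handful of super nodes can dominate, note that a user-pair projection emits one row per pair of users sharing a middle node, so a middle node with d users contributes d(d-1)/2 rows. The sketch below is a minimal Python calculation under assumptions: the degree distribution is hypothetical (only the "99% have one user" and "10K+ users" figures come from the post), and each user is assumed to have a single relationship to a given middle node.

```python
def pair_rows(degree):
    """Rows a user-pair query (u1 < u2) emits for one middle node with `degree` users."""
    return degree * (degree - 1) // 2


# Hypothetical distribution (not measured): 99,000 property nodes with one user,
# 1,000 shared by two users, and a single super node shared by 10,000 users.
degrees = [1] * 99_000 + [2] * 1_000 + [10_000]

total = sum(pair_rows(d) for d in degrees)
from_super_node = pair_rows(10_000)
print(total, from_super_node)  # 49996000 49995000
```

Under these assumptions the single super node accounts for all but 1,000 of the roughly 50 million pair rows, while the 99% of single-user property nodes contribute none.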

michael.hunger · October 22, 2018, 10:30pm

Can you share your concrete model?
I'm not sure you always need property nodes.

Which procedures did you run? The ones in the graph algorithms library?

The ones in APOC are deprecated and shouldn't be used anymore.

mlo · October 22, 2018, 11:01pm

(:User)-[:DEVICE]->(:Device)
(:User)-[:PHONE]->(:Phone)
(:User)-[:EMAIL]->(:Email)
(:User)-[:SHIPPED]->(:Address)

Basically, I use these four models to link users.

I ran algo.unionFind and algo.louvain to partition the graph.

For example:

CALL algo.unionFind(
'MATCH (u1:User)
RETURN id(u1) as id',
'MATCH (u1:User)-[:DEVICE|:EMAIL|:PHONE|:SHIPPED]->(middle)<-[:DEVICE|:EMAIL|:PHONE|:SHIPPED]-(u2:User)
WHERE id(u1) < id(u2)
RETURN id(u1) as source, id(u2) as target',
{graph:'cypher', write: true, partitionProperty: 'group_label', concurrency: 16}
)
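One alternative not raised in the thread: connected components (what unionFind computes) do not actually need the user-user pair list. Linking each user directly to its property nodes gives exactly the same grouping of users, because two users are connected through shared property nodes in one graph if and only if they are in the other, and the input grows with the number of relationships instead of d² per property node. The sketch below is a minimal Python union-find assuming the (user, property node) relationships have already been exported; the ids and the helper name user_components are hypothetical.

```python
class UnionFind:
    """Disjoint-set forest with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path straight at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def user_components(edges):
    """Group users into connected components from (user_id, property_node_id) edges."""
    uf = UnionFind()
    users = set()
    for user, prop in edges:
        users.add(user)
        # Tag ids so a user id can never collide with a property-node id.
        uf.union(("user", user), ("prop", prop))
    groups = {}
    for user in users:
        groups.setdefault(uf.find(("user", user)), set()).add(user)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical export of the (:User)-[...]->(property) relationships.
edges = [
    ("alice", "device-1"), ("bob", "device-1"),  # shared device
    ("bob", "email-7"), ("carol", "email-7"),    # shared email
    ("dave", "phone-3"),                         # phone used by one user only
]
print(user_components(edges))  # [['alice', 'bob', 'carol'], ['dave']]

# A super node with 10,000 users costs 10,000 union calls here, not ~50 million pairs.
super_node = [(f"u{i}", "address-1") for i in range(10_000)]
print(len(user_components(super_node)))  # 1
```

The same idea would presumably carry over to a Cypher projection that returns the user-to-property relationships directly (with property nodes included in the node query) rather than user pairs, but that is an untested assumption about the algo library, not something confirmed in this thread.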

mark.needham · October 24, 2018, 10:30am

Hi,

When you say performance issue, do you mean that the algorithm isn't returning a result, that it's taking longer than you expect, or something else? If it's slow, do you know where exactly the problem occurs?

Is it in the running of the algorithm or in the Cypher projection? How long does it take to run this query:

MATCH (u1:User)-[:DEVICE|:EMAIL|:PHONE|:SHIPPED]->(middle)<-[:DEVICE|:EMAIL|:PHONE|:SHIPPED]-(u2:User)
WHERE id(u1) < id(u2)
RETURN count(*)

mehdi.ajroud · December 4, 2018, 2:53pm

Is there any news on this topic? I am having the same issue! Or shall I create a new issue?

mlo · December 4, 2018, 7:06pm

I changed the way I model the data to avoid supernodes.

@mehdi.ajroud, you should create a new issue for that. If they can solve the supernode issue in the community detection algorithms, that would be wonderful.

mehdi.ajroud · December 5, 2018, 9:05am

OK! I will create a new one, since I didn't find any explanation.

mehdi.ajroud · December 5, 2018, 11:08am

@mlo, here is the link to the new issue I created, in case you want to follow the answer I get ;)

michael.hunger · December 12, 2018, 11:09pm

Better to use label propagation or Louvain; they are better at breaking the graph into substructures.
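The difference from union-find is that union-find puts everything that is connected into one group, whereas label propagation can split a connected graph into denser parts. The sketch below is a minimal Python version of asynchronous label propagation, assuming a fixed node visiting order and smallest-label tie-breaking purely for reproducibility; library implementations typically randomise order and tie-breaks, so their results can differ. The graph is hypothetical: two 4-node cliques joined by a single bridge edge.

```python
from collections import Counter, defaultdict


def label_propagation(adjacency, max_iterations=10):
    """Asynchronous label propagation: each node repeatedly adopts its neighbours' most common label."""
    labels = {node: node for node in adjacency}
    for _ in range(max_iterations):
        changed = False
        for node in sorted(adjacency):  # fixed visiting order (assumption, for reproducibility)
            counts = Counter(labels[n] for n in adjacency[node])
            if not counts:
                continue
            best = max(counts.values())
            # Tie-break: smallest of the most frequent labels (assumption; libraries often randomise).
            new_label = min(label for label, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels


# Hypothetical graph: two 4-node cliques joined by a single bridge edge (4-8).
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4),
         (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8),
         (4, 8)]
adjacency = defaultdict(list)
for a, b in edges:
    adjacency[a].append(b)
    adjacency[b].append(a)

print(label_propagation(adjacency))
# {1: 2, 2: 2, 3: 2, 4: 2, 5: 6, 6: 6, 7: 6, 8: 6}
```

Union-find on the same graph would return a single component, because the bridge connects the two cliques; label propagation here keeps them as two groups.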

mehdi.ajroud · December 13, 2018, 3:54pm

I will try those and let you know the final results! Thanks Michael :)

benjamin.squire · April 1, 2019, 5:35am

Did you get a result on this, @mehdi.ajroud? I would be interested to hear whether these other methods worked better at avoiding supernodes.

michael.hunger · April 4, 2019, 7:24am

@mlo, did you see Mark's question?

You also didn't use distinct or count(*) in the edge-list statement, so you get a ton of duplicate pairs.
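When two users share more than one property node (say the same device and the same email), the edge-list query returns the pair once per shared node. The sketch below is a minimal Python illustration with hypothetical data, showing the duplicates and how a set collapses them, which is what RETURN DISTINCT id(u1) AS source, id(u2) AS target does on the Cypher side.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (user_id, property_node_id) relationships.
edges = [
    (1, "device-a"), (2, "device-a"),
    (1, "email-x"), (2, "email-x"),
    (2, "phone-p"), (3, "phone-p"),
]

# Group users by the middle node they point at.
users_by_middle = defaultdict(list)
for user, middle in edges:
    users_by_middle[middle].append(user)

# One row per (u1, u2, middle) match with u1 < u2, like the original edge-list query.
rows = [
    (u1, u2)
    for users in users_by_middle.values()
    for u1, u2 in combinations(sorted(users), 2)
]
distinct_rows = sorted(set(rows))
print(len(rows), distinct_rows)  # 3 [(1, 2), (2, 3)]
```

Deduplicating shrinks the edge list handed to the algorithm, but the query still has to enumerate every pair around a super node before the duplicates are removed.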

The answer to your question is to use the join hint USING JOIN ON d (d being the middle node, called middle in your query), so that it does a scan + expand on both sides and then a join on the middle node.
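Conceptually, a join hint means each side is computed independently (scan the users, expand to their property nodes) and the two row sets are then matched on the middle node with a hash join, instead of expanding from every user through the middle node and back out again. The sketch below is a minimal Python illustration of that hash-join idea with hypothetical data; it is not Neo4j's planner code, and the helper name hash_join_pairs is made up. It still emits every pair around a super node: what changes is how the matching is done, not how many pairs there are.

```python
from collections import defaultdict


def hash_join_pairs(left_rows, right_rows):
    """Join (user, middle) rows from both sides on the middle node; return (u1, u2) pairs with u1 < u2."""
    # Build phase: hash the right side by the join key (the middle node).
    build = defaultdict(list)
    for user, middle in right_rows:
        build[middle].append(user)
    # Probe phase: look up each left row's middle node in the hash table.
    pairs = []
    for u1, middle in left_rows:
        for u2 in build.get(middle, []):
            if u1 < u2:
                pairs.append((u1, u2))
    return pairs


# Both sides are the same scan + expand: every (:User)-[...]->(middle) relationship.
rows = [(1, "device-a"), (2, "device-a"), (3, "device-a"), (4, "phone-p")]
print(sorted(hash_join_pairs(rows, rows)))  # [(1, 2), (1, 3), (2, 3)]
```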
