How to remove connected components with fewer than x nodes?

Hi Maxime,

Yes, GDS is still installed. Another approach would be to store, on each node, a component ID and the number of nodes in that component. Do you think that would be possible?

Best regards
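
For readers following along, the idea above could be sketched roughly as follows. This is only a sketch under assumptions: it uses the `Entity` label and `IRW` relationship type that appear later in this thread, the anonymous-projection syntax of GDS 1.x, and two hypothetical property names, `componentId` and `componentSize`, that do not come from the original posts.

```cypher
// Step 1: let GDS write the component id onto every node.
// "componentId" is an assumed property name, not one from the thread.
CALL gds.wcc.write({
    nodeProjection: "Entity",
    relationshipProjection: "IRW",
    writeProperty: "componentId"
})
YIELD nodePropertiesWritten, componentCount
RETURN nodePropertiesWritten, componentCount;

// Step 2 (run as a separate statement): store the size of each component
// on its member nodes. "componentSize" is also an assumed name.
MATCH (n:Entity)
WITH n.componentId AS componentId, collect(n) AS members
WITH members, size(members) AS componentSize
UNWIND members AS n
SET n.componentSize = componentSize;
```

The second statement runs in a single transaction, so on a large graph it would probably need to be batched, for example with `apoc.periodic.iterate`.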

I'm sorry, but I already gave you both approaches in a previous message. Both queries work on my local database, where I use the same labels and properties as yours :confused:

I don't know what to try anymore, maybe create a completely new database and retry. It was working on your database and now it doesn't...

The first query works, but not on a large dataset. I will keep searching for why the second approach doesn't work. Anyway, thank you very much for your valuable help.
I will keep you posted if I find a solution ;-)

Hi Maxime,

Good news, I found the source of the issue: my CSV headers were not correct.
I fixed that, but the query is still very, very slow. I started it last night and it was still running this morning. I will create a new thread for this point with details ;-)

Have a great day !

Oh nice @1113 :smile:

Even with the GDS query it's slow?
Did you create a unique constraint and change the query to use it?

I applied a unique constraint :

CREATE CONSTRAINT ON(l:Entity) ASSERT l.EntityID IS UNIQUE

But I'm not 100% sure that it is correctly reflected in the query:

CALL gds.wcc.stream({
    nodeProjection: "Entity",
    relationshipProjection: "IRW"
})
YIELD nodeId, componentId
WITH componentId, collect(gds.util.asNode(nodeId).EntityID) AS libraries
WITH size(libraries) AS size, libraries
WITH size, apoc.coll.flatten(collect(libraries)) AS nodes_list
CALL apoc.periodic.iterate('
    MATCH (n)
    WHERE n.EntityID IN $nodes_list
    RETURN n
    ', '
    SET n.community_id = $community_id
    ', {batchSize:1000, params:{nodes_list:nodes_list, community_id:size}}) YIELD batch, operations
RETURN 1

I use just one label: Entity.
EntityID is unique; it is the ID of the nodes with that label.

Here's my csv for Nodes :
EntityID:ID,description,:LABEL
232ec2ce7ea347258eb640c345322173,Item1,Entity

And the csv for Edges :
Source:START_ID,Target:END_ID,:TYPE
e53628fb3f414cbc9eb2546cedc70645,34c073e8781244bdb934c3539cdf1674,IRW

Yeah, the query is good :slight_smile:

The only option I see now is to increase the power (RAM, CPU) of the database :confused:

But at least you have two queries that work on smaller databases :slight_smile:

Regards,
Cobra
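
One thing that may be worth checking for the speed problem: a uniqueness constraint is backed by an index on a label and property pair, and the planner can only use that index when the `MATCH` names the label. The inner `MATCH (n)` in the query above has no label, so the lookup may fall back to scanning every node. A sketch of the same query with the label added (the only changes are `:Entity` and the yielded columns):

```cypher
CALL gds.wcc.stream({
    nodeProjection: "Entity",
    relationshipProjection: "IRW"
})
YIELD nodeId, componentId
WITH componentId, collect(gds.util.asNode(nodeId).EntityID) AS libraries
WITH size(libraries) AS size, libraries
WITH size, apoc.coll.flatten(collect(libraries)) AS nodes_list
CALL apoc.periodic.iterate('
    // Naming the label lets the planner use the index behind the constraint
    MATCH (n:Entity)
    WHERE n.EntityID IN $nodes_list
    RETURN n
    ', '
    SET n.community_id = $community_id
    ', {batchSize:1000, params:{nodes_list:nodes_list, community_id:size}}) YIELD batches, total
RETURN size, batches, total
```

Prefixing the inner `MATCH` with `PROFILE` on a small `$nodes_list` should show whether an index seek or a full node scan is used on a given database.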

Hi Cobra,

Yes, and thank you once again for your help !

Best regards !

Hi Cobra, I have reduced the number of nodes and increased the resources on the machine, and the query completed in less than 2 hours :slight_smile:
That's really cool !
All the best!
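
Once `community_id` holds the component size, as set by the query above, the removal asked about in the title could look roughly like this. It is a sketch, not the query used in this thread: the threshold `min_size: 5` is an arbitrary illustrative value.

```cypher
// Delete, in batches, every node whose component has fewer than $min_size nodes.
// Assumes community_id was set to the component size by the earlier query.
CALL apoc.periodic.iterate('
    MATCH (n:Entity)
    WHERE n.community_id < $min_size
    RETURN n
    ', '
    DETACH DELETE n
    ', {batchSize:1000, params:{min_size:5}}) YIELD batches, total
RETURN batches, total
```

`DETACH DELETE` also removes the relationships of each deleted node, which is safe here because every node in a small component is deleted anyway.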

Hi Cobra,

I need to identify each network component. I use this query to group the components by size:

CALL gds.wcc.stream({
    nodeProjection: "Entity",
    relationshipProjection: "IRW"
})
YIELD nodeId, componentId
WITH componentId, collect(gds.util.asNode(nodeId).EntityID) AS libraries
WITH size(libraries) AS size, libraries
WITH size, apoc.coll.flatten(collect(libraries)) AS nodes_list
CALL apoc.periodic.iterate('
    MATCH (n)
    WHERE n.EntityID IN $nodes_list
    RETURN n
    ', '
    SET n.community_id = $community_id
    ', {batchSize:1000, params:{nodes_list:nodes_list, community_id:size}}) YIELD batch, operations
RETURN 1

This part works like a charm.

I tried to add a UUID with apoc.create.uuid(), but I did something wrong because the same UUID is assigned to all components of the same size. Here's the query that I ran:

MATCH (n:Entity)
WITH n.community_id AS p, collect(n) AS nodes
WITH p, nodes, apoc.create.uuid() AS uuid
FOREACH (n IN nodes | SET n.uuid = uuid)

Do you know how I could get one UUID per network component, so that every component has a different UUID even when two components have the same size?

Thanks in advance !

Hello @1113 :slight_smile:

You could just use the same query but with a little modification:

CALL gds.wcc.stream({
  nodeProjection: "Item",
  relationshipProjection: "BELONGS_TO"
})
YIELD nodeId, componentId
WITH componentId, collect(gds.util.asNode(nodeId).id) AS libraries
WITH componentId, libraries, apoc.create.uuid() AS uuid
CALL apoc.periodic.iterate('
  MATCH (n)
  WHERE n.id IN $nodes_list
  RETURN n
  ', '
  SET n.uuid = $uuid
  ', {batchSize:1000, params:{nodes_list:libraries, uuid:uuid}}) YIELD batch, operations
RETURN 1

Regards,
Cobra

Thank you, it looks very promising.
The query runs successfully (a lot of 1s are returned), but it seems the uuid property is not set:
after the query, uuid doesn't appear in the property keys.

After the query, running MATCH (n) RETURN n LIMIT :
{"EntityID":"80f99c52240f432fbe396b091dedb0d6","community_id":6,"Description":"Entity1"}
...

I modified the query slightly to match my schema (renaming nodeProjection and relationshipProjection).

Below is the query that I ran.
I suspect that the variable $nodes_list is empty, but I'm really not sure. Any idea?

Thanks in advance !

CALL gds.wcc.stream({
    nodeProjection: "Entity",
    relationshipProjection: "IRW"
})
YIELD nodeId, componentId
WITH componentId, collect(gds.util.asNode(nodeId).id) AS libraries
WITH componentId, libraries, apoc.create.uuid() AS uuid
CALL apoc.periodic.iterate('
  MATCH (n)
  WHERE n.id IN $nodes_list
  RETURN n
  ', '
  SET n.uuid = $uuid
  ', {batchSize:1000, params:{nodes_list:libraries, uuid:uuid}}) YIELD batch, operations
RETURN 1
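
A quick way to test the "empty `$nodes_list`" suspicion, sketched here under the same GDS 1.x syntax as the query above: reading a property that a node does not have returns `null`, and `collect()` skips `null` values, so the list silently ends up empty while the query still reports success.

```cypher
// Compare how many nodes each component has with how many values were collected.
// If "collected" is 0 while "nodes" is not, the property name is the problem.
CALL gds.wcc.stream({
    nodeProjection: "Entity",
    relationshipProjection: "IRW"
})
YIELD nodeId, componentId
WITH componentId,
     count(*) AS nodes,
     collect(gds.util.asNode(nodeId).id) AS libraries
RETURN componentId, nodes, size(libraries) AS collected
ORDER BY nodes DESC
LIMIT 10
```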

You didn't replace the id property with EntityID:

CALL gds.wcc.stream({
    nodeProjection: "Entity",
    relationshipProjection: "IRW"
})
YIELD nodeId, componentId
WITH componentId, collect(gds.util.asNode(nodeId).EntityID) AS libraries
WITH componentId, libraries, apoc.create.uuid() AS uuid
CALL apoc.periodic.iterate('
  MATCH (n)
  WHERE n.EntityID IN $nodes_list
  RETURN n
  ', '
  SET n.uuid = $uuid
  ', {batchSize:1000, params:{nodes_list:libraries, uuid:uuid}}) YIELD batch, operations
RETURN 1

Brilliant ! It works like a charm ! Many thanks ! :slight_smile:
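
To double-check the result, a verification query along these lines may help. It is a sketch that reuses the projection from the query above and assumes the `uuid` property has already been written.

```cypher
// Each component should carry exactly one distinct uuid.
CALL gds.wcc.stream({
    nodeProjection: "Entity",
    relationshipProjection: "IRW"
})
YIELD nodeId, componentId
WITH componentId,
     count(*) AS nodes,
     collect(DISTINCT gds.util.asNode(nodeId).uuid) AS uuids
RETURN componentId, nodes, size(uuids) AS distinctUuids
ORDER BY nodes DESC
LIMIT 10
```

A `distinctUuids` value of 1 on every row would indicate one UUID per component; 0 would mean the property was never set on that component.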