How to remove connected components less than x nodes?

1113 · September 4, 2020, 12:52pm

Hi Maxime,

Yes, GDS is still installed. Another approach would be to assign each node a network componentID and the number of nodes in that component. Do you think that would be possible?

Best regards
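
A minimal sketch of the approach described in this post, assuming the Entity label and IRW relationship type that appear later in the thread. The property names componentId and componentSize are illustrative, not from the thread, and the whole update runs in a single transaction, so it may need batching on a large graph.

```cypher
// Sketch: store each node's WCC component id and that component's node count.
// componentId / componentSize are assumed property names.
CALL gds.wcc.stream({
    nodeProjection: "Entity",
    relationshipProjection: "IRW"
})
YIELD nodeId, componentId
// One row per component, carrying the ids of all its members
WITH componentId, collect(nodeId) AS nodeIds
// Back to one row per node, now with the component's size available
UNWIND nodeIds AS nodeId
WITH gds.util.asNode(nodeId) AS n, componentId, size(nodeIds) AS componentSize
SET n.componentId = componentId,
    n.componentSize = componentSize
RETURN count(n) AS nodesUpdated
```

With those properties in place, components below a chosen threshold could be removed with something like MATCH (n:Entity) WHERE n.componentSize < 5 DETACH DELETE n, where 5 is an arbitrary example value.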

cobra · September 4, 2020, 1:43pm

I'm sorry, but I already gave you both approaches in a previous message. Both queries work on my local database, and I use the same labels and properties as you do.

I don't know what else to try; maybe create a completely new database and retry. It was working on your database and now it isn't...

1113 · September 4, 2020, 2:24pm

The first query works, but not on a large dataset. I will keep looking into why the second approach doesn't work. Anyway, thank you very much for your valuable help.
I will keep you posted if I find a solution ;-)

1113 · September 9, 2020, 10:09am

Hi Maxime,

Good news, I found the source of the issue: my CSV headers were not correct.
I fixed that, but the query is still very, very slow. I started it last night and it was still running this morning. I will create a new thread for this point with details ;-)

Have a great day !

cobra · September 9, 2020, 10:17am

Oh nice @1113

Even with the GDS query it's slow?
Did you create a unique constraint and change the query to make use of it?

1113 · September 9, 2020, 11:21am

I applied a unique constraint:

CREATE CONSTRAINT ON(l:Entity) ASSERT l.EntityID IS UNIQUE

But I'm not 100% sure it is correctly reflected in the query:

CALL gds.wcc.stream({
    nodeProjection: "Entity",
    relationshipProjection: "IRW"
})
YIELD nodeId, componentId
WITH componentId, collect(gds.util.asNode(nodeId).EntityID) AS libraries
WITH size(libraries) AS size, libraries
WITH size, apoc.coll.flatten(collect(libraries)) AS nodes_list
CALL apoc.periodic.iterate('
    MATCH (n)
    WHERE n.EntityID IN $nodes_list
    RETURN n
    ', '
    SET n.community_id = $community_id
    ', {batchSize:1000, params:{nodes_list:nodes_list, community_id:size}}) YIELD batch, operations
RETURN 1

I use just one label: Entity.
EntityID is unique; it is the ID of the Entity nodes.

Here's my CSV for nodes:
EntityID:ID,description,:LABEL
232ec2ce7ea347258eb640c345322173,Item1,Entity

And the CSV for edges:
Source:START_ID,Target:END_ID,:TYPE
e53628fb3f414cbc9eb2546cedc70645,34c073e8781244bdb934c3539cdf1674,IRW

cobra · September 9, 2020, 4:41pm

Yeah, the query is good

The only option I see now is to increase the resources (RAM, CPU) of the database server.

But at least you have two queries that work on smaller databases.

Regards,
Cobra
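
One thing that may be worth checking before adding hardware: the uniqueness constraint above is defined on :Entity(EntityID), but the inner MATCH (n) carries no label, and the planner can only use that constraint's index when the pattern names the Entity label. The variant below is a sketch of the same query with only that one change; whether it helps on this dataset is an assumption, not something measured in this thread.

```cypher
CALL gds.wcc.stream({
    nodeProjection: "Entity",
    relationshipProjection: "IRW"
})
YIELD nodeId, componentId
WITH componentId, collect(gds.util.asNode(nodeId).EntityID) AS libraries
WITH size(libraries) AS size, libraries
WITH size, apoc.coll.flatten(collect(libraries)) AS nodes_list
CALL apoc.periodic.iterate('
    // Only change: the Entity label lets the lookup use the index on EntityID
    MATCH (n:Entity)
    WHERE n.EntityID IN $nodes_list
    RETURN n
    ', '
    SET n.community_id = $community_id
    ', {batchSize:1000, params:{nodes_list:nodes_list, community_id:size}}) YIELD batch, operations
RETURN 1
```

Running the inner MATCH on its own with EXPLAIN, and with $nodes_list supplied as a parameter, should show an index seek rather than AllNodesScan if the constraint is being used.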

1113 · September 10, 2020, 8:17am

Hi Cobra,

Yes, and thank you once again for your help !

Best regards !

1113 · September 10, 2020, 10:32pm

Hi Cobra, I have reduced the number of nodes and increased the resources on the machine, and the query completed in less than 2 hours.
That's really cool!
All the best!

1113 · December 14, 2020, 11:33am

Hi Cobra,

I need to identify each network component. I use this query to segment the network components by size:

CALL gds.wcc.stream({
    nodeProjection: "Entity",
    relationshipProjection: "IRW"
})
YIELD nodeId, componentId
WITH componentId, collect(gds.util.asNode(nodeId).EntityID) AS libraries
WITH size(libraries) AS size, libraries
WITH size, apoc.coll.flatten(collect(libraries)) AS nodes_list
CALL apoc.periodic.iterate('
    MATCH (n)
    WHERE n.EntityID IN $nodes_list
    RETURN n
    ', '
    SET n.community_id = $community_id
    ', {batchSize:1000, params:{nodes_list:nodes_list, community_id:size}}) YIELD batch, operations
RETURN 1

This part works like a charm.

I tried to add a UUID with apoc.create.uuid(), but I did something wrong because the same UUID is assigned to all components of the same size. Here's the query I ran:

WITH n.community_id as p, collect(n) as nodes
WITH p, nodes, apoc.create.uuid() as uuid
FOREACH (n in nodes | SET n.uuid = uuid)

Do you know how I could have one UUID per network component, so that every component (even those with the same size) gets a different UUID?

Thanks in advance !

cobra · December 14, 2020, 8:50pm

Hello @1113

You could just use the same query but with a little modification:

CALL gds.wcc.stream({
  nodeProjection: "Item",
  relationshipProjection: "BELONGS_TO"
})
YIELD nodeId, componentId
WITH componentId, collect(gds.util.asNode(nodeId).id) AS libraries
WITH componentId, libraries, apoc.create.uuid() AS uuid
CALL apoc.periodic.iterate('
  MATCH (n)
  WHERE n.id IN $nodes_list
  RETURN n
  ', '
  SET n.uuid = $uuid
  ', {batchSize:1000, params:{nodes_list:libraries, uuid:uuid}}) YIELD batch, operations
RETURN 1

Regards,
Cobra

1113 · December 15, 2020, 12:08am

Thank you, it looks very promising.
The query runs successfully (a lot of 1s returned), but it seems the uuid property is not set:
after the query, uuid doesn't appear in the property keys.

After the query, running MATCH (n) RETURN n LIMIT :
{"EntityID":"80f99c52240f432fbe396b091dedb0d6","community_id":6,"Description":"Entity1"}
...

I modified the query a bit to match my schema (renaming nodeProjection and relationshipProjection).

Below is the query I ran.
I suspect that the variable $nodes_list is empty, but I'm really not sure. Any idea?

Thanks in advance !

CALL gds.wcc.stream({
    nodeProjection: "Entity",
    relationshipProjection: "IRW"
})
YIELD nodeId, componentId
WITH componentId, collect(gds.util.asNode(nodeId).id) AS libraries
WITH componentId, libraries, apoc.create.uuid() AS uuid
CALL apoc.periodic.iterate('
  MATCH (n)
  WHERE n.id IN $nodes_list
  RETURN n
  ', '
  SET n.uuid = $uuid
  ', {batchSize:1000, params:{nodes_list:libraries, uuid:uuid}}) YIELD batch, operations
RETURN 1
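
One way to test the suspicion that $nodes_list is empty is to compare, per component, how many nodes WCC returned with how many property values were actually collected. collect() skips null values, so a property that does not exist on the nodes yields an empty list. This is a read-only sketch; the aliases members and idsCollected are illustrative.

```cypher
CALL gds.wcc.stream({
    nodeProjection: "Entity",
    relationshipProjection: "IRW"
})
YIELD nodeId, componentId
WITH componentId,
     count(*) AS members,                                  // nodes WCC put in this component
     collect(gds.util.asNode(nodeId).id) AS libraries      // nulls are dropped by collect()
RETURN componentId, members, size(libraries) AS idsCollected
ORDER BY members DESC
LIMIT 10
```

If idsCollected is 0 while members is not, the nodes have no id property and the inner MATCH has nothing to match.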

cobra · December 15, 2020, 8:30am

You didn't replace the id property with EntityID:

CALL gds.wcc.stream({
    nodeProjection: "Entity",
    relationshipProjection: "IRW"
})
YIELD nodeId, componentId
WITH componentId, collect(gds.util.asNode(nodeId).EntityID) AS libraries
WITH componentId, libraries, apoc.create.uuid() AS uuid
CALL apoc.periodic.iterate('
  MATCH (n)
  WHERE n.EntityID IN $nodes_list
  RETURN n
  ', '
  SET n.uuid = $uuid
  ', {batchSize:1000, params:{nodes_list:libraries, uuid:uuid}}) YIELD batch, operations
RETURN 1

1113 · December 15, 2020, 9:10am

Brilliant ! It works like a charm ! Many thanks !
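
A quick read-only check, sketched under the assumption that community_id still holds the component size written by the earlier query. Each row is one uuid, so if every component received its own uuid, nodes should equal componentSize on every row, and components of equal size appear as separate rows. The aliases are illustrative.

```cypher
MATCH (n:Entity)
WHERE n.uuid IS NOT NULL
RETURN n.uuid AS uuid,
       n.community_id AS componentSize,   // size stored by the earlier query
       count(n) AS nodes                  // nodes sharing this uuid
ORDER BY nodes DESC
LIMIT 20
```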
