How to remove connected components with fewer than x nodes?

Ah sorry, I didn't understand correctly. So "UniqEntity" is useless; I'm going to remove it, put a constraint on ID, and relaunch the query.

Can we keep the same query, or should it be updated?

Best regards,

If the property name is id, you can use this one:

MATCH (a)-[*]-(b)
WITH id(a) AS id, apoc.coll.sortText(apoc.coll.toSet(collect(DISTINCT b.id) + [a.id])) AS nodes_list
WITH DISTINCT nodes_list, size(nodes_list) AS size
WITH size, apoc.coll.flatten(collect(nodes_list)) AS nodes_list
CALL apoc.periodic.iterate('
    MATCH (n)
    WHERE n.id IN $nodes_list
    RETURN n
    ', '
    SET n.community_id = $community_id
    ', {batchSize:1000, params:{nodes_list:nodes_list, community_id:size}}) YIELD batch, operations
RETURN 1

This variable-length path match requires more memory than the database size, because Cypher needs to keep all of these paths in memory, so it can trigger heavy garbage collection.
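
Once community_id is set, removing the small components is a second pass, for example with a threshold of 6 (the same pattern comes back further down in this thread):

CALL apoc.periodic.iterate('
    MATCH (n)
    WHERE n.community_id < $community_id
    RETURN n
    ', '
    DETACH DELETE n
    ', {batchSize:1000, params:{community_id:6}}) YIELD batch, operations
RETURN 1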

Have you tried the weakly connected components algo in GDS?

Maybe this can help identify the weakly connected components. Once you run it, you can determine the smaller communities and delete them.
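
A minimal sketch of that idea, assuming GDS 1.x syntax and placeholder label/relationship names (adjust "Library", "DEPENDS_ON" and the threshold of 6 to your graph):

// 1) write each node's WCC component id to a property
CALL gds.wcc.write({
    nodeProjection: "Library",
    relationshipProjection: "DEPENDS_ON",
    writeProperty: "componentId"
});

// 2) group nodes by component, keep only the small components, and delete them
MATCH (n:Library)
WITH n.componentId AS componentId, collect(n) AS members
WHERE size(members) < 6
UNWIND members AS m
DETACH DELETE m;

For a big graph, the second statement is better wrapped in apoc.periodic.iterate, as in the queries further down.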

Oh, I forgot about this one :see_no_evil:

Yeah it could work :slight_smile:

Thank you for the suggestion, I'm going to try that.
Best regards!

I found this:

CALL gds.wcc.stream({
    nodeProjection: "Library",
    relationshipProjection: "DEPENDS_ON"
})
YIELD nodeId, componentId
RETURN componentId, collect(gds.util.asNode(nodeId).id) AS libraries
ORDER BY size(libraries) DESC;

Would that be a good starting point?

Yeah, good start! :slight_smile:

If you want to do everything in one go (you may have to change the nodeProjection and the relationshipProjection), the query below will delete communities that have fewer than 6 nodes.

CALL gds.wcc.stream({
    nodeProjection: "Item",
    relationshipProjection: "BELONGS_TO"
})
YIELD nodeId, componentId
WITH componentId, collect(gds.util.asNode(nodeId).id) AS libraries
WITH size(libraries) AS size, libraries
WHERE size < 6
WITH apoc.coll.flatten(collect(libraries)) AS nodes_list
CALL apoc.periodic.iterate('
    MATCH (n)
    WHERE n.id IN $nodes_list
    RETURN n
    ', '
    DETACH DELETE n
    ', {batchSize:1000, params:{nodes_list:nodes_list}}) YIELD batch, operations
RETURN 1
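
After that one-shot run, a quick sanity check with the same (placeholder) projection should report zero small components left:

CALL gds.wcc.stream({
    nodeProjection: "Item",
    relationshipProjection: "BELONGS_TO"
})
YIELD nodeId, componentId
WITH componentId, count(*) AS size
WHERE size < 6
RETURN count(*) AS smallComponentsLeft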

If you want to do it in two steps:

  • First, save the size of each community in a property:
CALL gds.wcc.stream({
    nodeProjection: "Item",
    relationshipProjection: "BELONGS_TO"
})
YIELD nodeId, componentId
WITH componentId, collect(gds.util.asNode(nodeId).id) AS libraries
WITH size(libraries) AS size, libraries
WITH size, apoc.coll.flatten(collect(libraries)) AS nodes_list
CALL apoc.periodic.iterate('
    MATCH (n)
    WHERE n.id IN $nodes_list
    RETURN n
    ', '
    SET n.community_id = $community_id
    ', {batchSize:1000, params:{nodes_list:nodes_list, community_id:size}}) YIELD batch, operations
RETURN 1
  • Next, delete, for example, the connected components that have fewer than 6 nodes:
CALL apoc.periodic.iterate('MATCH (n) WHERE n.community_id < $community_id RETURN n', 'DETACH DELETE n', {batchSize:1000, params:{community_id:6}})
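
If the graph is large, an index speeds up that last pass. A sketch, assuming the nodes carry the Item label used in the example projection and Neo4j 4.x index syntax; note the MATCH then needs the label as well so the index can be used:

CREATE INDEX FOR (n:Item) ON (n.community_id);

CALL apoc.periodic.iterate('MATCH (n:Item) WHERE n.community_id < $community_id RETURN n', 'DETACH DELETE n', {batchSize:1000, params:{community_id:6}})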

Regards,
Cobra

Hi,

I tried to delete all the nodes to do another import, using the following command:

MATCH (a)-[r]->() DELETE a, r

And after a while, I got this error message:

Neo.DatabaseError.Transaction.TransactionCommitFailed

This makes me think of a DB settings issue (maybe the root cause of the issue with the query not ending?).
My settings:
dbms.directories.import=import
dbms.security.auth_enabled=true
dbms.memory.heap.initial_size=512m
dbms.memory.heap.max_size=4G
dbms.memory.pagecache.size=2G
dbms.tx_state.memory_allocation=ON_HEAP
dbms.connector.bolt.enabled=true
dbms.connector.http.enabled=true
dbms.connector.https.enabled=false
dbms.security.procedures.unrestricted=apoc.*
dbms.jvm.additional=-XX:+UseG1GC
dbms.jvm.additional=-XX:-OmitStackTraceInFastThrow
dbms.jvm.additional=-XX:+AlwaysPreTouch
dbms.jvm.additional=-XX:+UnlockExperimentalVMOptions
dbms.jvm.additional=-XX:+TrustFinalNonStaticFields
dbms.jvm.additional=-XX:+DisableExplicitGC
dbms.jvm.additional=-XX:MaxInlineLevel=15
dbms.jvm.additional=-Djdk.nio.maxCachedBufferSize=262144
dbms.jvm.additional=-Dio.netty.tryReflectionSetAccessible=true
dbms.jvm.additional=-Djdk.tls.ephemeralDHKeySize=2048
dbms.jvm.additional=-Djdk.tls.rejectClientInitiatedRenegotiation=true
dbms.jvm.additional=-XX:FlightRecorderOptions=stackdepth=256
dbms.jvm.additional=-XX:+UnlockDiagnosticVMOptions
dbms.jvm.additional=-XX:+DebugNonSafepoints
dbms.windows_service_name=neo4j

Does this seem OK to you?

Have a great day !

I tried another time and got:

Neo.DatabaseError.Statement.ExecutionFailed
Java heap space

To delete everything in the database without building one huge transaction that blows the heap, you should use:

CALL apoc.periodic.iterate('MATCH (n) RETURN n', 'DETACH DELETE n', {batchSize:1000})
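
Afterwards, a quick count confirms that the database is empty:

MATCH (n) RETURN count(n) AS remaining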

Thank you!
BTW, I increased dbms.memory.heap.max_size to 16G and the delete query has now been executed.

That's another way :slight_smile: but it's still better to use the query I gave you :slight_smile:

Hi Cobra,

The query is successful, but it seems no nodes are deleted. It also terminates very fast.

Query used:

CALL gds.wcc.stream({
    nodeProjection: "Entity",
    relationshipProjection: "DEPENDS"
})
YIELD nodeId, componentId
WITH componentId, collect(gds.util.asNode(nodeId).id) AS libraries
WITH size(libraries) AS size, libraries
WHERE size < 16
WITH apoc.coll.flatten(collect(libraries)) AS nodes_list
CALL apoc.periodic.iterate('
    MATCH (n)
    WHERE n.id IN $nodes_list
    RETURN n
    ', '
    DETACH DELETE n
    ', {batchSize:1000, params:{nodes_list:nodes_list}}) YIELD batch, operations
RETURN 1

Can you show me what is returned by:

CALL gds.wcc.stream({
    nodeProjection: "Entity",
    relationshipProjection: "DEPENDS"
})
YIELD nodeId, componentId
WITH componentId, collect(gds.util.asNode(nodeId).id) AS libraries
WITH size(libraries) AS size, libraries
RETURN *

Sure. Here's the result:

Can you show me your properties on the right, please? And tell me which one is unique, please :slight_smile:

Sure. Here you go:

Here are the headers of my CSV file:
Entity:ID,description:LABEL

Entity and ID are the same data. I added a unique constraint on "Entity", even though I guess it is done automatically because Entity is used as the ID.
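
For reference, that constraint would look something like this (assuming the label is Entity, as in the projections above, and the property is also named Entity, as in the CSV header):

CREATE CONSTRAINT ON (n:Entity) ASSERT n.Entity IS UNIQUE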

Best regards

What is returned by:

CALL gds.wcc.stream({
    nodeProjection: "Entity",
    relationshipProjection: "DEPENDS"
})
YIELD nodeId, componentId
WITH componentId, collect(gds.util.asNode(nodeId).Entity) AS libraries
WITH size(libraries) AS size, libraries
RETURN *

Same result:

Try this:

CALL gds.wcc.stream({
    nodeProjection: "Entity",
    relationshipProjection: "DEPENDS"
})
YIELD nodeId, componentId
WITH componentId, collect(gds.util.asNode(nodeId)) AS libraries
RETURN *
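
If the nodes do show up here but the earlier delete matched nothing, one way to sidestep the property-name question entirely is to filter on the internal node ids that gds.wcc.stream already yields. A sketch, reusing the threshold of 16 from above:

CALL gds.wcc.stream({
    nodeProjection: "Entity",
    relationshipProjection: "DEPENDS"
})
YIELD nodeId, componentId
WITH componentId, collect(nodeId) AS nodeIds
WITH size(nodeIds) AS size, nodeIds
WHERE size < 16
WITH apoc.coll.flatten(collect(nodeIds)) AS nodeIds
CALL apoc.periodic.iterate('
    MATCH (n)
    WHERE id(n) IN $nodeIds
    RETURN n
    ', '
    DETACH DELETE n
    ', {batchSize:1000, params:{nodeIds:nodeIds}}) YIELD batch, operations
RETURN 1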