Hi Maxime,
Yes, GDS is still installed. Another approach would be to associate each node with its network componentId and the number of nodes in that component. Do you think that would be possible?
Best regards
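Since gds.wcc.stream already yields a componentId for every node, the component size is just the number of nodes sharing that componentId. Below is a minimal Python sketch of the same idea, computed in memory with a union-find structure. It illustrates weakly connected components in general, not how GDS implements them, and the node and edge lists are made up.

```python
from collections import Counter


def weakly_connected_components(nodes, edges):
    """Return {node: (component_id, component_size)}, ignoring edge direction."""
    parent = {n: n for n in nodes}

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Direction is ignored, as in a weakly connected components run.
    for source, target in edges:
        union(source, target)

    roots = {n: find(n) for n in nodes}
    sizes = Counter(roots.values())
    # Number the components 0, 1, 2, ... in order of first appearance.
    ids = {}
    for root in roots.values():
        ids.setdefault(root, len(ids))
    return {n: (ids[roots[n]], sizes[roots[n]]) for n in nodes}


nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("d", "e")]
print(weakly_connected_components(nodes, edges))
# {'a': (0, 3), 'b': (0, 3), 'c': (0, 3), 'd': (1, 2), 'e': (1, 2)}
```

In Cypher, the equivalent grouping step is collecting the stream per componentId and taking size() of the collected list, as the queries later in this thread do.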
I'm sorry, but I already gave you both approaches in a previous message. Both queries work on my local database, and I use the same labels and properties as yours.
I don't know what else to try. Maybe create a completely new database and try again. It was working on your database, and now it doesn't...
The first query works, but not on a large dataset. I will keep looking into why the second approach doesn't work. Anyway, thank you very much for your valuable help.
I will keep you posted if I find a solution ;-)
Hi Maxime,
Good news: I found the source of the issue. My CSV headers were incorrect.
I fixed that, but the query is still very, very slow. I started it last night, and it was still running this morning. I will open a new thread with details on this point ;-)
Have a great day!
Oh nice @1113
Is it slow even with the GDS query?
Did you create a UNIQUE CONSTRAINT and change the query to use it?
I applied a unique constraint:
CREATE CONSTRAINT ON(l:Entity) ASSERT l.EntityID IS UNIQUE
But I'm not 100% sure it is correctly used in the query:
CALL gds.wcc.stream({
nodeProjection: "Entity",
relationshipProjection: "IRW"
})
YIELD nodeId, componentId
WITH componentId, collect(gds.util.asNode(nodeId).EntityID) AS libraries
WITH size(libraries) AS size, libraries
WITH size, apoc.coll.flatten(collect(libraries)) AS nodes_list
CALL apoc.periodic.iterate('
MATCH (n)
WHERE n.EntityID IN $nodes_list
RETURN n
', '
SET n.community_id = $community_id
', {batchSize:1000, params:{nodes_list:nodes_list, community_id:size}}) YIELD batch, operations
RETURN 1
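How the apoc.periodic.iterate call above behaves: the first statement produces the rows (the matching nodes), and the second statement is applied to those rows in chunks of batchSize, with APOC committing each chunk in its own transaction. The Python sketch below imitates that flow in memory; periodic_set and the toy data are hypothetical. One thing worth checking on the Neo4j side: the index behind the EntityID uniqueness constraint belongs to the Entity label, so the planner can only use it when the pattern names that label (MATCH (n:Entity)), whereas this query uses MATCH (n).

```python
def periodic_set(nodes, selector, key, value, batch_size=1000):
    """Rough in-memory analogue of the apoc.periodic.iterate call (illustration only).

    Returns (number_of_batches, number_of_updated_nodes).
    """
    # Outer statement: MATCH (n) WHERE ... RETURN n
    rows = [node for node in nodes if selector(node)]
    batches = 0
    # Inner statement: SET n.<key> = value, applied one chunk at a time
    # (APOC would commit each chunk in its own transaction).
    for start in range(0, len(rows), batch_size):
        for node in rows[start:start + batch_size]:
            node[key] = value
        batches += 1
    return batches, len(rows)


# Toy data: 2,500 hypothetical Entity nodes, 2,100 of them in one size group.
nodes = [{"EntityID": f"id{i}"} for i in range(2500)]
wanted = {f"id{i}" for i in range(2100)}  # a set keeps each membership check cheap
print(periodic_set(nodes, lambda n: n["EntityID"] in wanted, "community_id", 3))
# (3, 2100)
```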
I use only one label: Entity.
EntityID is unique; it is the ID of the Entity nodes.
Here is my CSV for nodes:
EntityID:ID,description,:LABEL
232ec2ce7ea347258eb640c345322173,Item1,Entity
And the CSV for edges:
Source:START_ID,Target:END_ID,:TYPE
e53628fb3f414cbc9eb2546cedc70645,34c073e8781244bdb934c3539cdf1674,IRW
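The thread doesn't say which header was wrong. As a hypothetical pre-flight check, the Python sketch below reads a header line and reports which neo4j-admin import field kinds (:ID, :LABEL, :START_ID, :END_ID, :TYPE) are missing. It assumes, as in the files above, that labels and relationship types come from the CSV columns rather than from the import command line; header_kinds and missing_kinds are illustrative names.

```python
import csv
import io

# Header kinds expected in the two files above, assuming labels and relationship
# types are read from the files themselves rather than given on the command line.
NODE_KINDS = {":ID", ":LABEL"}
EDGE_KINDS = {":START_ID", ":END_ID", ":TYPE"}


def header_kinds(header_line):
    """Return the ':KIND' part of each typed header field, e.g. 'EntityID:ID' -> ':ID'."""
    fields = next(csv.reader(io.StringIO(header_line)))
    kinds = set()
    for field in fields:
        if ":" in field:
            kind = field.split(":", 1)[1]
            kinds.add(":" + kind.split("(", 1)[0])  # drop an optional ID space, e.g. ID(Entity)
    return kinds


def missing_kinds(header_line, expected):
    """List the expected header kinds that the header line does not contain."""
    return sorted(expected - header_kinds(header_line))


print(missing_kinds("EntityID:ID,description,:LABEL", NODE_KINDS))       # []
print(missing_kinds("Source:START_ID,Target:END_ID,:TYPE", EDGE_KINDS))  # []
print(missing_kinds("EntityID,description,:LABEL", NODE_KINDS))          # [':ID']
```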
Yeah, the query is good
The only option I see now is to give the database more resources (RAM, CPU).
But at least you have two queries that work on a smaller database.
Regards,
Cobra
Hi Cobra,
Yes, and thank you once again for your help!
Best regards!
Hi Cobra, I reduced the number of nodes and increased the machine's resources, and the query finished in less than 2 hours.
That's really cool !
All the best!
Hi Cobra,
I need to identify each network component. I use this query to segment the network components by size:
CALL gds.wcc.stream({
nodeProjection: "Entity",
relationshipProjection: "IRW"
})
YIELD nodeId, componentId
WITH componentId, collect(gds.util.asNode(nodeId).EntityID) AS libraries
WITH size(libraries) AS size, libraries
WITH size, apoc.coll.flatten(collect(libraries)) AS nodes_list
CALL apoc.periodic.iterate('
MATCH (n)
WHERE n.EntityID IN $nodes_list
RETURN n
', '
SET n.community_id = $community_id
', {batchSize:1000, params:{nodes_list:nodes_list, community_id:size}}) YIELD batch, operations
RETURN 1
This part works like a charm.
I tried to add a UUID with apoc.create.uuid(), but I did something wrong: the same UUID is assigned to all components of the same size. Here is the query I ran:
MATCH (n:Entity)
WITH n.community_id AS p, collect(n) AS nodes
WITH p, nodes, apoc.create.uuid() AS uuid
FOREACH (n IN nodes | SET n.uuid = uuid)
Do you know how I could get one UUID per network component, so that every component gets a different UUID even when several components have the same size?
Thanks in advance!
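Why the UUID is shared: apoc.create.uuid() is evaluated once per row, and after grouping by community_id there is one row per component size, so every component of that size receives the same value. Grouping by the GDS componentId instead gives one row, and thus one UUID, per component. A small Python sketch of the difference, with uuid.uuid4() standing in for apoc.create.uuid() and made-up component data:

```python
import uuid
from collections import defaultdict


def assign_uuids(node_components, group_key):
    """Assign one fresh UUID per group of nodes.

    node_components maps node -> (component_id, component_size);
    group_key(info) decides which nodes end up in the same group.
    """
    groups = defaultdict(list)
    for node, info in node_components.items():
        groups[group_key(info)].append(node)
    result = {}
    for members in groups.values():
        # One call per group, like apoc.create.uuid() evaluated once per grouped row.
        group_uuid = str(uuid.uuid4())
        for node in members:
            result[node] = group_uuid
    return result


# Two distinct components (ids 0 and 1) that happen to have the same size, 2.
components = {"a": (0, 2), "b": (0, 2), "c": (1, 2), "d": (1, 2)}

by_size = assign_uuids(components, lambda info: info[1])       # like grouping by community_id
by_component = assign_uuids(components, lambda info: info[0])  # like grouping by componentId

print(by_size["a"] == by_size["c"])            # True: both components share one UUID
print(by_component["a"] == by_component["c"])  # False: one UUID per component
```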
Hello @1113
You can use the same query with a small modification:
CALL gds.wcc.stream({
nodeProjection: "Item",
relationshipProjection: "BELONGS_TO"
})
YIELD nodeId, componentId
WITH componentId, collect(gds.util.asNode(nodeId).id) AS libraries
WITH componentId, libraries, apoc.create.uuid() AS uuid
CALL apoc.periodic.iterate('
MATCH (n)
WHERE n.id IN $nodes_list
RETURN n
', '
SET n.uuid = $uuid
', {batchSize:1000, params:{nodes_list:libraries, uuid:uuid}}) YIELD batch, operations
RETURN 1
Regards,
Cobra
Thank you, it looks very promising.
The query runs successfully (lots of 1s are returned), but the uuid property does not seem to be set:
After the query, uuid doesn't appear in the property keys.
After the query, running MATCH (n) RETURN n LIMIT :
{"EntityID":"80f99c52240f432fbe396b091dedb0d6","community_id":6,"Description":"Entity1"}
...
I modified the query a bit to match my schema (changing the nodeProjection and relationshipProjection).
Below is the query I ran.
My guess is that $nodes_list is empty, but I'm really not sure. Any idea?
Thanks in advance!
CALL gds.wcc.stream({
nodeProjection: "Entity",
relationshipProjection: "IRW"
})
YIELD nodeId, componentId
WITH componentId, collect(gds.util.asNode(nodeId).id) AS libraries
WITH componentId, libraries, apoc.create.uuid() AS uuid
CALL apoc.periodic.iterate('
MATCH (n)
WHERE n.id IN $nodes_list
RETURN n
', '
SET n.uuid = $uuid
', {batchSize:1000, params:{nodes_list:libraries, uuid:uuid}}) YIELD batch, operations
RETURN 1
You didn't replace the id property with EntityID:
CALL gds.wcc.stream({
nodeProjection: "Entity",
relationshipProjection: "IRW"
})
YIELD nodeId, componentId
WITH componentId, collect(gds.util.asNode(nodeId).EntityID) AS libraries
WITH componentId, libraries, apoc.create.uuid() AS uuid
CALL apoc.periodic.iterate('
MATCH (n)
WHERE n.EntityID IN $nodes_list
RETURN n
', '
SET n.uuid = $uuid
', {batchSize:1000, params:{nodes_list:libraries, uuid:uuid}}) YIELD batch, operations
RETURN 1
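This also confirms the guess about $nodes_list. Entity nodes have no id property, so gds.util.asNode(nodeId).id is null for each of them, and Cypher's collect() skips nulls; libraries (and therefore $nodes_list) ends up empty, the MATCH finds nothing, and the SET never runs, while the query still returns 1 per component. A Python sketch that models those two Cypher rules on made-up data (it does not talk to Neo4j):

```python
def cypher_collect(values):
    """Like Cypher's collect(): null values (None here) are skipped."""
    return [v for v in values if v is not None]


def match_in(nodes, prop, wanted):
    """Like MATCH (n) WHERE n.<prop> IN $wanted: a missing property is null and never matches."""
    wanted = set(wanted)
    return [n for n in nodes if n.get(prop) is not None and n.get(prop) in wanted]


# Hypothetical members of one component: Entity nodes carry EntityID, not id.
component = [
    {"EntityID": "e1", "community_id": 2},
    {"EntityID": "e2", "community_id": 2},
]

wrong = cypher_collect(n.get("id") for n in component)        # [] -> $nodes_list is empty
right = cypher_collect(n.get("EntityID") for n in component)  # ['e1', 'e2']

print(len(match_in(component, "id", wrong)))        # 0: nothing to SET
print(len(match_in(component, "EntityID", right)))  # 2
```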
Brilliant! It works like a charm! Many thanks!