Seeding componentIds in WCC algorithm with integers other than neo ids

Hello,

Here is what I ran:

CREATE (nAlice:User {name: 'Alice'})
CREATE (nBridget:User {name: 'Bridget'})
CREATE (nCharles:User {name: 'Charles'})
CREATE (nDoug:User {name: 'Doug'})
CREATE (nMark:User {name: 'Mark'})
CREATE (nMichael:User {name: 'Michael'})
MATCH (b:User {name: 'Bridget'})

CREATE (nAlice)-[:LINK {weight: 0.5}]->(nBridget)
CREATE (nAlice)-[:LINK {weight: 4}]->(nCharles)
CREATE (nMark)-[:LINK {weight: 1.1}]->(nDoug)
CREATE (nMark)-[:LINK {weight: 2}]->(nMichael)
CREATE (b)-[:LINK {weight: 2.0}]->(new:User {name: 'Mats'})

Since this is a small graph, I computed the weakly connected components myself, counting only links with a weight above the threshold of 1. They are {Alice, Charles}; {Bridget}; {Doug, Mark, Michael}, the same as in the documentation: https://neo4j.com/docs/graph-data-science/current/algorithms/wcc/#algorithms-wcc-examples-seeding. I chose my own integer values of 0, 100 and 300 for these components and assigned them as the node property "componentId":

MATCH (n:User) WITH id(n) AS  nodeId, n.componentId AS componentId 
RETURN gds.util.asNode(nodeId).name AS Name, componentId ORDER BY componentId
Name componentId
"Alice" 0
"Charles" 0
"Bridget" 100
"Doug" 300
"Mark" 300
"Michael" 300

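For anyone reproducing this, the seed values listed above could be assigned with a query along the following lines. This is only a sketch: the mapping by name in the CASE expression is my own illustration of the assignment described above, not something taken from the documentation.

```cypher
// Assign hand-picked seed values to the six original users.
// The name-to-seed mapping is illustrative; adapt it to your own data.
MATCH (n:User)
WHERE n.name IN ['Alice', 'Charles', 'Bridget', 'Doug', 'Mark', 'Michael']
SET n.componentId = CASE
  WHEN n.name IN ['Alice', 'Charles'] THEN 0
  WHEN n.name = 'Bridget' THEN 100
  ELSE 300   // Doug, Mark, Michael
END
```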
When I run the wcc algorithm, the algorithm overrides the seeding property:

CALL gds.wcc.stream({
  nodeProjection: 'User',
  relationshipProjection: 'LINK',
  relationshipWeightProperty: 'weight',
  relationshipProperties: ['weight', 'componentId'],
  threshold: 1.0}
) YIELD nodeId, componentId
RETURN gds.util.asNode(nodeId).name AS Name, componentId ORDER BY componentId
Name componentId
"Alice" 0
"Charles" 0
"Bridget" 1
"Doug" 3
"Mark" 3
"Michael" 3

Lesson learnt: the algorithm appears to keep a user-defined seed value only if that value equals the minimum nodeId among the nodes in the component, where nodeId is the Neo4j id of the node.

If the above is the case, it would be good to record this in the documentation. If not, I would like to know how to coerce user-defined seed integer values.

Hope you find this suggestion useful.

Best,
Lavanya

In order to use seeding, you must specify a seed property in your algorithm call, which your query above doesn't do. You're missing the seedProperty specification, and you're loading componentId as a relationship property when it should be a node property. Please see the docs for a step-by-step example of how to use seeding: https://neo4j.com/docs/graph-data-science/current/algorithms/wcc/#algorithms-wcc-examples-seeding

Assuming you've run WCC once, and have component IDs already in your graph, to use the existing components as seeds, you'll need to use the following call:

CALL gds.wcc.stream({
  nodeProjection: 'User',
  relationshipProjection: 'LINK',
  nodeProperties: 'componentId',
  relationshipProperties: 'weight',
  relationshipWeightProperty: 'weight',
  seedProperty:  'componentId',
  threshold: 1.0}
) YIELD nodeId, componentId
RETURN gds.util.asNode(nodeId).name AS Name, componentId ORDER BY componentId
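
If you don't have component IDs in the graph yet, the first run that the call above assumes could look roughly like this. It is a sketch in the same anonymous-projection style as the rest of this thread; the only addition is writeProperty, and I'm assuming you want to reuse the property name 'componentId'. Note that this would overwrite any componentId values you set by hand.

```cypher
// First, unseeded run: compute components and write them back to the nodes
CALL gds.wcc.write({
  nodeProjection: 'User',
  relationshipProjection: 'LINK',
  relationshipProperties: 'weight',
  relationshipWeightProperty: 'weight',
  threshold: 1.0,
  writeProperty: 'componentId'   // assumed property name, matches seedProperty above
}) YIELD nodePropertiesWritten, componentCount
RETURN nodePropertiesWritten, componentCount
```

After this, the seeded stream call reads the written values back through nodeProperties and seedProperty.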

@alicia.frame Thanks! Specifying the "seedProperty" does coerce user-defined seeds indeed!

Can you also comment on the default behaviour? When the WCC algorithm first runs without any "seedProperty", I think it assigns (in this case) the minimum node id among the nodes in each component as the componentId.

Thanks,
Lavanya

From the docs: https://neo4j.com/docs/graph-data-science/current/algorithms/wcc/

Seeded components

It is possible to define preliminary component IDs for nodes using the seedProperty configuration parameter. This is helpful if we want to retain components from a previous run and it is known that no components have been split by removing relationships. The property value needs to be a number.

The algorithm first checks if there is a seeded component ID assigned to the node. If there is one, that component ID is used. Otherwise, a new unique component ID is assigned to the node.

Once every node belongs to a component, the algorithm merges components of connected nodes. When components are merged, the resulting component is always the one with the lower component ID. Note that the consecutiveIds configuration option cannot be used in combination with seeding in order to retain the seeding values.

The algorithm assumes that nodes with the same seed value do in fact belong to the same component. If any two nodes in different components have the same seed, behavior is undefined. It is then recommended to run WCC without seeds.
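
One way to check the last assumption before relying on hand-assigned seeds is to run WCC without seeds and compare the result with the seed property. The query below is a sketch under the assumptions of this thread (label User, relationship type LINK, seed stored in the node property componentId, threshold 1.0). It lists every seed value that spans more than one actual component; if it returns no rows, nodes sharing a seed also share a component.

```cypher
// Unseeded run, used only to validate the hand-assigned seeds
CALL gds.wcc.stream({
  nodeProjection: 'User',
  relationshipProjection: 'LINK',
  relationshipProperties: 'weight',
  relationshipWeightProperty: 'weight',
  threshold: 1.0
}) YIELD nodeId, componentId
WITH gds.util.asNode(nodeId) AS n, componentId
WHERE n.componentId IS NOT NULL              // ignore nodes without a seed
WITH n.componentId AS seed, collect(DISTINCT componentId) AS components
WHERE size(components) > 1                   // same seed, different components
RETURN seed, components
```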