Eigenvector centrality returns the same value for all the nodes

hamed.metalgear · March 17, 2025, 5:50pm

I am trying to compute eigenvector centrality on a very large graph. However, since the graph is too large to fit in memory for the required projection (see this discussion for context), I am down-sampling it for the projection before running the algorithm.

The following are the queries I'm using. The algorithm returns an identical value (2.761963394277773E-4) for all the nodes, so I'm not sure whether there is something wrong with the queries or whether this is expected, given how small my down-sampled graph is.

Creating the projection:

CALL gds.graph.project.cypher(
    'downsampledGraph',
    'MATCH (n:Person) WHERE rand() < 0.01 RETURN id(n) AS id',
    'MATCH (n:Person)-[r:Knows]->(m:Person) WHERE rand() < 0.01 RETURN id(n) AS source, id(m) AS target'
) YIELD graphName;

Running the algorithm:

CALL apoc.export.csv.query(
  "CALL gds.eigenvector.stream('downsampledGraph', {maxIterations:20, tolerance:0.0001})
   YIELD nodeId, score
   RETURN id(gds.util.asNode(nodeId)) AS personId, score
   ORDER BY score DESC",
  "file:///eigenvector_results.csv",
  {}
) YIELD file
RETURN file;

Here is the head of the eigenvector_results.csv file:

"personId","score"
"169000","2.761963394277773E-4"
"169018","2.761963394277773E-4"
"170962","2.761963394277773E-4"
"172489","2.761963394277773E-4"
"172502","2.761963394277773E-4"
"173127","2.761963394277773E-4"
"173142","2.761963394277773E-4"
"173171","2.761963394277773E-4"
"176204","2.761963394277773E-4"

ioannis_panagio · March 18, 2025, 9:22am

Hi @hamed.metalgear,

I believe this happens because you are running two sampling queries that are independent of each other. This means that the endpoints of the relationships returned by the second query will not necessarily be among the nodes returned by the first query. You can verify this by running CALL gds.graph.list('downsampledGraph') YIELD * and checking the relationship count. I expect it to be zero.
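A narrower form of that check, yielding only the counts, might look like the sketch below. It assumes the projection from the question still exists in the graph catalog under the name 'downsampledGraph'; note that the graph name is passed as a quoted string.

```cypher
// List the projected graph and report how many nodes and relationships it holds.
CALL gds.graph.list('downsampledGraph')
YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount;
```

If relationshipCount comes back as 0, every node is in the same position structurally, which would be consistent with all nodes receiving the same score.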

I'd suggest you try the following

MATCH (source:Person)-[r:KNOWS]->(target:Person)
WHERE rand() < 0.01
WITH gds.graph.project('downsampledGraph', source, target) AS g
RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels

This should hopefully give you better results.
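One practical detail: if the earlier projection is still in the graph catalog, projecting again under the same name will likely fail, so it may need to be dropped first. The sketch below assumes the graph name 'downsampledGraph' from this thread and reuses the algorithm settings from the original question. The first statement is meant to run before the projection query above, the second one after it.

```cypher
// Before re-projecting: remove the old projection if it exists.
// The second argument (false) means "do not fail if the graph is missing".
CALL gds.graph.drop('downsampledGraph', false) YIELD graphName;

// After re-projecting: a quick sanity check that the scores are no longer all identical.
CALL gds.eigenvector.stream('downsampledGraph', {maxIterations: 20, tolerance: 0.0001})
YIELD nodeId, score
RETURN min(score) AS minScore, max(score) AS maxScore, count(*) AS nodes;
```

If minScore and maxScore still match, the relationship count of the new projection is worth checking again.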

Best regards,
Ioannis.

hamed.metalgear · March 19, 2025, 12:27am

Good point, thanks. That indeed resulted in returning more reasonable values.

Do you know if GDS or Neo4j offer any method of fixing the random number seed so that the output is reproducible?

ioannis_panagio · March 19, 2025, 8:12am

Hi again @hamed.metalgear,

Good to hear it has helped you! I suppose you mean ensuring that the same relationships are picked every time you project a new graph, right?

In that case, a simple solution would be to pre-process your data and create a fixed random value for each relationship as follows:

MATCH ()-[r:KNOWS]->() SET r.prop = rand()

Then, all you have to do is replace rand() < 0.01 with r.prop < 0.01 in the above query.
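Put together, the reproducible variant could look like the sketch below. The property name prop and the 0.01 threshold are taken from this thread; everything else mirrors the projection query suggested earlier. On a very large graph the one-time SET may need to be run in batches, which is not shown here.

```cypher
// One-time preprocessing: store a fixed random value on every KNOWS relationship.
MATCH ()-[r:KNOWS]->()
SET r.prop = rand();

// Projection: the same relationships are selected on every run,
// because the filter reads the stored value instead of calling rand() again.
MATCH (source:Person)-[r:KNOWS]->(target:Person)
WHERE r.prop < 0.01
WITH gds.graph.project('downsampledGraph', source, target) AS g
RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels;
```

A side effect of this design is that samples at different thresholds are nested: the relationships selected by r.prop < 0.005 are a subset of those selected by r.prop < 0.01. The sample only changes if the property is rewritten.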

Best,
Ioannis.

hamed.metalgear · March 19, 2025, 12:30pm

I was hoping there would be a runtime mechanism, but adding that prop is a good hack! :)
