Eigenvector centrality returns the same value for all the nodes

I am trying to compute eigenvector centrality on a very large graph. However, my graph is too large for the required projection to fit in memory (see this discussion for context), so I am down-sampling it for the projection before running the algorithm.

These are the queries I'm using. They return an identical score (2.761963394277773E-4) for every node. I'm not sure whether something is wrong with the queries or whether this is expected, given how aggressively I'm down-sampling.

  • Creating the projection:
CALL gds.graph.project.cypher(
    'downsampledGraph',
    'MATCH (n:Person) WHERE rand() < 0.01 RETURN id(n) AS id',
    'MATCH (n:Person)-[r:Knows]->(m:Person) WHERE rand() < 0.01 RETURN id(n) AS source, id(m) AS target'
) YIELD graphName;
  • Running the algorithm:
CALL apoc.export.csv.query(
  "CALL gds.eigenvector.stream('downsampledGraph', {maxIterations:20, tolerance:0.0001})
   YIELD nodeId, score
   RETURN id(gds.util.asNode(nodeId)) AS personId, score
   ORDER BY score DESC",
  "file:///eigenvector_results.csv",
  {}
) YIELD file
RETURN file;

Here is the head of the eigenvector_results.csv file:

"personId","score"
"169000","2.761963394277773E-4"
"169018","2.761963394277773E-4"
"170962","2.761963394277773E-4"
"172489","2.761963394277773E-4"
"172502","2.761963394277773E-4"
"173127","2.761963394277773E-4"
"173142","2.761963394277773E-4"
"173171","2.761963394277773E-4"
"176204","2.761963394277773E-4"

Hi @hamed.metalgear,

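I believe your situation is explained by the fact that you run two sampling queries that are independent of each other. A relationship kept by the second query will usually not have both endpoints among the nodes kept by the first query, and such relationships are skipped during projection. With 1% sampling on both sides, a relationship survives only if it and both of its endpoints are picked, which happens with probability of about 0.01³ = 10⁻⁶. With no relationships left, every node is structurally identical, so every node gets the same score. You can verify this by running CALL gds.graph.list('downsampledGraph') YIELD graphName, nodeCount, relationshipCount to see the relationship count. I expect it to be zero.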
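The effect is easy to reproduce in miniature. The Python sketch below is a toy model, not GDS. The graph size, the sampling rate and the helpers `sample_independently` and `power_iteration` are hypothetical. The sketch samples nodes and relationships independently, like the two queries above, and keeps only relationships whose endpoints both survived. It then scores nodes by power iteration on A^T + I with L2 normalization. That update is a common textbook variant, and whether GDS uses exactly this update is an assumption. With no relationships, the uniform start vector never changes, so every node ends up at 1/sqrt(N). If GDS normalizes the same way, 2.76E-4 would correspond to roughly 13 million sampled nodes.

```python
import math
import random


def sample_independently(nodes, edges, p, seed):
    """Mimic two independent rand() < p filters: one on nodes, one on relationships."""
    rng = random.Random(seed)
    kept_nodes = {n for n in nodes if rng.random() < p}
    kept_edges = [e for e in edges if rng.random() < p]
    # A relationship can only be projected if both of its endpoints were kept.
    projected = [(s, t) for s, t in kept_edges if s in kept_nodes and t in kept_nodes]
    return kept_nodes, projected


def power_iteration(nodes, edges, iterations=20):
    """Eigenvector-style scores via x <- (A^T + I) x, L2-normalized after each step."""
    x = {n: 1.0 for n in sorted(nodes)}
    for _ in range(iterations):
        nxt = dict(x)  # the identity term keeps the vector non-zero without edges
        for s, t in edges:
            nxt[t] += x[s]  # a node gains score from the nodes pointing to it
        norm = math.sqrt(sum(v * v for v in nxt.values()))
        x = {n: v / norm for n, v in nxt.items()}
    return x


# Hypothetical toy graph: 2,000 nodes, 20,000 random directed relationships.
rng = random.Random(0)
n_nodes = 2000
edges = [(rng.randrange(n_nodes), rng.randrange(n_nodes)) for _ in range(20000)]

kept_nodes, projected = sample_independently(range(n_nodes), edges, p=0.01, seed=1)
scores = power_iteration(kept_nodes, projected)
print(len(kept_nodes), "nodes,", len(projected), "relationships projected")
print("distinct scores:", sorted({round(v, 12) for v in scores.values()}))
```

In this model, the relationships that pass the relationship filter almost never have both endpoints in the node sample, which matches the zero relationship count expected above.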

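I'd suggest sampling the relationships instead and projecting only their endpoints, so that every sampled relationship ends up in the graph: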

MATCH (source:Person)-[r:Knows]->(target:Person)
WHERE rand() < 0.01
WITH gds.graph.project('downsampledGraph', source, target) AS g
RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels

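This should hopefully give you better results. Since a graph named 'downsampledGraph' already exists, drop it first with CALL gds.graph.drop('downsampledGraph').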
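For comparison, here is a toy Python model of the suggested approach. It is not GDS: the helper `sample_edges_first` and the graph sizes are hypothetical. Relationships are sampled first and the node set is taken from their endpoints, which is roughly what the aggregation above does. Every sampled relationship is projected by construction, and nodes without any sampled relationship do not appear at all.

```python
import random


def sample_edges_first(edges, p, seed):
    """Keep each relationship with probability p; nodes are the endpoints of kept ones."""
    rng = random.Random(seed)
    kept_edges = [e for e in edges if rng.random() < p]
    kept_nodes = {n for edge in kept_edges for n in edge}
    return kept_nodes, kept_edges


# Hypothetical toy graph: 2,000 nodes, 20,000 random directed relationships.
rng = random.Random(0)
n_nodes = 2000
edges = [(rng.randrange(n_nodes), rng.randrange(n_nodes)) for _ in range(20000)]

nodes, projected = sample_edges_first(edges, p=0.01, seed=1)
# Every projected relationship has both endpoints in the node set.
assert all(s in nodes and t in nodes for s, t in projected)
print(f"{len(nodes)} nodes, {len(projected)} relationships")
```

One trade-off is that isolated nodes, and nodes whose relationships all missed the sample, are excluded. For a centrality score this is usually acceptable.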

Best regards,
Ioannis.

Good point, thanks. That indeed returned more reasonable values.

Do you know if GDS or Neo4j offer any method of fixing the random number seed so that the output is reproducible?

Hi again @hamed.metalgear,

Good to hear it helped! I suppose you mean ensuring that the same relationships are picked every time you project a new graph, right?

In that case, a simple solution is to pre-process your data and store a fixed random value on each relationship:

MATCH ()-[r:Knows]->() SET r.prop = rand()

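Then all you have to do is replace rand() < 0.01 with r.prop < 0.01 in the query above. On a graph this large, the update itself may need to be batched, for example with CALL { ... } IN TRANSACTIONS (Neo4j 4.4+).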
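Why this makes the sample reproducible can be shown with a toy Python model. It is not Neo4j: the helpers `assign_fixed_values` and `sample_by_property` are hypothetical stand-ins for the SET step and the r.prop < p filter. The random values are drawn once and stored, so every later filter reads the same numbers. A side effect is that samples are nested: a larger threshold always includes the smaller sample.

```python
import random


def assign_fixed_values(edge_ids, seed=None):
    """One-off pre-processing step, analogous to SET r.prop = rand()."""
    rng = random.Random(seed)
    return {eid: rng.random() for eid in edge_ids}


def sample_by_property(props, p):
    """Analogous to filtering on r.prop < p instead of rand() < p."""
    return {eid for eid, value in props.items() if value < p}


props = assign_fixed_values(range(100_000))  # stored once, like a relationship property
first = sample_by_property(props, 0.01)
second = sample_by_property(props, 0.01)
assert first == second  # same stored values, so the same sample on every run
assert first <= sample_by_property(props, 0.05)  # a larger threshold is a superset
print(len(first), "relationships sampled")
```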

Best,
Ioannis.


I was hoping for a runtime mechanism, but adding that property is a good hack! :)
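One runtime alternative to a stored property is to derive the keep/drop decision from a stable identifier instead of drawing a fresh random number. The Python sketch below is a toy model: the `keep` helper and its `salt` parameter are hypothetical, and SHA-256 stands in for whatever hash is available. In Cypher, a cruder version of the same idea is filtering on id(r) % 100 = 0. That filter is deterministic but not random, because internal ids may correlate with insertion order and can be reused after deletions.

```python
import hashlib


def keep(edge_id, p, salt="sample-v1"):
    """Keep/drop decision that looks random but is fixed for a given id and salt."""
    digest = hashlib.sha256(f"{salt}:{edge_id}".encode()).digest()
    # Take 53 bits so the value is an exactly representable float in [0, 1).
    u = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return u < p


sample_a = {e for e in range(100_000) if keep(e, 0.01)}
sample_b = {e for e in range(100_000) if keep(e, 0.01)}
assert sample_a == sample_b  # recomputed at query time, yet identical
# Changing the salt plays the role of choosing a different seed.
other = {e for e in range(100_000) if keep(e, 0.01, salt="sample-v2")}
print(len(sample_a), len(other), len(sample_a & other))
```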