Difference between calling "algo.closeness.stream" and "algo.closeness" for large graphs

Hello,

I was able to execute the following commands with no issues on a small graph of 100,000 nodes:

CALL algo.closeness.stream(
  'MATCH (n:alias) RETURN id(n) AS id',
  "MATCH (n)--(m:alias) RETURN id(n) AS source, id(m) AS target",
  {graph: "cypher"})
YIELD nodeId, centrality
WITH algo.asNode(nodeId) AS node, centrality AS centrality_stream
SET node.centrality = centrality_stream;

CALL algo.closeness(
  'MATCH (n:alias) RETURN id(n) AS id',
  "MATCH (n)--(m:alias) RETURN id(n) AS source, id(m) AS target", 
  {graph:'cypher', direction: 'BOTH', write:true, writeProperty:'closeness.centrality'})
YIELD nodes,loadMillis, computeMillis, writeMillis;

with the first command running slightly faster than the second. For a larger graph, I understand I could stream the results out and use apoc.periodic.iterate to write the data back to the database:

CALL apoc.periodic.iterate(
  "CALL algo.closeness.stream(
     'MATCH (n:alias) RETURN id(n) AS id',
     'MATCH (n)--(m:alias) RETURN id(n) AS source, id(m) AS target',
     {graph: 'cypher'})
   YIELD nodeId, centrality
   RETURN nodeId, centrality",
  "MATCH (n) WHERE id(n) = nodeId SET n.centrality = centrality",
  {batchSize: 100})
YIELD batches, total, errorMessages;

Can I do something similar with CALL algo.closeness, especially if the graph instance runs on a server with multiple CPUs (assuming I manage to get multiple CPUs for the instance) or on AWS? What is the recommendation?

Thanks,
Lavanya

Whether an algorithm will execute on a large graph is determined by how much heap memory you have available, and how long it takes to execute depends on how many threads you have available (although Community Edition users are limited to 4 cores).

The single biggest way to improve the performance of that query is to skip the Cypher loaders. Cypher projections are slow, and they are bottlenecked by the execution time of the Cypher queries themselves. Instead, add the relationship you want to use directly to your database, and run using the huge graph: https://neo4j.com/docs/graph-algorithms/current/projected-graph-model/label-relationship-type-projection/

You do not need to use apoc.periodic.iterate; just run algo.closeness as you did in your first two examples and set the write property accordingly.
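As a rough sketch of what that could look like with a label and relationship-type projection instead of the Cypher loaders: the relationship type LINKED below is a hypothetical name (the original queries match any relationship between alias nodes), and the writeProperty and concurrency values are only illustrative, so adjust them to your own data and hardware.

```cypher
// Assumption: nodes carry the :alias label and are connected by a
// relationship type named LINKED (hypothetical; substitute your own).
CALL algo.closeness('alias', 'LINKED', {
  graph: 'huge',               // huge graph loader instead of the Cypher loader
  write: true,                 // write results back to the nodes
  writeProperty: 'centrality', // property that receives the score
  concurrency: 4               // illustrative; limited by available cores
})
YIELD nodes, loadMillis, computeMillis, writeMillis;
```

The procedure handles the write-back itself, so no separate batching step is involved in this variant.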

Hi @alicia.frame,

Thanks for pointing out the general rule of thumb for extending the queries to huge graphs. I am indeed writing the queries for smaller graph instances before deploying the code on huge instances. Where possible, I will try to avoid Cypher projections and use the relationships directly. For closeness centrality, however, there is some confusion - see below.

I followed the recommendation to use Cypher projections for computing closeness centrality on an undirected, unweighted graph (section 9.2.3.8, "Graph type support", of "Closeness Centrality - Neo4j Graph Data Science").

Let's say we construct the following small graph:

MERGE (a:Node{id:"A"})
MERGE (b:Node{id:"B"})
MERGE (c:Node{id:"C"})
MERGE (d:Node{id:"D"})
MERGE (e:Node{id:"E"})

MERGE (a)-[:LINK]->(b)
MERGE (b)-[:LINK]->(c)
MERGE (d)-[:LINK]->(c)
MERGE (e)-[:LINK]->(d);

The following code returns the closeness centrality of the undirected, unweighted graph underlying the input graph instance:

CALL algo.closeness.stream( 'Node', 'LINK')
YIELD nodeId, centrality
RETURN algo.asNode(nodeId).id AS node, centrality

Issue with this usage: I manually checked that this code returns the closeness centrality of the undirected version. How do we modify it to compute closeness centrality for the directed, unweighted version?

CALL algo.closeness.stream('Node', 'LINK', {direction: 'OUTGOING'})
YIELD nodeId, centrality

returns the same result as the undirected version. Can we conclude that, for the directed, unweighted version, only Cypher projections can be used? That would be contrary to what section 9.2.3.8, "Graph type support", says.

If I use Cypher projections as per section 9.2.3.8, "Graph type support", the only code that returns the undirected version is

CALL algo.closeness.stream(
 'MATCH (n:Node) RETURN id(n) AS id',
  "MATCH (n)-[:LINK]-(m:Node) RETURN id(n) AS source, id(m) AS target",
  {graph: "cypher"})
YIELD nodeId, centrality
RETURN algo.asNode(nodeId).id AS node, centrality

Issue with this usage: as reported in "Running closeness centrality with or without directions", it is not clear what the results are if we accidentally change the third line of the above code to "MATCH (n)-[:LINK]->(m:Node) RETURN id(n) AS source, id(m) AS target".

Kindly let me know if using graph projections for an undirected graph is indeed warranted for closeness centrality of nodes.

I have an issue when I run this code:
CALL algo.closeness.stream('User', 'FOLLOWS')
YIELD nodeId, centrality
RETURN algo.asNode(nodeId).id AS User, centrality
ORDER BY centrality DESC
LIMIT 20;

It fails with: There is no procedure with the name algo.closeness.stream registered for this database instance. Please ensure you've spelled the procedure name correctly and that the procedure is properly deployed.
I have installed the Graph Data Science Library, but it keeps showing me that error.

Thank you
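
A note on the error above: the algo.* procedures were provided by the older Graph Algorithms library, whereas the Graph Data Science Library registers its procedures under the gds.* namespace, so algo.closeness.stream is not expected to exist when only GDS is installed. One way to see which closeness procedures are actually registered is sketched below. It assumes a Neo4j version (3.5 or 4.x) where dbms.procedures() is still available; the exact gds.* name and tier it lists will depend on the installed GDS version.

```cypher
// List every registered procedure whose name mentions "closeness".
// Assumption: dbms.procedures() is available on this Neo4j version.
CALL dbms.procedures()
YIELD name
WHERE name CONTAINS 'closeness'
RETURN name
ORDER BY name;
```

If the list is empty, the plugin is probably not loaded at all; if it shows only gds.* names, the query needs to call one of those instead of algo.closeness.stream.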