Running closeness centrality for a subgraph instance and for a subset of nodes

lavanya_kannan · January 16, 2020, 5:10pm

Hello,

I want to subset my graph instance and get closeness centrality output for just one specific type of entity. Is there any way to optimize my query?

I am running the following cypher query:

CALL algo.closeness.stream('MATCH (p) WHERE ANY(lbl IN ["PERSON", "CITATION", "BIOTERM"] WHERE lbl IN LABELS(p)) RETURN id(p) as id',
'MATCH (p1)-[r]->(p2) RETURN id(p1) as source,id(p2) as target',
{graph:'cypher'})
YIELD nodeId, centrality

RETURN labels(algo.asNode(nodeId)) AS nodeType, nodeId AS nodeId, centrality LIMIT 100;

I think I am able to subset my graph using the Cypher query in the first argument of CALL algo.closeness.stream. I get centrality scores for all three types of nodes: "PERSON", "CITATION" and "BIOTERM". I know that I can modify my RETURN statement to limit the output to "PERSON" nodes, but I am wondering whether there is a way to optimize the algorithm so that it computes and fetches the output only for "PERSON" nodes.

In other words, I want the algorithm to find shortest paths in my graph induced by the "PERSON", "CITATION" and "BIOTERM" nodes, but computes the centrality measure only for the "PERSON" nodes.
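The "graph induced by" the three labels is what the first (node) query defines. A minimal Python sketch of that induced subgraph: keep the nodes carrying any of the chosen labels, then keep only relationships whose source and target both survived. The toy data and the JOURNAL label are hypothetical. My understanding, not confirmed in this thread, is that the legacy Cypher loader likewise ignores relationships pointing outside the node set.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def induced_subgraph(
    labels: Dict[int, Set[str]], edges: List[Edge], keep: Set[str]
) -> Tuple[Set[int], List[Edge]]:
    """Return the nodes carrying any label in `keep`, plus the edges whose
    source and target are both among those nodes."""
    # Node query: ANY(lbl IN [...] WHERE lbl IN labels(p))
    nodes = {n for n, lbls in labels.items() if lbls & keep}
    # Relationship query result, filtered to edges inside the node set
    kept = [(s, t) for s, t in edges if s in nodes and t in nodes]
    return nodes, kept


# Hypothetical toy graph: node 3 has a label outside the subset.
toy_labels = {0: {"PERSON"}, 1: {"CITATION"}, 2: {"BIOTERM"}, 3: {"JOURNAL"}}
toy_edges = [(0, 1), (0, 2), (1, 3), (2, 0)]

nodes, kept = induced_subgraph(toy_labels, toy_edges, {"PERSON", "CITATION", "BIOTERM"})
print(sorted(nodes), kept)  # [0, 1, 2] [(0, 1), (0, 2), (2, 0)]
```

The edge (1, 3) disappears because node 3 is not in the subset, so shortest paths can only run through PERSON, CITATION and BIOTERM nodes.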

Thanks!

ganesanmithun323 · January 17, 2020, 6:39am

First, it will compute centrality for all the nodes returned by the first Cypher statement, so it's not possible to have it return only PERSON nodes.
Second, in general, closeness centrality is computed over nodes of similar or identical type.

But you can do one thing in this case: use the relationships from each PERSON node to all the other nodes to create new relationships between persons with Jaccard similarity, storing the similarity score on the relationship itself:

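MATCH (person:PERSON)-[]->(node)
WHERE node:CITATION OR node:BIOTERM OR node:PERSON
WITH {item:id(person), categories:COLLECT(id(node))} as userData
WITH collect(userData) as data
CALL algo.similarity.jaccard(data, {topK: 15, similarityCutoff: 0.1, write:true})
YIELD nodes, similarityPairs, writeRelationshipType, writeProperty
RETURN nodes, similarityPairs, writeRelationshipType, writeProperty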

More about Jaccard similarity: Similarity functions - Neo4j Graph Data Science
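To show what `topK` and `similarityCutoff` do, the Jaccard step can be sketched in plain Python: score every pair of persons by |A ∩ B| / |A ∪ B| over the ids they point to, drop pairs below the cutoff, and keep each person's top-k neighbours. The names and data are hypothetical. Treating the cutoff as inclusive and breaking ties by name are assumptions, not documented library behaviour.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[int], b: Set[int]) -> float:
    """|A ∩ B| / |A ∪ B|, taken as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(
    categories: Dict[str, Set[int]], top_k: int, cutoff: float
) -> List[Tuple[str, str, float]]:
    """For each item, keep its top_k most similar items scoring >= cutoff.
    Returns directed (item, other, score) triples, one per SIMILAR-style edge."""
    candidates: Dict[str, List[Tuple[float, str]]] = {item: [] for item in categories}
    for x, y in combinations(categories, 2):
        score = jaccard(categories[x], categories[y])
        if score >= cutoff:
            candidates[x].append((score, y))
            candidates[y].append((score, x))
    pairs = []
    for item, cands in candidates.items():
        # Highest score first; ties broken by name so the output is deterministic.
        cands.sort(key=lambda c: (-c[0], c[1]))
        pairs.extend((item, other, score) for score, other in cands[:top_k])
    return pairs


# Hypothetical persons and the ids of the CITATION/BIOTERM nodes they point to.
person_nodes = {"alice": {1, 2, 3}, "bob": {2, 3, 4}, "carol": {9}}
print(similar_pairs(person_nodes, top_k=15, cutoff=0.1))
# [('alice', 'bob', 0.5), ('bob', 'alice', 0.5)]
```

Carol points only to a node nobody else shares, so she gets no similarity relationship at all.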

After that, you can run closeness centrality on the newly created relationships:

CALL algo.closeness.stream('PERSON', 'SIMILAR')
YIELD nodeId, centrality

RETURN algo.asNode(nodeId).name AS person, centrality
ORDER BY centrality DESC
LIMIT 20;
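For reference, the textbook form of closeness centrality for a node u in a connected graph with n nodes, where d(u, v) is the shortest-path distance from u to v, is given below. Implementations differ in normalisation and in how they treat unreachable nodes, so the exact numbers returned by algo.closeness may not match this formula.

```latex
C(u) = \frac{n - 1}{\sum_{v \neq u} d(u, v)}
```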

Explanation:
Here I first connected the person nodes based on their similarity, and then computed closeness centrality. If a person node is similar to many other person nodes based on its relationships, it is likely to have a high centrality score.
Say a person is connected to many CITATION and BIOTERM nodes, but no other person is connected to those CITATION or BIOTERM nodes: this person will have a low centrality score, since they won't be similar to any other person node. Likewise, if a person has fewer CITATION or BIOTERM nodes, but many other persons are connected to the same CITATION or BIOTERM nodes, this person will have a high centrality score.

But in the approach you are following, a person who has many CITATION and BIOTERM nodes can get a high centrality score, because that person is at a short distance from many nodes. (Note that the person is close to nodes of a different type, not to other person nodes; that is why it's ideal to compute closeness centrality on a graph with nodes of the same type.)
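To make that point concrete, here is a small Python sketch on a hypothetical mixed graph: person p1 points to four citations nobody else touches plus one shared node s1, while p2 and p3 point only to s1. Relationships are treated as undirected (an assumption; the direction the library uses may differ), and closeness is the (n - 1) / sum-of-distances form on a connected graph.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple


def undirected(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Adjacency sets, ignoring relationship direction."""
    adj: Dict[str, Set[str]] = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def closeness(adj: Dict[str, Set[str]], source: str) -> float:
    """(n - 1) / sum of hop distances from `source`, via BFS.
    Assumes the graph is connected; returns 0.0 for a single-node graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(adj) - 1) / total if total else 0.0


# Hypothetical mixed graph: only p1 touches c1..c4; all persons touch s1.
edges = [("p1", f"c{i}") for i in range(1, 5)]
edges += [("p1", "s1"), ("p2", "s1"), ("p3", "s1")]
adj = undirected(edges)
for person in ("p1", "p2", "p3"):
    print(person, round(closeness(adj, person), 3))
# p1 0.778
# p2 0.412
# p3 0.412
```

p1 scores 7/9 against 7/17 for p2 and p3, largely because of the four citations that only p1 is attached to.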

Also, you can try out different similarity measures depending on your use case.

alicia.frame · January 20, 2020, 6:02pm

Cypher projections work by first identifying the set of nodes you want to use (first clause), and then defining the relationships you want to consider (second clause). So you can use the projection to subset or filter your input graph.

In order to just consider person nodes for your closeness centrality calculation, you'll want to create a monopartite person to person projection. For example, something like:

CALL algo.closeness.stream(
'MATCH (p:PERSON) RETURN id(p) as id',
'MATCH (p1:PERSON)-[r]-(p2:PERSON) RETURN id(p1) as source, id(p2) as target',
{graph:'cypher'})
YIELD nodeId, centrality

(If you don't have one-hop relationships between people, you may need something like MATCH (p1:PERSON)-->()<--(p2:PERSON) in your second query, or you could use similarity to create new person-to-person relationships as @ganesanmithun323 suggested.)
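The (p1)-->()<--(p2) pattern amounts to a co-occurrence projection. A minimal Python sketch with hypothetical data, linking two persons whenever they point to at least one common node. It keeps each unordered pair once, whereas the Cypher pattern returns a row for every shared node in each direction.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def co_occurrence_edges(links: Dict[str, Set[str]]) -> Set[Tuple[str, str]]:
    """Link two persons when they point to at least one common node, i.e. the
    (p1)-->()<--(p2) pattern. Each unordered pair is stored once, sorted."""
    pointed_by: Dict[str, List[str]] = {}
    for person, targets in links.items():
        for target in targets:
            pointed_by.setdefault(target, []).append(person)
    pairs: Set[Tuple[str, str]] = set()
    for persons in pointed_by.values():
        for a, b in combinations(sorted(persons), 2):
            pairs.add((a, b))
    return pairs


# Hypothetical PERSON -> CITATION/BIOTERM links.
links = {"alice": {"c1", "b1"}, "bob": {"c1", "b1"}, "carol": {"b2"}}
print(co_occurrence_edges(links))  # {('alice', 'bob')}
```

Alice and Bob share two nodes but still get a single edge; Carol stays isolated, just as she would in the similarity approach.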

I'm not sure what you mean by

I want the algorithm to find shortest paths in my graph induced by the "PERSON", "CITATION" and "BIOTERM" nodes, but computes the centrality measure only for the "PERSON" nodes.

Since closeness centrality uses shortest paths in its calculation, I suspect what you're really looking for is to calculate closeness centrality across all paths, but only return the results for person nodes? In that case, you can limit that in post-processing with your YIELD/RETURN.
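Closeness for a node only needs the shortest-path distances from that node. So, in principle, you could run a breadth-first search from the PERSON nodes alone while still measuring distances through CITATION and BIOTERM nodes. As far as I know, the legacy algo.closeness procedures score every node in the projection, so this Python sketch illustrates the idea in a custom implementation rather than a library option. The graph is hypothetical, edges are treated as undirected, and unreachable nodes are skipped.

```python
from collections import deque
from typing import Dict, Iterable, Set


def bfs_distances(adj: Dict[str, Set[str]], source: str) -> Dict[str, int]:
    """Hop distances from `source` to every node it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, set()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness_for(adj: Dict[str, Set[str]], sources: Iterable[str]) -> Dict[str, float]:
    """Closeness for `sources` only; paths may pass through any node.
    Score = (reachable - 1) / sum of distances to reachable nodes."""
    scores: Dict[str, float] = {}
    for s in sources:
        dist = bfs_distances(adj, s)  # one BFS per requested node
        total = sum(dist.values())
        scores[s] = (len(dist) - 1) / total if total else 0.0
    return scores


# Hypothetical graph induced by PERSON, CITATION and BIOTERM nodes.
label = {"p1": "PERSON", "p2": "PERSON", "c1": "CITATION", "b1": "BIOTERM"}
adj: Dict[str, Set[str]] = {n: set() for n in label}
for a, b in [("p1", "c1"), ("c1", "p2"), ("p2", "b1")]:
    adj[a].add(b)
    adj[b].add(a)

persons = [n for n, lbl in label.items() if lbl == "PERSON"]
print(closeness_for(adj, persons))  # {'p1': 0.5, 'p2': 0.75}
```

The cost scales with the number of PERSON nodes rather than with every node in the subgraph, which is the saving the original question was after.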

My recommendation would be to start by defining what exactly you want to use to calculate your closeness centrality score, then modify your data model accordingly to represent that.

nmalsaud15 · April 26, 2020, 6:02am

CALL algo.closeness.stream("User", "FOLLOWS")
YIELD nodeId, centrality
RETURN algo.getNodeById(nodeId).id, centrality
ORDER BY centrality DESC

There is no procedure with the name algo.closeness.stream registered for this database instance. Please ensure you've spelled the procedure name correctly and that the procedure is properly deployed.

I used this code, but I got this error, and the code looks fine to me.
How can I solve this problem?

alicia.frame · April 27, 2020, 1:01am

You need to install the Graph Algorithms library, or its successor, the Graph Data Science library. If you've installed GDS, the syntax is slightly different (CALL gds.alpha.closeness); see the docs here: Closeness Centrality - Neo4j Graph Data Science

nmalsaud15 · April 27, 2020, 2:49am

I did install the library. I am loading the data from chapter 5 of Graph Algorithms, but the error keeps showing even though I used a previous version, 3.5.17.

Related topics

Topic		Replies	Views
How to run closeness centrality for graphs with multiple node types? Graph Algorithms/Graph Data Science apoc , cypher	3	715	July 20, 2020
Running Centrality algorithm on the result of Cypher query Graph Algorithms/Graph Data Science	3	282	March 29, 2022
Difference between calling "algo.closeness.stream" and "algo.closeness" for large graphs Graph Algorithms/Graph Data Science	3	948	April 28, 2020
Optimizing a query with a subgraph/subquery, only look at specific nodes Neo4j Graph Platform performance , cypher	4	395	June 10, 2021
Optimizing a query issue Graph Algorithms/Graph Data Science	15	111	May 3, 2025