Running closeness centrality for a subgraph instance and for a subset of nodes

Hello,

I want to subset my graph instance and get closeness centrality output for just one specific type of entity. Is there any way to optimize my query?

I am running the following cypher query:

CALL algo.closeness.stream('MATCH (p) WHERE ANY(lbl IN ["PERSON", "CITATION", "BIOTERM"] WHERE lbl IN LABELS(p)) RETURN id(p) as id',
'MATCH (p1)-[r]->(p2) RETURN id(p1) as source,id(p2) as target',
{graph:'cypher'})
YIELD nodeId, centrality

RETURN labels(algo.asNode(nodeId)) AS nodeType, nodeId AS nodeId, centrality LIMIT 100;

I think I am able to subset my graph using the Cypher query in the first argument of CALL algo.closeness.stream. I get centrality scores for all three types of nodes: "PERSON", "CITATION" and "BIOTERM". I know that I can modify my RETURN statement to limit the output to "PERSON" nodes, but I am wondering whether there is a way to optimize the algorithm so that it computes and fetches results only for the "PERSON" nodes.

In other words, I want the algorithm to find shortest paths in my graph induced by the "PERSON", "CITATION" and "BIOTERM" nodes, but compute the centrality measure only for the "PERSON" nodes.

Thanks!

First, the algorithm computes centrality for all the nodes returned by the first Cypher statement, so it is not possible to have it return only the PERSON nodes.
Second, in general, closeness centrality is computed over nodes of a similar or the same type.

But you can do one thing in this case: use the relationships from each person node to all other nodes to create new relationships between persons based on Jaccard similarity, and store the similarity score on the new relationship itself:

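MATCH (person:PERSON)-[]->(node)
WHERE node:CITATION OR node:BIOTERM OR node:PERSON
WITH {item:id(person), categories:collect(id(node))} AS userData
WITH collect(userData) AS data
// write:true stores the results as SIMILAR relationships between persons
CALL algo.similarity.jaccard(data, {topK: 15, similarityCutoff: 0.1, write:true})
YIELD nodes, similarityPairs, writeRelationshipType, writeProperty
RETURN nodes, similarityPairs, writeRelationshipType, writeProperty;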

More about Jaccard similarity: https://neo4j.com/docs/graph-algorithms/current/labs-algorithms/jaccard/
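To make the parameters concrete, here is a minimal Python sketch of what the query above asks for: each person's list of neighbour ids is treated as a set, every pair of persons is scored by Jaccard similarity (size of intersection over size of union), pairs below similarityCutoff are dropped, and each person keeps at most topK of its best matches. This is not the library's implementation; the function names, the sample data and the tie-breaking rule are hypothetical.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_top_k(categories: dict, top_k: int, cutoff: float) -> list:
    """For each item, return (item, other, score) for its top_k most similar
    other items whose score is >= cutoff."""
    candidates = {item: [] for item in categories}
    for a, b in combinations(categories, 2):
        score = jaccard(categories[a], categories[b])
        if score >= cutoff:
            candidates[a].append((score, b))
            candidates[b].append((score, a))
    result = []
    for item, cands in candidates.items():
        # Best score first; ties broken by the other item's id (hypothetical rule).
        cands.sort(key=lambda pair: (-pair[0], pair[1]))
        result.extend((item, other, score) for score, other in cands[:top_k])
    return result


# Hypothetical data: person -> set of neighbour ids (CITATION/BIOTERM/PERSON).
data = {
    "alice": {"bt1", "c1", "c2"},
    "bob": {"bt1", "c2"},
    "carol": {"c9"},
}
rows = jaccard_top_k(data, top_k=15, cutoff=0.1)
print(rows)
```

In this example carol shares no neighbours with anyone, so she gets no similarity pair at all and would end up isolated in a person-to-person graph.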

After that, you can run closeness centrality on the newly created relationships:

CALL algo.closeness.stream('PERSON', 'SIMILAR')
YIELD nodeId, centrality

RETURN algo.asNode(nodeId).name AS person, centrality
ORDER BY centrality DESC
LIMIT 20;

Explanation:
Here I first connected the person nodes based on their similarity, then computed closeness centrality. If a person node is similar to many other person nodes based on its relationships, it is likely to have a high centrality score.
Say a person is connected to many CITATION and BIOTERM nodes, but no other person is connected to those CITATION or BIOTERM nodes: that person will have a low centrality score, since they won't be similar to any other person node. Likewise, if a person has few CITATION or BIOTERM nodes, but many persons are connected to the same CITATION or BIOTERM nodes, that person will have a high centrality score.

In the approach you are following, however, a person who has many CITATION and BIOTERM nodes can get a high centrality score, because that person is at a short distance from many nodes. (Note that the person is close to many nodes of other types, not to other person nodes; that is why it is best to compute closeness centrality on a graph with a single node type.)
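Both arguments can be checked on a toy graph. The sketch below assumes an undirected graph, unweighted BFS distances, and closeness defined as (number of reachable nodes) / (sum of distances to them), with 0 for an isolated node; the library's exact normalisation and direction handling may differ. Persons A, B and C share one BIOTERM X; person D also touches X but has ten CITATION nodes nobody else uses. All node names are made up.

```python
from collections import deque
from itertools import combinations


def undirected(edges):
    """Adjacency sets for an undirected graph built from (a, b) edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def closeness(adj, node):
    """(reachable nodes) / (sum of BFS distances); 0.0 if nothing is reachable.
    Assumed normalisation; the library's may differ."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


persons = ["A", "B", "C", "D"]
# Every person points at the shared BIOTERM "X"; D also has ten private citations.
edges = [(p, "X") for p in persons] + [("D", f"C{i}") for i in range(1, 11)]
mixed = undirected(edges)
mixed_scores = {p: closeness(mixed, p) for p in persons}

# Person-to-person graph: link two persons whose neighbour sets have
# Jaccard similarity >= 0.1 (the similarityCutoff used above).
similar = {p: set() for p in persons}
for p, q in combinations(persons, 2):
    shared, union = mixed[p] & mixed[q], mixed[p] | mixed[q]
    if len(shared) / len(union) >= 0.1:
        similar[p].add(q)
        similar[q].add(p)
similar_scores = {p: closeness(similar, p) for p in persons}

print(mixed_scores)    # D highest on the mixed graph
print(similar_scores)  # D isolated (0.0) on the person graph
```

On the mixed graph D scores highest (14/17 versus 14/37 for A, B and C) purely because of its private citations. On the Jaccard person graph, D's similarity to each other person is 1/11, below the 0.1 cutoff, so D is isolated and scores 0.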

Also, you can try out different similarity measures depending on your use case.


Cypher projections work by first identifying the set of nodes you want to use (first clause), and then defining the relationships you want to consider (second clause). So you can use the projection to subset or filter your input graph.

In order to just consider person nodes for your closeness centrality calculation, you'll want to create a monopartite person to person projection. For example, something like:

CALL algo.closeness.stream(
'MATCH (p:PERSON) RETURN id(p) as id',
'MATCH (p1:PERSON)-[r]-(p2:PERSON) RETURN id(p1) as source, id(p2) as target',
{graph:'cypher'})
YIELD nodeId, centrality

(If you don't have one-hop relationships between people, you may need something like MATCH (p1:PERSON)-->()<--(p2:PERSON) in your second query, or you could use similarity to create new person-to-person relationships, as @ganesanmithun323 suggested.)
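A minimal Python sketch of what a two-hop pattern such as (p1:PERSON)-->()<--(p2:PERSON) yields: two persons become linked whenever they point at the same node. The Cypher pattern returns each pair in both orders and once per shared node; the sketch below, which is an illustration with hypothetical node names rather than library code, collapses those rows into one count of shared nodes per unordered pair.

```python
from itertools import combinations


def shared_target_pairs(edges, is_person):
    """Collapse (p1)-->(x)<--(p2) matches into {(p1, p2): shared_count},
    one entry per unordered pair of distinct persons."""
    sources_by_target = {}
    for source, target in edges:
        if is_person(source):
            sources_by_target.setdefault(target, set()).add(source)
    counts = {}
    for sources in sources_by_target.values():
        for pair in combinations(sorted(sources), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts


# Hypothetical directed edges: person -> CITATION/BIOTERM node.
edges = [("p1", "cit1"), ("p2", "cit1"), ("p1", "bt7"), ("p2", "bt7"), ("p3", "bt7")]
pairs = shared_target_pairs(edges, lambda node: node.startswith("p"))
print(pairs)
```

Here p1 and p2 share two nodes (cit1 and bt7), while p3 is linked to each of them through bt7 only.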

I'm not sure what you mean by

I want the algorithm to find shortest paths in my graph induced by the "PERSON", "CITATION" and "BIOTERM" nodes, but compute the centrality measure only for the "PERSON" nodes.

Since closeness centrality uses shortest paths in its calculation, I suspect what you're really after is to calculate closeness centrality across all paths, but return the results only for person nodes. In that case, you can limit the output in post-processing with your YIELD/RETURN.
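A node's closeness depends only on the shortest paths that start from that node, so scoring only the person nodes, while still traversing PERSON, CITATION and BIOTERM nodes, gives the same numbers as scoring everything and filtering afterwards; it just runs fewer traversals. A minimal Python sketch outside the library, assuming an undirected, unweighted graph and the (reachable nodes) / (sum of distances) normalisation; the graph and labels are hypothetical.

```python
from collections import deque


def closeness_for(adj, targets):
    """Closeness only for `targets`, with BFS over the whole graph `adj`.
    Assumed normalisation: (reachable nodes) / (sum of distances), 0.0 if none."""
    scores = {}
    for source in targets:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


# Hypothetical undirected graph induced by PERSON, CITATION and BIOTERM nodes.
adj = {"P1": {"B1"}, "P2": {"B1", "C1"}, "B1": {"P1", "P2"}, "C1": {"P2"}}
labels = {"P1": "PERSON", "P2": "PERSON", "B1": "BIOTERM", "C1": "CITATION"}

all_scores = closeness_for(adj, adj)  # one BFS per node
person_scores = closeness_for(adj, [n for n in adj if labels[n] == "PERSON"])  # two BFS
filtered = {n: s for n, s in all_scores.items() if labels[n] == "PERSON"}
print(person_scores, person_scores == filtered)
```

With algo.closeness.stream itself, filtering in YIELD/RETURN remains the route described above; the saving in traversals only applies if you run the traversal yourself.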

My recommendation would be to start by defining what exactly you want to use to calculate your closeness centrality score, then modify your data model accordingly to represent that.


CALL algo.closeness.stream("User", "FOLLOWS")
YIELD nodeId, centrality
RETURN algo.getNodeById(nodeId).id, centrality
ORDER BY centrality DESC

There is no procedure with the name algo.closeness.stream registered for this database instance. Please ensure you've spelled the procedure name correctly and that the procedure is properly deployed.

I used this code but got this error, even though the code looks fine to me.
How can I solve this problem?

You need to install the Graph Algorithms library, or its successor, the Graph Data Science library. If you've installed GDS, the syntax is slightly different (CALL gds.alpha.closeness); see the docs here: https://neo4j.com/docs/graph-data-science/current/algorithms/closeness-centrality/

I did install the library. I am loading the data from Graph Algorithms, ch. 5, but the error still appears even when I use an earlier version (3.5.17).