How to create a subgraph and run graph algorithms only on it?

Hi,
Using py2neo, I have a graph

from py2neo import Graph

mygraph = Graph("bolt://localhost:7687", auth=("neo4j", "***"))

I run all my queries, including the graph algorithms, against mygraph:

# Run betweenness over the whole graph and write the score to each node
betweenness_query = """CALL algo.betweenness('', '', {
  concurrency: 8,
  direction: 'Both',
  writeProperty: 'betweenness'
})"""
mygraph.run(betweenness_query).data()

# Read the written scores back into a pandas DataFrame
betweenness_table = """MATCH (n) RETURN n AS Nodes, id(n) AS Node_id, n.betweenness AS BetweenNess"""
BetweenNess = mygraph.run(betweenness_table).to_data_frame()
BetweenNess

I am looking to filter out a set of nodes:

mygraph.run("MATCH (t:Trans {TrxID: 'T1'})-[*1..5]-(x) RETURN x").to_ndarray()

From the above I get a set of nodes (or, in general, the result of some traversal over a few nodes).

I would like to save that as a subgraph in a new variable, then run the graph algorithm only on that set of nodes.

Is it possible?

Thanks in advance
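If the goal is only to keep the traversal result around as a subgraph on the client side, a minimal pure-Python sketch (not the py2neo or Neo4j API) could collect the nodes within a hop limit from an in-memory adjacency dict and keep only the edges among them. The helper names `k_hop_nodes` and `induced_subgraph` and the sample graph are hypothetical; running a Neo4j algorithm on the subgraph would still require projecting it on the server.

```python
from collections import deque

def k_hop_nodes(adj, start, max_hops):
    """Nodes within max_hops hops of `start` (undirected), start included.

    Rough analogue of MATCH (t {...})-[*1..max_hops]-(x) RETURN x, keeping
    the start node so the result can serve as a subgraph.
    """
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return set(dist)

def induced_subgraph(adj, nodes):
    """Keep only the nodes in `nodes` and the edges between them."""
    return {n: [m for m in adj.get(n, ()) if m in nodes] for n in nodes}

# Hypothetical undirected graph stored as an adjacency dict.
adj = {
    "T1": ["C1", "A1"],
    "C1": ["T1", "T2"],
    "A1": ["T1"],
    "T2": ["C1", "C2"],
    "C2": ["T2", "T3"],
    "T3": ["C2"],
}
sub_nodes = k_hop_nodes(adj, "T1", 2)
subgraph = induced_subgraph(adj, sub_nodes)
print(sorted(sub_nodes))  # ['A1', 'C1', 'T1', 'T2']
```

The edge T2–C2 is dropped from `subgraph` because C2 lies three hops away from T1.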

You can store it as a projected graph in the graph catalog (see "Graph Catalog" in the Neo4j Graph Data Science documentation). If you are working with several node labels, look at "Loading multiple node properties" to see how to project multiple node properties.

Then you can use this projection in your algorithm by adding a graph:'nameForYourGraph' option.


Hi @Thomas_Silkjaer, thanks for that.

I have this small group within a large set of nodes:
[image: the small group of nodes]

Is it possible to create the projection from a traversal on specific property values, like the two queries below, instead of from labels and relationship types?

MATCH (t:PayTransactions {TrxID: 'T17'})-[*1..5]-(x) RETURN x

MATCH path = allShortestPaths((p:PayTransactions {TrxID:'T17'})--(pp:PayTransactions {TrxID:'T18'})) RETURN path

Thanks

Using Cypher projection, the first Cypher query must return the ids of all nodes in the subgraph, and the second must return the source and target ids of all relationships in it.
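The two result sets described here can be sketched in plain Python (hypothetical ids and helper name; this is not the Neo4j API): the node query yields one `id` row per subgraph node, and the relationship query yields a `source`/`target` row for each relationship whose endpoints are both in that set. Keeping only such relationships (the induced subgraph) is an assumption about which relationships you want.

```python
# Hypothetical relationships of the full graph as (source_id, target_id) pairs.
all_rels = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5)]

# Ids of the nodes that should form the subgraph (what the node query returns).
sub_ids = {0, 1, 2, 5}

def project(node_ids, rels):
    """Return (node rows, relationship rows) for the subgraph on node_ids.

    A relationship is kept only when both endpoints are in node_ids, which is
    what the relationship query has to return for an induced subgraph.
    """
    node_rows = [{"id": n} for n in sorted(node_ids)]
    rel_rows = [{"source": s, "target": t} for s, t in rels
                if s in node_ids and t in node_ids]
    return node_rows, rel_rows

node_rows, rel_rows = project(sub_ids, all_rels)
```

Relationships (2, 3) and (3, 4) are left out because node 3 is not in the node set.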

Hi @Thomas_Silkjaer

For the Cypher projection, I ran:

CALL algo.betweenness.sampled.stream(
  'MATCH (n) RETURN id(n) AS id',
  'MATCH (n)-[*0..2]-(m) WHERE n.CustomerNo = $cno RETURN id(n) as source, id(m) as target',
  {graph:'cypher', params: {cno : 'C13'} }
);

I got all centralities as zero.

When I run only the relationship query,

MATCH (n)-[*0..2]-(m) WHERE n.CustomerNo ='C13' RETURN id(n) as source, id(m) as target

I get the nodes shown below. Could you tell me what the issue is here?


My final call is:

CALL algo.betweenness.sampled.stream(
  'MATCH (n) RETURN id(n) AS id',
  'MATCH (n)-[*0..2]-(m) WHERE n.CustomerNo = $cno RETURN id(n) as source, id(m) as target',
  {graph:'cypher', params: {cno : 'C13'} }
) YIELD nodeId, centrality
MATCH (customer:Customer) WHERE id(customer) = nodeId
RETURN customer.CustomerNo AS Customer, centrality AS BetweennessCentrality
ORDER BY centrality DESC limit 5;

which gives me
[image: query results]
Thanks
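A small pure-Python sketch of the rows the relationship query `MATCH (n)-[*0..2]-(m) WHERE n.CustomerNo = $cno RETURN id(n) AS source, id(m) AS target` produces, on hypothetical data with a hypothetical helper name: every row has the C13 node as `source`, so the projected graph is a star centred on C13 (plus a self-loop from the 0-hop match), and real relationships among the other nodes, such as T1–C14 here, are replaced by direct C13–m pairs. In a star, the non-centre nodes lie on no shortest path between other nodes, so their betweenness is 0; this may be related to the zeros reported above.

```python
from collections import deque

def zero_to_two_hop_rows(adj, start):
    """Rough analogue of MATCH (n)-[*0..2]-(m) ... RETURN n AS source, m AS target
    with n bound to `start`. Returns one (source, target) row per distinct m;
    Cypher itself can return the same pair several times (one row per path).
    """
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == 2:
            continue  # paths are at most 2 relationships long
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    # The 0-hop match pairs start with itself, which gives a self-loop row.
    return [(start, m) for m in sorted(dist)]

# Hypothetical data: customer C13 linked to two transactions, one of which
# is also linked to customer C14.
adj = {
    "C13": ["T1", "T2"],
    "T1": ["C13", "C14"],
    "T2": ["C13"],
    "C14": ["T1"],
}
rows = zero_to_two_hop_rows(adj, "C13")
print(rows)  # [('C13', 'C13'), ('C13', 'C14'), ('C13', 'T1'), ('C13', 'T2')]
```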

Instead of the Cypher projection above, what if I run betweenness on the whole graph and then return the values for the nodes around C13?

// Betweenness
CALL algo.betweenness.sampled('', '', {
  concurrency: 8,
  direction: 'Both',
  maxDepth: null,
  probability: null,
  strategy: 'random',
  writeProperty: 'approxBetweenness'
});

// Return max betweenness (reads the property written above)
MATCH (n)-[*0..2]-(m) WHERE n.CustomerNo = 'C13'
RETURN id(m) AS ID, m.CustomerNo AS Customer, m.approxBetweenness AS BetweennessCentrality
ORDER BY m.approxBetweenness DESC
LIMIT 5

and I get

Can you please confirm whether both approaches will return the same betweenness values?

Thanks

Sorry, my knowledge of the algorithms ends here. I hope someone else pitches in :slight_smile:

Thanks @Thomas_Silkjaer, no worries.

The betweenness centrality of a node is calculated as the sum, over every pair of other nodes, of the fraction of shortest paths between that pair that pass through the node, i.e. the number of shortest paths through the node divided by the total number of shortest paths between the pair.
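Written out, the standard definition being described is the following, where σ_st is the number of shortest paths between s and t and σ_st(v) is how many of those pass through v. For undirected graphs, implementations differ on whether each unordered pair is counted once or twice, and some also normalise the result.

$$
C_B(v) = \sum_{\substack{s \neq v \neq t \\ s \neq t}} \frac{\sigma_{st}(v)}{\sigma_{st}}
$$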

If you compute betweenness centrality on a subgraph, you have a different set of nodes that you're calculating shortest paths between, so it's likely you'll get different metrics. My intuition is you'll probably get higher betweenness centrality metrics for the nodes in your subgraph vs when they're in the full graph (you're cutting out a whole lot of paths that don't go through them).
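To make the difference concrete, here is a small pure-Python sketch using Brandes' algorithm, a standard way to compute the sum defined above. It is not Neo4j's implementation, which may normalise, sample or count pairs differently. It compares "run on the whole graph, then filter to the subgraph's nodes" with "run on the subgraph only" on a hypothetical 4-node cycle; the graph, node names and helper functions are illustrative assumptions.

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    `adj` maps every node to a list of its neighbours. Each unordered pair
    (s, t) is visited from both ends, so the totals are halved at the end.
    """
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: score / 2 for v, score in bc.items()}

def induced(adj, nodes):
    """Subgraph on `nodes`, keeping only edges with both ends in `nodes`."""
    return {v: [w for w in adj[v] if w in nodes] for v in nodes}

# Hypothetical 4-cycle s - v - t - u - s: two shortest routes from s to t.
adj = {"s": ["v", "u"], "v": ["s", "t"], "t": ["v", "u"], "u": ["t", "s"]}
sub = ["s", "v", "t"]

full = betweenness(adj)
whole_then_filter = {n: full[n] for n in sub}
on_subgraph = betweenness(induced(adj, sub))
print(whole_then_filter)  # {'s': 0.5, 'v': 0.5, 't': 0.5}
print(on_subgraph)        # {'s': 0.0, 'v': 1.0, 't': 0.0}
```

Here v's score doubles on the subgraph because the alternative s–u–t route is cut out, in line with the intuition above. The change is not always upward, though: removing nodes also removes pairs from the sum, so other graphs can show lower scores on the subgraph.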

Does that make sense?