Algo.betweenness.sampled (RA-Brandes Algorithm): Out of memory error : help requested

cypher
(Bharat Ram Ammu) #1

Hi, following the example here in manual of algorithms of neo4j, I tried running the RA- Brandes algorithm for betweenness centrality for specific nodes , within 2 sets of relationships:

MATCH  (cust:Customer)-[:LINKED_TO_BENF]->(benf:Beneficiary), (cust:Customer)-[:LINKED_TO_ORIGINATOR]-> (orig:Originator)
WHERE toFloat(cust.CUSTOMER_RISK_SCORE_SCALED) >0
WITH collect(orig) AS origs LIMIT 100
CALL algo.betweenness.sampled.stream('origs','LINKED_TO_ORIGINATOR',  {strategy:'degree',direction:'in'}) YIELD nodeId,centrality
MATCH (orig:Originator) WHERE id(orig) = nodeId
//Measuring shortest path between customers connected to each originator
RETURN orig.id as originator,centrality order by centrality desc LIMIT 25

Before coming to the above query:

  1. I tried running it without a limit in the number of nodes (originators)
  2. with strategy of sampling as 'random'.

The query produces out of memory error:

Details:

I have 7091 "Customer" nodes , 452 "Originator" nodes and 1073 "Beneficiary" nodes. I am limiting the number of (Originator) nodes to consider to 100 (at least what I understand as per the query above) . I also ask for degree centrality of just 25 'Originator' nodes.

I already use max heap size of 12G and page cache size of 5G. I don't think my system supports any more (it has 16GB RAM). Any ideas on what am I missing? Is LIMIT 100 the way to limit the number of nodes explored? At least, getting the centrality for 100 nodes would be a good initial success.

Ideally, my goal is to get betweenness centrality for all "Originator" nodes looking at the network connected to 'Customer' nodes and 'Beneficiary' nodes using the 2 relationships asked to be matched.

Thanks in advance!

0 Likes

(Michael Hunger) #2

Where does that syntax come from?
Isn't the first parameter a label according to the docs:

https://neo4j.com/docs/graph-algorithms/current/algorithms/betweenness-centrality/#_ra_brandes_algorithm

So origs will be treated as a label which is not found and so it will use all nodes in the graph.

The strategies I see in the docs are random and degree.

0 Likes

(Bharat Ram Ammu) #3

Hi Michael, I'm sorry. That syntax comes from the old 'apoc.algo.betweenness' procedure which is now obsolete I guess.

So I updated the query to make the 1st parameter a label (Node label= Originator itself) . I also checked to make sure the query is in line with the example here with this query:

MATCH  (cust:Customer)-[:LINKED_TO_BENF]->(benf:Beneficiary), (cust:Customer)-[:LINKED_TO_ORIGINATOR]-> (orig:Originator)
WHERE toFloat(cust.CUSTOMER_RISK_SCORE_SCALED) >0
CALL algo.betweenness.sampled.stream('Originator','LINKED_TO_ORIGINATOR',  {strategy:'degree',direction:'in'}) YIELD nodeId,centrality
MATCH (orig) WHERE id(orig) = nodeId
//Measuring shortest path between customers connected to each originator
RETURN orig.id as originator,centrality order by centrality desc LIMIT 25

But I still have my query running since over 2 hours. I also tried with only matching a few (25) relationships to run the betweenness centrality algorithm as follows:

MATCH  p=(cust:Customer)-[:LINKED_TO_BENF]->(benf:Beneficiary), (cust:Customer)-[:LINKED_TO_ORIGINATOR]-> (orig:Originator)
WITH p limit 25
WHERE toFloat(cust.CUSTOMER_RISK_SCORE_SCALED) >0
CALL algo.betweenness.sampled.stream('Originator','LINKED_TO_ORIGINATOR',  {strategy:'degree',direction:'in'}) YIELD nodeId,centrality
MATCH (orig) WHERE id(orig) = nodeId
//Measuring shortest path between customers connected to each originator
RETURN orig.id as originator,centrality order by centrality desc LIMIT 25

But I get output 'No changes, no records'. How can I get no changes and no records when I ask for return?

EDIT: I thought about it and after a few trials, I realized that it's because the nodes of label 'Originator' were not constrained uniquely by any nodeId but by a variable 'name'. Specifically, it's because of the "MATCH (orig) with id(nodes)".These nodes don't even have any 'nodeId' as property. I guess there arises another question : as the nodes of label 'Originator' were not constrained uniquely by nodeId but by a variable 'name', how can I 'Yield nodeName' ? The document doesn't address ( I tried 'nodes' instead of 'nodeId' as tried in examples for closeness centrality here, but it doesn't work. Please let me know how I can yield another property instead of 'nodeId' (which is null for 'Originator' nodes). Thanks again!

0 Likes

(Michael Hunger) #4

Also you execute the algorithm for all rows that you generate here:

MATCH  (cust:Customer)-[:LINKED_TO_BENF]->(benf:Beneficiary), (cust:Customer)-[:LINKED_TO_ORIGINATOR]-> (orig:Originator)
WHERE toFloat(cust.CUSTOMER_RISK_SCORE_SCALED) >0

So if that returns 100 rows you execute the same algorithm call 100 times.

For executing betweenness centrality on a subset of nodes you need to use Cypher projections.
Please see the docs.

1 Like

(Bharat Ram Ammu) #6

Hi Michael, I tried using maps like in the below and using it as a label for RA-Brandes algorithm, but seems map can't be inputed as label. I couldn't find a use case for graph algorithms for maps. Is there one in the docs? Thanks in advance :)

//Betweeenness Centrality- Originator
MATCH  p=(cust:Customer)-[:LINKED_TO_BENF]->(benf:Beneficiary), (cust:Customer)-[:LINKED_TO_ORIGINATOR]-> (orig:Originator)
WHERE toFloat(cust.CUSTOMER_RISK_SCORE_SCALED) >0
WITH p , origs { .name, .ctry_code,.offshore_flag,customers: collect(cust { .id, .CUSTOMER_RISK_SCORE_SCALED,.panama_flag })}  limit 25 
//origs as map
CALL algo.betweenness.sampled.stream('origs','LINKED_TO_ORIGINATOR',  {strategy:'degree',direction:'in'}) YIELD nodeId,centrality
//Measuring shortest path between customers connected to each originator
RETURN origs.id as originator,centrality order by centrality desc LIMIT 25
0 Likes

(Bharat Ram Ammu) #7

Hi Michael,

Thank you for your support. I have some constructive criticism for SEO with neo4j. I searched in google for 'cypher projection' but I got map projections:

I realized a long way after going through the following process:

  1. Visited this stack overflow answer:
  2. Lucky enough to click the link cypher projection which lead me here to generic neo4j graph algorithms:
    https://neo4j.com/docs/graph-algorithms/current/
  3. I was again lucky to have gone to the betweenness algorithm and searched for 'projection'.
  4. Then this came up, which I think should have come up when I googled 'cypher projections'.

In this regard, my sincere request and feedback to please improve documentation to be more aligned with SEO. It took me almost a month to figure that I was using map projection instead of cypher projection! Thanks for your understanding Michael.
Regards,
Bharat

0 Likes