Algo.betweenness.sampled (RA-Brandes Algorithm): Out of memory error : help requested

ammubharatram · January 24, 2019, 2:44pm

Hi, following the example here in manual of algorithms of neo4j, I tried running the RA- Brandes algorithm for betweenness centrality for specific nodes , within 2 sets of relationships:

MATCH  (cust:Customer)-[:LINKED_TO_BENF]->(benf:Beneficiary), (cust:Customer)-[:LINKED_TO_ORIGINATOR]-> (orig:Originator)
WHERE toFloat(cust.CUSTOMER_RISK_SCORE_SCALED) >0
WITH collect(orig) AS origs LIMIT 100
CALL algo.betweenness.sampled.stream('origs','LINKED_TO_ORIGINATOR',  {strategy:'degree',direction:'in'}) YIELD nodeId,centrality
MATCH (orig:Originator) WHERE id(orig) = nodeId
//Measuring shortest path between customers connected to each originator
RETURN orig.id as originator,centrality order by centrality desc LIMIT 25

Before coming to the above query:

I tried running it without a limit in the number of nodes (originators)
with strategy of sampling as 'random'.

The query produces out of memory error:

Details:

I have 7091 "Customer" nodes , 452 "Originator" nodes and 1073 "Beneficiary" nodes. I am limiting the number of (Originator) nodes to consider to 100 (at least what I understand as per the query above) . I also ask for degree centrality of just 25 'Originator' nodes.

I already use max heap size of 12G and page cache size of 5G. I don't think my system supports any more (it has 16GB RAM). Any ideas on what am I missing? Is LIMIT 100 the way to limit the number of nodes explored? At least, getting the centrality for 100 nodes would be a good initial success.

Ideally, my goal is to get betweenness centrality for all "Originator" nodes looking at the network connected to 'Customer' nodes and 'Beneficiary' nodes using the 2 relationships asked to be matched.

Thanks in advance!

michael.hunger · January 26, 2019, 12:50pm

Where does that syntax come from?
Isn't the first parameter a label according to the docs:

So origs will be treated as a label which is not found and so it will use all nodes in the graph.

The strategies I see in the docs are random and degree.

ammubharatram · January 28, 2019, 2:23pm

Hi Michael, I'm sorry. That syntax comes from the old 'apoc.algo.betweenness' procedure which is now obsolete I guess.

So I updated the query to make the 1st parameter a label (Node label= Originator itself) . I also checked to make sure the query is in line with the example here with this query:

MATCH  (cust:Customer)-[:LINKED_TO_BENF]->(benf:Beneficiary), (cust:Customer)-[:LINKED_TO_ORIGINATOR]-> (orig:Originator)
WHERE toFloat(cust.CUSTOMER_RISK_SCORE_SCALED) >0
CALL algo.betweenness.sampled.stream('Originator','LINKED_TO_ORIGINATOR',  {strategy:'degree',direction:'in'}) YIELD nodeId,centrality
MATCH (orig) WHERE id(orig) = nodeId
//Measuring shortest path between customers connected to each originator
RETURN orig.id as originator,centrality order by centrality desc LIMIT 25

But I still have my query running since over 2 hours. I also tried with only matching a few (25) relationships to run the betweenness centrality algorithm as follows:

MATCH  p=(cust:Customer)-[:LINKED_TO_BENF]->(benf:Beneficiary), (cust:Customer)-[:LINKED_TO_ORIGINATOR]-> (orig:Originator)
WITH p limit 25
WHERE toFloat(cust.CUSTOMER_RISK_SCORE_SCALED) >0
CALL algo.betweenness.sampled.stream('Originator','LINKED_TO_ORIGINATOR',  {strategy:'degree',direction:'in'}) YIELD nodeId,centrality
MATCH (orig) WHERE id(orig) = nodeId
//Measuring shortest path between customers connected to each originator
RETURN orig.id as originator,centrality order by centrality desc LIMIT 25

But I get output 'No changes, no records'. How can I get no changes and no records when I ask for return?

EDIT: I thought about it and after a few trials, I realized that it's because the nodes of label 'Originator' were not constrained uniquely by any nodeId but by a variable 'name'. Specifically, it's because of the "MATCH (orig) with id(nodes)".These nodes don't even have any 'nodeId' as property. I guess there arises another question : as the nodes of label 'Originator' were not constrained uniquely by nodeId but by a variable 'name', how can I 'Yield nodeName' ? The document doesn't address ( I tried 'nodes' instead of 'nodeId' as tried in examples for closeness centrality here, but it doesn't work. Please let me know how I can yield another property instead of 'nodeId' (which is null for 'Originator' nodes). Thanks again!

michael.hunger · January 29, 2019, 7:09am

Also you execute the algorithm for all rows that you generate here:

MATCH  (cust:Customer)-[:LINKED_TO_BENF]->(benf:Beneficiary), (cust:Customer)-[:LINKED_TO_ORIGINATOR]-> (orig:Originator)
WHERE toFloat(cust.CUSTOMER_RISK_SCORE_SCALED) >0

So if that returns 100 rows you execute the same algorithm call 100 times.

For executing betweenness centrality on a subset of nodes you need to use Cypher projections.
Please see the docs.

ammubharatram · January 31, 2019, 1:26pm

Hi Michael, I tried using maps like in the below and using it as a label for RA-Brandes algorithm, but seems map can't be inputed as label. I couldn't find a use case for graph algorithms for maps. Is there one in the docs? Thanks in advance :)

//Betweeenness Centrality- Originator
MATCH  p=(cust:Customer)-[:LINKED_TO_BENF]->(benf:Beneficiary), (cust:Customer)-[:LINKED_TO_ORIGINATOR]-> (orig:Originator)
WHERE toFloat(cust.CUSTOMER_RISK_SCORE_SCALED) >0
WITH p , origs { .name, .ctry_code,.offshore_flag,customers: collect(cust { .id, .CUSTOMER_RISK_SCORE_SCALED,.panama_flag })}  limit 25 
//origs as map
CALL algo.betweenness.sampled.stream('origs','LINKED_TO_ORIGINATOR',  {strategy:'degree',direction:'in'}) YIELD nodeId,centrality
//Measuring shortest path between customers connected to each originator
RETURN origs.id as originator,centrality order by centrality desc LIMIT 25

ammubharatram · March 11, 2019, 10:20am

Hi Michael,

Thank you for your support. I have some constructive criticism for SEO with neo4j. I searched in google for 'cypher projection' but I got map projections:

I realized a long way after going through the following process:

Visited this stack overflow answer:

image1702×1016 138 KB
Lucky enough to click the link cypher projection which lead me here to generic neo4j graph algorithms:
The Neo4j Graph Data Science Library Manual v2.2 - Neo4j Graph Data Science
I was again lucky to have gone to the betweenness algorithm and searched for 'projection'.
Then this came up, which I think should have come up when I googled 'cypher projections'.

image1808×1006 167 KB

In this regard, my sincere request and feedback to please improve documentation to be more aligned with SEO. It took me almost a month to figure that I was using map projection instead of cypher projection! Thanks for your understanding Michael.
Regards,
Bharat

Topic		Replies	Views
Running betweenness centrality algorithm Graph Data Science / Graph Analytics	2	833	February 10, 2020
Why the Betweenness Centrality algorithm can't YIELD "score"? Graph Data Science / Graph Analytics	4	437	September 16, 2020
Neo4j loses connections when performing large-scale data graph depth/breadth searches Neo4j Graph Platform migrated	7	543	November 28, 2022
Apoc.algo.betweenness procedure not found: call apoc.help('algo') running Procedures & APOC apoc	1	1302	January 21, 2019
Running graph algorithms with APOC periodic.iterate Graph Data Science / Graph Analytics	8	1030	February 25, 2020

Take the Course Then Join The Aura Agent Hackathon

Algo.betweenness.sampled (RA-Brandes Algorithm): Out of memory error : help requested

Related topics

Take the Course Then Join
The Aura Agent Hackathon