Categorizing Nodes based on their weight compared to peers

rfredesveas · January 22, 2021, 8:59pm

Hi: I do have the following scenario:

The challenge is to add a property to each Fruit with one of this options ('TOP','MIDDLE','BOTTOM'). So it would be 'TOP' if they are in the top 30% with the highest weights, ´MIDDLE' in the top 70% highest weights, and 'BOTTOM' for the rest.
So I generate my sample grapgh with this code:
// clear data
MATCH (n)
DETACH DELETE n;

//Create Sample Basket
CREATE (:Fruit {Name: "Pear"})<-[:INCLUDES {Weight: 0.05}]-(b:Basket)-[:INCLUDES {Weight: 0.25}]->(:Fruit {Name: "Apple"}),(:Fruit {Name: "Ananas"})<-[:INCLUDES {Weight: 0.15}]-(b)-[:INCLUDES {Weight: 0.30}]->(:Fruit {Name: "Peach"}),(b)-[:INCLUDES {Weight: 0.25}]->(:Fruit {Name: "Banana"})

And then I Bring the fruits in the basket ordered by they weight percentage over the basket total.
MATCH (b:Basket)-[i:INCLUDES]-(f:Fruit)
Return f.Name as FruitName, i.Weight as FruitPercent ORDER BY FruitPercent DESC
and I get:

FruitName	FruitPercent
"Peach"	0.3
"Apple"	0.25
"Banana"	0.25
"Ananas"	0.15
"Pear"	0.05

What I do need to produce is this instead:

FruitName	FruitPercent	FruitPercentAccumulated	Category
"Peach"	0.3	0.3	'TOP'
"Apple"	0.25	0.55	'MIDDLE'
"Banana"	0.25	0.7	´MIDDLE'
"Ananas"	0.15	0.85	'BOTTOM'
"Pear"	0.05	1.00	'BOTTOM'

I have tried many alternatives to generate the accumulated percentage but did not found any that worked, since it requieres to use a calculated variable over the first list. At the end I also want to insert back the atribute Category in every Fruit node with the result value.

The problem I am exposing is a very simplificated example that illustrates what I am really trying to solve which is the ABCFMR Analisys in Logistics.

.

Thanks in advance for any help!!

tard_gabriel · January 28, 2021, 1:28am

Nice challenge I like it, however I'm not sure if it's optimised for Neo4j.

I didn't figure out the whole thing yet but this part is a good start:

WITH [0.3,0.25,0.25,0.15,0.05] AS FruitPercents
RETURN [i IN range(1,size(FruitPercents)) | reduce(sum = 0.0, x IN FruitPercents[0..i] | sum + x)] AS FruitPercentAccumulated

The returns values are not precise and even wrong sometimes, I don't know what is going on here, maybe a Neo4j staff member can help us.

rfredesveas · January 28, 2021, 2:27pm

Thanks Gabriel!!! a good tip, I am able now to get the accumulated percentage. The rounding errors while surprising are not that relevant at this point.
Now the challenge is to maintain the percentages linked to the correspondent relationships and being able to build this output:

FruitName	FruitPercent	FruitPercentAccumulated	Category
"Peach"	0.3	0.3	'TOP'
"Apple"	0.25	0.55	'MIDDLE'
"Banana"	0.25	0.7	´MIDDLE'
"Ananas"	0.15	0.85	'BOTTOM'
"Pear"	0.05	1.00	'BOTTOM'

It is being really hard to code something that would be so easy in other languages with loop, local and global variable, etc.

tard_gabriel · January 28, 2021, 6:28pm

Hi again @rfredesveas

Can I have the Cypher query you used for this result, I might be able to write back these results in the corresponding relationships.

But I don't understand what are you actually trying to do here?
The accumulated result doesn't make sens on the relationship as it's not a coherent part of the graph.

rfredesveas · January 28, 2021, 7:28pm

Thanks you so much Gabriel for your help. I do not have that Cypher, that is the result I want to produce.
So these are the steps I tried to implement.

Generate a list of all the relationships from basket to fruits with their relative weights in the basket.
Sort that list from the highest to the smallest weight.
Generate a new property with the accumulated weight.
The top 30% most relevant fruits in the baskets get a 'TOP' category, the following 40% get a 'MIDDLE' and the bottom 30 get a 'BOTTOM'
Add a property to each fruit with the category they obtained.

My code so far for this example is the following:
// clear data

MATCH (n)

DETACH DELETE n;

//Create Sample Basket

CREATE (:Fruit {Name: "Pear"})<-[:INCLUDES {Weight: 0.05}]-(b:Basket)-[:INCLUDES {Weight: 0.25}]->(:Fruit {Name: "Apple"}),

(:Fruit {Name: "Ananas"})<-[:INCLUDES {Weight: 0.15}]-(b)-[:INCLUDES {Weight: 0.30}]->(:Fruit {Name: "Peach"}),

(b)-[:INCLUDES {Weight: 0.25}]->(:Fruit {Name: "Banana"})

//Bring the fruits in the basket ordered by they weith percentege over the total

    MATCH path= (b:Basket)-[incl:INCLUDES]-(f:Fruit) 

    UNWIND relationships(path) as inc

    with collect(toFloat(inc.Weight)) as FruitPercents

    RETURN [i IN range(1,size(FruitPercents)) | reduce(sum = 0.0, x IN FruitPercents[0..i] | sum + x)] AS FruitPercentAccumulated

I hope this clarifies better the challenge.
Saludos y muchas gracias!!
Rodrigo

tard_gabriel · February 5, 2021, 3:31am

Yep, after consideration, I still think that your request is something like a post Neo4j request result's treatment. That's why it's so not natural to do it with Neo4j.

But your problem makes me realise that the UNWIND clause sucks a bit since it's not possible to unwind more than one list as columns of the same row. Maybe it's a part of the principles of the data transfer between procedures behind the Cypher query language but it's definitely a limitation and I didn't see any work around it for now.

So I would call for help here by a Neo4j staff, god or a lucky community member with the good idea at the right moment lol

rfredesveas · February 7, 2021, 12:33pm

Well. Thanks you Gabriel for the efforts!!!
Do you have any idea on how I can get access to neo4j experts?

ameyasoft · February 8, 2021, 5:21am

Try this:

match (b:Basket)-[r]-(a)
with collect(distinct id(b)) as b1, collect(distinct id(a)) as a1, collect(r.Weight) as wt
with apoc.coll.sort(wt) as wt1, b1, a1
//with toFloat("0.3") as r1, wt1

//using the percentile calculation like in graduate school......
//https://www.dummies.com/education/math/statistics/how-to-calculate-percentiles-in-statistics/

with round(toFloat("0.3") * size(wt1)) as t1, round(toFloat("0.7") * size(wt1)) as t2, wt1, b1, a1
with b1, a1, wt1, toInteger(t1) as t11, toInteger(t2) as t22

MATCH (b2)-[r1]-(a2)
where id(b2) in b1 and id(a2) in a1
with a2.Name as name, r1.Weight as weight,
toFloat(wt1[t11-1]) as q1, toFloat(wt1[t22-1]) as q2

with name, weight, q1, q2, 
CASE
   WHEN weight <= q1 THEN 'BOTTOM'
   WHEN q1 < weight <= q2 THEN 'MEDIUM'
   WHEN weight > q2 THEN 'TOP'
   ELSE weight END as level
   
   return name, weight, level order by weight  desc
 If you are using Neo4j version > 4.1 this query can be simplified. Am expecting this query to work in all versions.

Result:

tard_gabriel · February 8, 2021, 7:44am

They identify themself with the Neo4j Staff subtitle by their name in the community forum.
They are the most qualify to know the limits and possibilities of Neo4j.

You can click on their profile picture and message them personally about your topic.
Usually you can get some's attention with the @ operator but it works only with the active participant of a topic.

I figure out most of it but there is always this underlying loop behavior between each Neo4j clauses who fucked the end result. But I'm close.

tard_gabriel · February 8, 2021, 8:31am

@ameyasoft gave me an idee with the CASE clause.

I think I know exactly how to finish this _______ problem :)
With a short and sweet solution

rfredesveas · February 8, 2021, 11:56pm

SO paciently waiting for it

tard_gabriel · February 12, 2021, 8:19am

Finally did it

// Basket Beta
MATCH (:Basket)-[i:INCLUDES]->(f:Fruit)
WITH f.name AS FruitName, i.weight AS FruitPercent ORDER BY FruitPercent
WITH collect(FruitName) AS FruitNames, collect(FruitPercent) AS FruitPercents
WITH FruitNames, FruitPercents, [i IN range(1,size(FruitPercents)) | reduce(sum = 0.0, x IN FruitPercents[0..i] | sum + x)] AS Accumulation
UNWIND [i IN range(0,size(Accumulation) - 1) | [FruitNames[i], FruitPercents[i], Accumulation[i]]] AS row
RETURN row[0] AS FruitName, row[1] AS FruitPercent, row[2] AS FruitPercentAccumulated

Even if the multiple WITH clauses seems to make the query longer, it's actually just aggregation processing who are not doing db hits. Meaning the query performance is good or really good depending on how indexes are created and only if the data model stay small.

There is one column left to do, I don't have so much time tonight but I think it's now possible to do it with the CASE approach above. If you have anymore question about my solution let me know.

rfredesveas · February 12, 2021, 1:58pm

Thanks you so much Gabriel!! I was raching kind of the same result with the reduce sentence and then the Unwind.
But this helps a lot! very clear code. I'll be now integrating the whole and applying it to the real problem I am trying to solve. I'll let you know how it ends!

Muchas Gracias!

tard_gabriel · February 12, 2021, 8:00pm

Your welcome

The main problem here was to be able to unwind not 1 but 3 columns at the same time.
The idea behind the solution was to encode each element of the unwinded list as a row with columns.

I could have encoded my columns set as a map instead and having the same result.
The difference would be the way I decode my columns in the return clause.

Your problem was a pain in the ass to resolve for me but at the end it wasn't so hard to resolve. That's why I'm thinking about writing an article about Cypher.

If you don't mind, I would like to use your example in it. It's a really tricky one who force yourself to understand how Cypher works and using it beyond what you know. But it's a cute and easy to understand model.

rfredesveas · February 12, 2021, 9:11pm

I would even help you if u wish.

I am using this as part of a bigger problem and am still far to be done. But I would be glad to help.

bests

tard_gabriel · February 13, 2021, 12:32am

I would like to know how it ends at a larger scale.

Topic		Replies	Views
Compare two graphs Neo4j Graph Platform cypher	20	1835	May 16, 2020
Return top percentage of results Cypher	5	515	September 7, 2021
Calculating from relationships in a multi-node path Cypher cypher	1	69	July 30, 2025
Return Start and End Nodes in a path while using reduce function Neo4j Graph Platform migrated	1	166	June 16, 2022
How to use Jaccard similarity algorithm in neo4j to find the similar nodes Procedures & APOC cypher	17	4531	January 17, 2019

Categorizing Nodes based on their weight compared to peers

Related topics