SOLVED: Iterating over a query based on multiple values of property X and summing the yielded result


(Dave) #1

I know that there is no way of using MATCH, UNWIND or FILTER within a FOREACH loop, but do not what could solve the problem. I have the following query to start,

MATCH (u:User)-[s:SIMILAR]-(:User)-[t:TRIED]-(p:Product)

The TRIED has 10 properties with values 0-3 but the SIMILAR relationship has a property of s.strength which can be any integer value greater than 5

What I am trying to create is an average of averages using the following

sum((strength * (average of t1 at strength = 6) + (strength * (average of t1 at strength = 7) ..... )

example:
strength = 7
t1 = 1, 1, 1, 0
result = 5.75

the following grid is the desired average results from the first part of the equation

       s.strength         | strSum
	  |  7    |  8  |  9  |
t.t1  | 5.25  |  0  |  0  | 5.25
t.t2  | 19.25 |  16 |  27 | 62.25 
t.t3  |  21   |  24 |  27 | 72

I then want to take the strSum value and divide it

sum((strength * (average of t1 at strength = 6) + (strength * (average of t1 at strength = 7) ..... ) / sum(distinct s.strength)

to give me an average of averages weighted by s.strength.

Does anyone have a way to iterate through all the values of s.strength and return these weighted averages or is there an APOC proceedure that would help with this. I have been looking at the periodic.rock_n_roll_while bit am not quite sure if this is right.

Below is some test data to create and setup the graph that would give the above grid results

CREATE (u:User {ID: 'A'}), (p:Product {ID: 'A'}) WITH u, p
MERGE (u)-[:SIMILAR {strength: 7}]-(:User {ID: 'B'})-[t:TASTED {t1: 1, t2: 3, t3: 3}]-(p) WITH u, p
MERGE (u)-[:SIMILAR {strength: 7}]-(:User {ID: 'C'})-[t:TASTED {t1: 1, t2: 3, t3: 3}]-(p) WITH u, p
MERGE (u)-[:SIMILAR {strength: 7}]-(:User {ID: 'D'})-[t:TASTED {t1: 1, t2: 3, t3: 3}]-(p) WITH u, p
MERGE (u)-[:SIMILAR {strength: 7}]-(:User {ID: 'E'})-[t:TASTED {t1: 0, t2: 2, t3: 3}]-(p) WITH u, p
MERGE (u)-[:SIMILAR {strength: 8}]-(:User {ID: 'F'})-[t:TASTED {t1: 1, t2: 2, t3: 3}]-(p) WITH u, p
MERGE (u)-[:SIMILAR {strength: 8}]-(:User {ID: 'G'})-[t:TASTED {t1: 1, t2: 2, t3: 3}]-(p) WITH u, p
MERGE (u)-[:SIMILAR {strength: 9}]-(:User {ID: 'H'})-[t:TASTED {t1: 0, t2: 3, t3: 3}]-(p)

EDIT: fix incorrect values in t1


(Andrew Bowman) #2

I'll see what I can do. It's important to note that there's a minor flaw in your create query, you need WITH u, p on the first line, to reuse the p node, otherwise a new blank node is created and used for p for the rest of the query.


(Andrew Bowman) #3

I think I'm missing something about your calculations. Here's your example:

example:
strength = 7
t1 = 0, 0, 0, 1
result = 5.75

With a strength of 7 and the given t1 values, the formula doesn't seem to work out to your expected result:

(strength * (average of t1 at strength = 7))
= (7 * 0.25)
= 1.75

Can you clarify what calculation(s) you're using to produce the result in your example?


(Dave) #4

Yes you are right, it was late at night. My spread sheet was setup with the

t1 as 1, 1, 1, 0

So yes in the example I gave, it is as you say 1.75

Sorry for the confusion


(Michael Hunger) #5

Please edit the original post to fix it.


(Dave) #6

Edited.

I had tried last night to create a new (:Average) node inside of a FOREACH loop, and then I could just sum all the t1 values from the :Average nodes, in the following query, but cannot MATCH.

MATCH (u:User {ID: 'A'})-[s:SIMILAR]-(:User)-[t:TASTED]-(p:Product {ID: 'A'})
WITH s, t, p, u, COLLECT(t { .*, strength: s.strength}) AS tastings, COLLECT(distinct s.strengths) AS strengths, SUM(distinct s.strength) AS strengthSum
UNWIND tastings AS tasting
FOREACH (strength IN strengths |
    MATCH (tasting) WHERE tasting.strength = strength
	CREATE (n:Average {user: u._id, product: p._id}) SET 
    n.taste1 = apoc.coll.avg(tasting.t1),
    n.taste2 = apoc.coll.avg(tasting.t2),
    n.taste3 = apoc.coll.avg(tasting.t3))
WITH p, u
MATCH (a:Averge {user: u._id, product: p_id})
WITH sum(a.t1)/strengths AS avg1, sum(a.t2)/strengths AS avg2, sum(a.t3)/strengths AS avg3
MERGE (u)-[r:AVG_TASTE]-(p) ON CREATE SET r.t1 = avg1, r.t2 = avg2, r.t3 = avg3 ON MATCH SET r.t1 = avg1, r.t2 = avg2, r.t3 = avg3

So I tried the following

MATCH (u1:User {ID: 'A'})-[s:SIMILAR]-(u2:User)-[t:TASTED]-(p:Product {ID: 'A'})
WITH s, t, p, u1, u2, COLLECT(t { .*, strength: s.strength}) AS tasting, COLLECT(distinct s.strength) AS strengths, SUM(distinct s.strength) AS strengthSum
FOREACH (strength IN strengths |
	CREATE (n:Average {user: u1._id, product: p._id, strength: strength}) 
    )
WITH p, u1, u2, tasting, strengthSum
MATCH (a:Averge {user: u1._id, product: p._id})
MATCH (u2)-[t2:TASTED]-(p) WHERE t2.strength = a.strength
SET 
    a.taste1 = apoc.coll.avg(t2.t1),
    a.taste2 = apoc.coll.avg(t2.t2),
    a.taste3 = apoc.coll.avg(t2.t3)
WITH strengthSum, u1, p
MATCH (a2:Averge {user: u1._id, product: p._id})
WITH sum(a2.t1)/strengthSum AS avg1, sum(a2.t2)/strengthSum AS avg2, sum(a2.t3)/strengthSum AS avg3
MERGE (u1)-[r:AVG_SCORE]-(p) ON CREATE SET r.t1 = avg1, r.t2 = avg2, r.t3 = avg3 ON MATCH SET r.t1 = avg1, r.t2 = avg2, r.t3 = avg3
RETURN r

But the distinct s.strength does not gather distinct strength values and the query ends after the FOREACH loop.


(Dave) #7

Found something by accident in a question on StackOverflow which solved this using

UNWIND RANGE(min,max)

And creating nodes for each strength with the averages.

For anyone reading this and looking for a solution, there is this.

MATCH 
    (u1:User {ID: 'A'})-[s:SIMILAR]-(:User)-[t:TASTED]-(p:Wine {ID: 'A'})
WITH u1, p,
    min(s.strength) as minS,
    max(s.strength) as maxS
WITH 
    u1, p, range(minS, maxS) AS range
UNWIND 
    range AS range2
WITH 
     distinct toInteger(range2) as i, u1, p
MATCH 
    (u1)-[s2:SIMILAR]-(:User)-[t2:TASTED]-(p) WHERE s2.strength = i
WITH 
    u1, p, i,
    COLLECT(toInteger(t2.taste1)) AS taste1,
    COLLECT(toInteger(t2.taste2)) AS taste2,
    COLLECT(toInteger(t2.taste3)) AS taste3
MERGE 
    (a:Average {user: u1._id, product: p._id, strength: i})
ON MATCH SET 
    a.taste1 = apoc.coll.avg(taste1) * i,
    a.taste2 = apoc.coll.avg(taste2) * i,
    a.taste3 = apoc.coll.avg(taste3) * i
ON CREATE SET 
    a.taste1 = apoc.coll.avg(taste1) * i,
    a.taste2 = apoc.coll.avg(taste2) * i,
    a.taste3 = apoc.coll.avg(taste3) * i
WITH u1, p
MATCH (a2:Average {user: u1._id, product: p._id})
WITH 
    u1, p,
    sum(a2.taste1)/sum(a2.strength) AS avg1, 
    sum(a2.taste2)/sum(a2.strength) AS avg2, 
    sum(a2.taste3)/sum(a2.strength) AS avg3
MERGE (u1)-[r:AVG_SCORE]->(p) 
    ON CREATE SET 
        r.taste1 = avg1, 
        r.taste2 = avg2, 
        r.taste3 = avg3
    ON MATCH SET 
        r.taste1 = avg1, 
        r.taste2 = avg2, 
        r.taste3 = avg3
RETURN r

If there is a more efficient method then great but this is a working solution for now.


(Michael Hunger) #8

Hey Dave, this should be easier:

MATCH (u:User)-[s:SIMILAR]->(:User)-[t:TASTED]->(p:Product) 
// turn key into a column
UNWIND keys(t) as k 
// group by strength and key
WITH u,p, s.strength AS str, k, avg(t[k]) AS tx 
RETURN u,p,sum(str*tx) AS weight

+--------------------------------------------+
| u               | p               | weight |
+--------------------------------------------+
| Node[0]{ID:"A"} | Node[1]{ID:"A"} | 147.5  |
+--------------------------------------------+

See also: http://console.neo4j.org/r/eb4xl2


(Dave) #9

OH!!! Thats rather elegant and has given me another idea for use case. Thank you @michael.hunger


(Dave) #10

I have just ran into a slight issue with that approach. I'm using the uuid module from Graphaware which throws in an error of avg() can only handle numerical values or null, is there a way to filter the keys that are only integer?

Have tried the below, but still get the numerical error

CASE t[k] WHEN toInteger(t[k]) THEN avg(t[k]) ELSE null END AS tx


(Michael Hunger) #11

this should work:

WHERE apoc.meta.isType(tx,'INTEGER')