I need to go through a chain of nodes and sum up values from their relationships. Each relationship has two values I am interested in, walkTime and rideTime.. Each has a duration time value that I created with the relationship.
I need to traverse these nodes and sum up how much time the person was walking and how much they were riding during a trip or vacation, whatever.
How do I do this? I found this.. maybe there is a better (simpler) way?
CREATE (temp)
WITH temp
MATCH (tf:Transfer {id:'some_id'})
MATCH (tf)<-[in:IN_TO]-(in_account:Account)
MATCH (tf)-[out:OUT_TO]->(out_account:Account)
SET temp += PROPERTIES(tf)
WITH temp, SUM(in.value) AS sum_in, SUM(out.value) AS sum_out, COLLECT(in_account) AS in_accounts, COLLECT(out_account) AS out_accounts
SET temp.sum_in = sum_in
SET temp.sum_out = sum_out
WITH temp, PROPERTIES(temp) AS props, in_accounts, out_accounts
DELETE temp
RETURN props, in_accounts, out_accounts
You do not need the 'temp' node. You can do it without:
MATCH (tf:Transfer {id:100})
MATCH (tf)<-[in:IN_TO]-(in_account:Account)
MATCH (tf)-[out:OUT_TO]->(out_account:Account)
WITH PROPERTIES(tf) as props, SUM(in.value) AS sum_in, SUM(out.value) AS sum_out, COLLECT(in_account) AS in_accounts, COLLECT(out_account) AS out_accounts
RETURN props, in_accounts, out_accounts, sum_in, sum_out
Even so, this does not work because you are forming a cartesian product with the second and third matches. This causes your sums to sum duplicate values and your collections to also have duplicate values. Here is an example:
Modifying the return of the nodes to show the node ids, you can see the in_account and out_account nodes repeat three times. You can also see the sum is three times what it should be. This is all because the second match creates three rows and the third match creates three rows, resulting in nine total rows.
MATCH (tf:Transfer {id:100})
WITH
tf,
[(tf)<-[in:IN_TO]-(in_account:Account) | {rel: in, node: in_account}] as in_account_data,
[(tf)-[out:OUT_TO]->(out_account:Account) | {rel: out, node: out_account}] as out_account_data
RETURN
tf,
[i in in_account_data | i.node] as in_accounts,
[i in out_account_data | i.node] as out_accounts,
reduce(s = 0, i in in_account_data | s + i.rel.value) as in_account_sum,
reduce(s = 0, i in out_account_data | s + i.rel.value) as out_account_sum
Result with test data. Note the sums are correct and there are no duplication nodes.
LOL.. see.. this is why I ask you.. you just cannot trust what people put on the web.. always go to someone that knows what they are talking about.. LOL
And I .. know such a person..
I will try this out over the weekend, if not sooner.. thanks man!
Slight change in what I need sir.. if you dont mind.. the summing mechanism is in a thread of nodes connected together. Each node has a GPS point in it and the relationships between them have the two settings, one is the time that they walked between these two points and then other is the time that they drove between these two points.. so I need to Traverse (match then an unwind??) this thread of nodes and add up all the times the person walked (totalTimeWalked) and all the times when they drove (totalTimeDriven). then we just return these values into my java code and issue awards.. Make sense Captain?..
Yes sir. These Destination Nodes will be a chain of connected Destination Nodes.. one for each place that person goes to. So we need a mechanism that starts at the first node in the chain, which I can pass into this cypher chunk, and traverse the entire length of the chain of nodes. There will be one Traversed relationship between each of these nodes. That Traversed relationship will have three attributes TraversedType (which will hold Walked or Drove) and a walkedTime and a droveTime. The walkedTime and a droveTime will be durations. So for each relationship that connects these Destination Nodes together in the chain, sum up the DroveDurationTime and WalkedDurationTime for the entire chain. If TraversedType is Drove then totalDroveTime = totalDroveTime + duration in the current relationship. If TraversedType is Walked then totalWalkedTime = totalWalkedTime + duration in the current relationship. Duration attribute doesn’t care it is just the diff between the two timestamps. At the end of the chain processing, return both totalWalkedTime and totalDroveTime durations.. see?
Will there be multiple paths between these two nodes?
No sir.. Only one relationship/path. That relationship will hold attributes TraversedType (walked or drove), DroveDurationTime and WalkedDurationTime that we care about (there will be others, distance, etc. but, who cares).
Each Destination Node will hold a Neo4j timestamp of when they got to that destination, I use to calculate the Duration inside the relationship.. so we are trying to figure out in the chain of Destination Nodes, how much time did they spend walking or driving or… make sense Captain?
Do you have sample data?
Let me put that together for you now.. Thanks man.. I will send along the Cypher for this.. no problem.. I have to build that for us, and will, right after this meeting ends..
Do you have the initial part of the query that finds the path? I can add the calculations.
I think I can do that.. sure.. Not sure.. I have the starting node.. then we would use an unwind you think? I can try and use your thinking from my other question on removing relationship throughout a chain of nodes.
match(start:Destination{Name: 'D1'})
match path=(start)-[:traverse*]->(end:Destination)
where not exists ( (end)-[:traverse]->(:Destination) )
with
start,
[i in relationships(path) where i.TraversedType = 'Drove' | i.duration] as driveDurations,
[i in relationships(path) where i.TraversedType = 'Walked' | i.duration] as walkDurations
return
start,
reduce(s=duration({seconds:0}), i in driveDurations | s+i) as totalDriveTime,
reduce(s=duration({seconds:0}), i in walkDurations | s+i) as totalWalkTime
match(start:Destination{Name: 'D1'})
match path=(start)-[r:traverse]->*(end:Destination)
where not exists ( (end)-[:traverse]->(:Destination) )
with
start,
[i in r where i.TraversedType = 'Drove' | i.duration] as driveDurations,
[i in r where i.TraversedType = 'Walked' | i.duration] as walkDurations
return
start,
reduce(s=duration({seconds:0}), i in driveDurations | s+i) as totalDriveTime,
reduce(s=duration({seconds:0}), i in walkDurations | s+i) as totalWalkTime