Hello, I am trying to solve the following problem:
I have transaction data between Accounts (nodes) with each :SENT transaction as a directional relationship between nodes. I would like to compute the balance of each Account by taking the difference of the sums of all incoming and outgoing transactions, but I am running into issues with how long it is taking to run my query. Please bear in mind that I am a Cypher newbie so don't really know what I'm doing
The Account node is the only type of node in the database and the SENT relationship is the only type of relationship. The database is not very large, about 50,000 nodes and 70,000 relationships.
My current query is as follows:
MATCH (n)
WITH n
MATCH ()-[incoming]->(l) WHERE l.id = n.id
WITH n, incoming
WITH n, sum(incoming.amount) as sumIn
MATCH (m) -[outgoing]-> () WHERE m.id = n.id
WITH n, outgoing, sumIn
WITH n, sumIn, sum(outgoing.amount) as sumOut
SET n.balance = sumIn - sumOut
But the runtime of this query was more than 80 minutes, which seems quite strange for a relatively small dataset.
First I tried a query like this:
MATCH () -[incoming]-> (n) -[outgoing]-> ()
WITH n, incoming, outgoing
return n.id, count(distinct incoming), count(distinct outgoing), sum(distinct incoming.amount) - sum(distinct outgoing.amount) as balance
But while count()
in this case returns the correct number of incoming and outgoing transactions, I have not been able to sum the amounts.
When I run the MATCH query separately as
MATCH () -[incoming]-> (n)
and
MATCH (n) -[outgoing-> ()
I get the correct counts and sums, but as soon as I try to run it together like
MATCH () -[incoming]-> (n) -[outgoing]-> ()
it returns a much larger number for both incoming
and outgoing
and it's the same for both.
That prompted me to try the following approach:
MATCH (n)
WITH n, (n)-[:SENT]->() as outgoing, (n)<-[:SENT]-() as incoming
return n.id,count(s),count(r)
Which gets the correct counts, but since outgoing
and incoming
here are paths, not relationships, I don't know how to access the amount
property in this case to sum and calculate the balance.
This led me to the current approach which I listed first, to match all nodes, then match and sum all incoming edges for each node, then match and sum all outgoing edges for each node and set the balance. But the runtime of this is incredibly long.
What am I missing or overthinking? I feel like there must be a better way to accomplish what seems like a relatively common and simple task. Thanks for any input!