Optimize getting all incoming and outgoing edges for each node and computing the difference in amounts

Hello, I am trying to solve the following problem:

I have transaction data between Accounts (nodes) with each :SENT transaction as a directional relationship between nodes. I would like to compute the balance of each Account by taking the difference of the sums of all incoming and outgoing transactions, but I am running into issues with how long it is taking to run my query. Please bear in mind that I am a Cypher newbie so don't really know what I'm doing :slightly_smiling_face:

The Account node is the only type of node in the database and the SENT relationship is the only type of relationship. The database is not very large, about 50,000 nodes and 70,000 relationships.

My current query is as follows:

MATCH (n)
WITH n
MATCH ()-[incoming]->(l) WHERE l.id = n.id
WITH n, incoming
WITH n, sum(incoming.amount) as sumIn
MATCH (m) -[outgoing]-> () WHERE m.id = n.id
WITH n, outgoing, sumIn
WITH n, sumIn, sum(outgoing.amount) as sumOut
SET n.balance = sumIn - sumOut

But the runtime of this query was more than 80 minutes, which seems quite strange for a relatively small dataset.

First I tried a query like this:

MATCH () -[incoming]-> (n) -[outgoing]-> () 
WITH n, incoming, outgoing
return n.id, count(distinct incoming), count(distinct outgoing), sum(distinct incoming.amount) - sum(distinct outgoing.amount) as balance

But while count() in this case returns the correct number of incoming and outgoing transactions, I have not been able to sum the amounts.
When I run the MATCH query separately as
MATCH () -[incoming]-> (n) and
MATCH (n) -[outgoing-> () I get the correct counts and sums, but as soon as I try to run it together like
MATCH () -[incoming]-> (n) -[outgoing]-> () it returns a much larger number for both incoming and outgoing and it's the same for both.

That prompted me to try the following approach:

MATCH (n)
WITH n, (n)-[:SENT]->() as outgoing, (n)<-[:SENT]-() as incoming
return n.id,count(s),count(r)

Which gets the correct counts, but since outgoing and incoming here are paths, not relationships, I don't know how to access the amount property in this case to sum and calculate the balance.

This led me to the current approach which I listed first, to match all nodes, then match and sum all incoming edges for each node, then match and sum all outgoing edges for each node and set the balance. But the runtime of this is incredibly long.

What am I missing or overthinking? I feel like there must be a better way to accomplish what seems like a relatively common and simple task. Thanks for any input!

Hi all,

I have a similar problem. I have a dataset with customers and edges to orders where I try to count all edges from customers to orders.

MATCH (c:Customer)
SIZE([(c)--(o:Order) | o]) AS amount
RETURN min(amount), max(amount), avg(amount)

This query works fine, but I think I could speed it up. The property 'customerID' on vertices labelled Customer is a key and is indexed, yet I cannot get my head around how to rephrase the query to make use of this index.

Thanks for any help, much appreciated.

Best,

Philipp

Hi,

Apologies, I forgot the WITH after MATCH clause.

Still trying to get my head around how to make use of the index for this query.

Thanks for any help with it.

Philipp