Fast Aggregation operation

Hi,
I have 500 million Transaction nodes and I am running aggregation operations on them, but it takes a long time to return a result.
Is there any way to make this faster, either with a custom procedure in Java or in Cypher?

Can you share the code?

I am using the queries below to get the result:
MATCH (n:Trx) RETURN sum(n.amount) AS totalAmount;
OR
MATCH (n:Trx {date: date('2023-03-01')}) RETURN sum(n.amount) AS totalAmount;

These are simple nodes.

I don’t think a custom procedure would perform better. They are generally used to do something Cypher can’t.

Your use case doesn’t utilize the graph capabilities of Neo4j. This could be done in a relational database. I assume you have a graph-like domain model that justifies using Neo4j over a traditional database.

Basically, you are finding either all nodes with a label, or all nodes with a label and a given property value, and then summing their amount properties. In the first case, the nodes should be found easily with a NodeByLabelScan. In the second case, you should have an index on the date property for the Trx label to speed up retrieval. How many nodes are you summing? That also affects performance: if you have a large number of Trx nodes, the first query may be expensive. Do you calculate this often? In a data analytics solution, these values would be calculated over set periods (days, weeks, months, quarters, etc.) and stored, which makes reporting quicker.
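As a rough sketch of the two suggestions above (an index on the date property, and totals calculated per period), something like the following could work. It assumes Neo4j 5 syntax and that n.date is stored as a Cypher DATE value. The index name trx_date, the TrxMonthlySummary label and its property names are made up for illustration and are not from the original question.

```cypher
// Assumption: n.date is a DATE value; adjust the literals if it is stored as a string.
// Index on :Trx(date) so the date-filtered query does not scan every Trx node.
// "trx_date" is a hypothetical index name.
CREATE INDEX trx_date IF NOT EXISTS FOR (n:Trx) ON (n.date);

// Inspect the plan without running the query: with the index online,
// it should show an index seek rather than a NodeByLabelScan.
EXPLAIN
MATCH (n:Trx {date: date('2023-03-01')})
RETURN sum(n.amount) AS totalAmount;

// Hypothetical monthly rollup: aggregate one month once and store the result
// on a small summary node, so reports read a few nodes instead of millions.
MATCH (n:Trx)
WHERE n.date >= date('2023-03-01') AND n.date < date('2023-04-01')
WITH sum(n.amount) AS total,
     min(n.amount) AS minAmount,
     max(n.amount) AS maxAmount,
     count(n) AS trxCount
MERGE (s:TrxMonthlySummary {month: '2023-03'})
SET s.total = total,
    s.minAmount = minAmount,
    s.maxAmount = maxAmount,
    s.trxCount = trxCount;
```

A six-month report would then sum s.total (and take the min and max of s.minAmount and s.maxAmount) over six summary nodes. The rollup would need to be re-run for any month whose transactions change.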


At the NODES 2023 conference (Thursday, October 26) there will be a discussion of the new parallel runtime announced in Neo4j 5.13.0.

It would run that analytical query across multiple CPUs and should return the answer much faster.
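For reference, a minimal sketch of what requesting the parallel runtime looks like for the aggregation discussed in this thread. It assumes Neo4j 5.13 or later on Enterprise Edition, where the parallel runtime is available. The speedup depends on the number of CPU cores and on the data, so it is worth comparing timings against the default runtime.

```cypher
// Assumption: Neo4j 5.13+ Enterprise Edition.
// The CYPHER prefix asks the planner to use the parallel runtime for this read query.
CYPHER runtime = parallel
MATCH (n:Trx)
RETURN sum(n.amount) AS totalAmount,
       min(n.amount) AS minAmount,
       max(n.amount) AS maxAmount;
```

The parallel runtime applies to read queries only, so it suits this kind of aggregation but not queries that write to the graph.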


Thank you for the guidance.

So parallel runtime means parallel Cypher execution?

@david.fauth, correct me if I am wrong.

Currently we have approx 9 million transactions every month, so for 6 months it is approx 500 million.
That means we are running the aggregations SUM(amount), MIN(amount) and MAX(amount) on 500 million nodes, and query performance suffers because of this.
Sometimes a query takes around 45 minutes.

Sagar,

Neo4j just announced our new parallel runtime available in 5.13 for these types of queries.

Please read the docs here:

and

If you have questions, please let me know.