Fast Aggregation operation

sagar.chaudhari · October 23, 2023, 9:09am

Hi,
I have 500 million Transaction nodes, and aggregation operations over them take a long time to return a result.
Is there any way to speed this up with a custom procedure in Java or with CQL?

glilienfield · October 23, 2023, 9:35am

Can you share the code?

sagar.chaudhari · October 23, 2023, 12:25pm

I am using the queries below to get the result:
MATCH (n:Trx) RETURN sum(n.amount) AS totalAmount;
or
MATCH (n:Trx {date: date('2023-03-01')}) RETURN sum(n.amount) AS totalAmount;

These are simple nodes.

glilienfield · October 23, 2023, 12:48pm

I don’t think a custom procedure would perform better. They are generally used to do something Cypher can’t.

Your use case doesn’t use the graph capabilities of Neo4j; it could be done in a relational database. I assume you have a graph-like domain model that justifies using Neo4j over a traditional database.

Basically, you are finding all nodes by label, or all nodes by label and property, and then summing their amount properties. In the first case, the nodes should be found easily with a NodeByLabelScan. In the second case, you should have an index on the date property for the Trx label to speed up retrieval. How many nodes are you summing? That also affects performance: with a large number of Trx nodes, the first query may be expensive. Do you calculate this often? In a data analytics solution, these values would be calculated over set periods (days, weeks, months, quarters, etc.), making reporting quicker.
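
As a rough sketch of the two suggestions above (an index on the date property, and totals calculated per period), something like the following could work. It assumes `date` is stored as a Cypher DATE value and that `amount` is numeric; the index name `trx_date` and the `DailyTrxSummary` label are hypothetical names made up for illustration, not anything from the original data model.

```cypher
// Index on Trx.date so the date-filtered query can seek instead of scanning every Trx node.
// "trx_date" is a hypothetical index name.
CREATE INDEX trx_date IF NOT EXISTS FOR (n:Trx) ON (n.date);

// Check the plan: with the index in place, the leaf operator should be
// NodeIndexSeek rather than NodeByLabelScan.
PROFILE
MATCH (n:Trx {date: date('2023-03-01')})
RETURN sum(n.amount) AS totalAmount;

// Pre-aggregate once per day into summary nodes (hypothetical label DailyTrxSummary).
// Assumption: n.date is a DATE; nodes without a date are skipped because
// MERGE cannot match on a null property value.
MATCH (n:Trx)
WHERE n.date IS NOT NULL
WITH n.date AS day,
     sum(n.amount) AS total,
     min(n.amount) AS minAmount,
     max(n.amount) AS maxAmount,
     count(n) AS trxCount
MERGE (s:DailyTrxSummary {date: day})
SET s.total = total, s.min = minAmount, s.max = maxAmount, s.count = trxCount;

// Reports then read a few summary nodes instead of millions of Trx nodes.
MATCH (s:DailyTrxSummary)
WHERE s.date >= date('2023-03-01') AND s.date < date('2023-04-01')
RETURN sum(s.total) AS totalAmount, min(s.min) AS minAmount, max(s.max) AS maxAmount;
```

The first full roll-up still has to scan every Trx node once, so in practice it would probably be run per day or per batch as new transactions arrive rather than over the whole history each time.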

david.fauth · October 23, 2023, 6:40pm

At the NODES 2023 conference (Thursday, October 26) there will be a discussion of the new parallel runtime announced in Neo4j 5.13.0.

This would run that analytical query across multiple CPUs and should return the answer much faster.
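
For reference, a minimal sketch of how the parallel runtime is requested for a single query, assuming Neo4j 5.13 or later on Enterprise Edition. It applies to read-only queries, and the actual speed-up depends on available cores and data layout, so it is worth comparing timings against the default runtime.

```cypher
// Ask the planner to run this read-only aggregation on the parallel runtime.
// Assumption: Neo4j 5.13+ Enterprise Edition; Trx and amount are taken from the question above.
CYPHER runtime = parallel
MATCH (n:Trx)
RETURN sum(n.amount) AS totalAmount;
```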

sagar.chaudhari · October 25, 2023, 2:22am

Thank you for providing guidance.

sagar.chaudhari · October 25, 2023, 2:36am

So the parallel runtime means parallel Cypher execution?

@david.fauth, correct me if I am wrong.

Currently we have approx 9 million transactions every month, so for 6 months it is approx 500 million.
That means we are running the aggregations SUM(amount), MIN(amount), and MAX(amount) over 500 million nodes, and query performance suffers as a result.
Sometimes a query takes around 45 minutes.

david.fauth · October 25, 2023, 3:28pm

Sagar,

Neo4j just announced our new parallel runtime available in 5.13 for these types of queries.

Please read the docs here:

and

If you have questions, please let me know.
