RE: How to Aggregate calculation of data faster?

Hi Michael,

I am facing a very similar problem, and you might be able to suggest a solution very quickly. Do you have a minute? Big fan of the clarity of your answers in the community :slight_smile:

https://community.neo4j.com/t/how-to-aggregate-calculation-of-data-faster/4131/4

Basically, I have a graph with genres and tracks: about 1,500 genres and 7 million tracks.
The page cache is 10 GB, the heap is 10 GB, and the database plus indexes take 7.9 GB.

WITH ['rock', 'metal'] AS genres_list
MATCH (t:Track)-[:HAS_GENRE]->(g:Genre)
WHERE g.name IN genres_list
RETURN t.name, count(DISTINCT g) AS score
ORDER BY score DESC LIMIT 20

When I run this, I get millions of db hits on the expand and filter steps, and I have no idea how to optimize it.

[image: query plan]

Sorry for the delay, I didn't see your message.
Please post your question in the #neo4j-graph-platform:cypher category so that folks can help you.

count(DISTINCT g) will only ever be 1 or 2, because only the two genres in your list can match, which is probably not what you want.

You probably want to do two things:

  1. Don't aggregate on properties (here t.name) if you can avoid it; aggregate on the node and read the property at the end.
  2. Use the degree to compute your score.

In general, though, you can imagine why you get millions of db-hits: the query has to fetch all tracks of those popular genres.
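To get a feel for how much the expand step has to touch, you could count the tracks attached to each genre in the list. This is only a sketch: it reuses the labels and relationship type from the question, `genre` and `track_count` are column aliases I made up, and it uses the same `size(pattern)` form as the rest of this thread.

```cypher
// Sketch: how many tracks hang off each genre in the list?
// size() on a single-relationship pattern can usually be answered
// from the node's stored degree, without expanding to every track.
// (On Neo4j 5+, COUNT { (g)<-[:HAS_GENRE]-() } replaces this form.)
WITH ['rock', 'metal'] AS genres_list
MATCH (g:Genre)
WHERE g.name IN genres_list
RETURN g.name AS genre, size((g)<-[:HAS_GENRE]-()) AS track_count
ORDER BY track_count DESC
```

If those counts are in the millions, every one of those relationships is expanded before any scoring happens, and that is where the db-hits come from.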

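WITH ['rock', 'metal'] AS genres_list
MATCH (t:Track)-[:HAS_GENRE]->(g:Genre)
WHERE g.name IN genres_list
// a track tagged with both genres matches twice, so de-duplicate first
WITH DISTINCT t
// score = total number of genres on the track, read from its degree
// (on Neo4j 5+, write COUNT { (t)-[:HAS_GENRE]->() } instead of size(...))
WITH t, size((t)-[:HAS_GENRE]->()) AS score
// some pre-filtering
WHERE score > 1
WITH t, score
ORDER BY score DESC LIMIT 20
// the name property is read only for the final 20 rows
RETURN t.name, score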
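
If the score you are after is instead the number of listed genres a track matches (which is what your original count(DISTINCT g) computes), a variant that aggregates on the node rather than on the property might look like the sketch below. It assumes there is at most one HAS_GENRE relationship between a given track and genre; if duplicates are possible, keep count(DISTINCT g).

```cypher
WITH ['rock', 'metal'] AS genres_list
MATCH (t:Track)-[:HAS_GENRE]->(g:Genre)
WHERE g.name IN genres_list
// group by the node t, not by the property t.name
WITH t, count(g) AS score
ORDER BY score DESC LIMIT 20
// the name property is read only for the 20 surviving rows
RETURN t.name, score
```

This still has to expand every track of the listed genres, so it does not remove the db-hits from the expand step. It only avoids reading t.name for millions of rows during the aggregation.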