Hi Michael,
I am facing a very similar problem, and you might be able to suggest a solution quickly. Do you have a minute? I am a big fan of the clarity of your answers in the community.
Basically, I have a graph with genres and tracks: about 1,500 genres and 7 million tracks.
Page cache size is 10 GB, heap is 10 GB, and database + index size is 7.9 GB. This is my query:
WITH ['rock', 'metal'] AS genres_list
MATCH (t:Track)-[:HAS_GENRE]->(g:Genre)
WHERE g.name IN genres_list
RETURN t.name, count(DISTINCT g) AS score
ORDER BY score DESC LIMIT 20
When I run this, I get millions of db hits on the expand and filter steps, and I have no idea how to optimize it.
Sorry for the delay, I didn't see your message.
Please post your question in the #neo4j-graph-platform:cypher category so that folks can help you.
count(DISTINCT g) will only ever be 1 or 2 here, since only the two listed genres can match, which is probably not what you want.
In general, you can imagine why you get millions of db hits if you fetch all tracks of those popular genres.
You probably want something like this:
WITH ['rock', 'metal'] AS genres_list
MATCH (t:Track)-[:HAS_GENRE]->(g:Genre)
WHERE g.name IN genres_list
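// some pre-filtering: score is the track's total number of genres
// DISTINCT keeps a track that matches both genres from appearing twice
WITH DISTINCT t, size( (t)-[:HAS_GENRE]->()) AS score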
WHERE score > 1
WITH t, score
ORDER BY score DESC LIMIT 20
RETURN t.name, score
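
If you are on a newer release, here is a hedged sketch of the same idea. It assumes Neo4j 5 or later, where size() over a pattern is no longer accepted and a COUNT { } subquery is used instead. The DISTINCT is an addition of this sketch, so that a track tagged with both genres is listed only once; the rest follows the query above.

```cypher
WITH ['rock', 'metal'] AS genres_list
MATCH (t:Track)-[:HAS_GENRE]->(g:Genre)
WHERE g.name IN genres_list
// Assumption: Neo4j 5+, where COUNT { } replaces size() over a pattern.
// score is the total number of genres attached to the track.
WITH DISTINCT t, COUNT { (t)-[:HAS_GENRE]->() } AS score
WHERE score > 1
RETURN t.name, score
ORDER BY score DESC LIMIT 20
```

Prefixing either version with PROFILE shows the db hits per operator, which makes it easier to compare the variants on your data.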