Assuming author_id
is unique on :Author nodes, do your aggregation on the node itself, not its property:
...
WITH a, SUM(b.read_count) AS total_read_count
...
After the aggregation you can project out a.author_id AS author_id
. That should help as you'll only perform property access once author is distinct in your query.
As for the approach of filtering out books with a read count greater than 500, I don't think that's the way to go, or rather if you want to do this kind of filtering, it's incomplete...although this does get us authors who have published books with a read count less than 500, it only does the aggregation over these books, so if an author has 2 books, one with a read count of 1000, and one with a read count of 200, its book with the higher read count will be thrown out, so it's as if it only has the 200 read count book so it will be incorrectly returned in the query.
The approach that I think was supposed to happen here was to match to authors with books >= 500 read count, and filter out the authors, not the books, so that when we continue the query to find authors of books we do not count the authors who have already been filtered out:
Match (b:Book)<-[:Published]-(a:Author)
where b.read_count >= 500
with collect(DISTINCT a) as filteredOutAuthors
match (b)<-[:Published]-(a:Author)
WHERE NOT a IN filteredOutAuthors
WITH a, SUM(b.read_count) AS total_read_count
WHERE total_read_count < 500
RETURN .author_id AS author_id, total_read_count
That said, the resulting list of filteredOutAuthors may be too large for the heap, and may be inefficient for filtering, so this may not be faster. Give it a try though and see how it fares.
There's an eager aggregation here for the count as well.
You might see if using a slightly different approach using pattern comprehensions and APOC functions (for summing values across a list) might perform any better, as this should allow streaming of the query, as it avoids the eager aggregation from sum():
MATCH (a:Author)
WITH a, apoc.coll.sum([(a)-[:Published]->(b:Book) | b.read_count]) as total_read_count
WHERE total_read_count < 500
RETURN a.author_id AS author_id, total_read_count