Hello,
I'm experimenting with Neo4j for the past 4 months and I'm progressively migrating my production environment from MySQL to Neo4j. I think Neo4j is a great database and is very good to model relationships in my data. But for some simpler queries, it is much slower than MySQL. I'm having a specific hard time when I try to query aggregated data, for some queries, it is 40x slower than the corresponding one in MySQL. I'll provide an example.
I have this data model in Neo4j:
.And I want to get the aggregated students effort (questions answered and time spent studying) of all schools from a school manager to display in a dashboard for him. So I make this query:
match (:SchoolManager{schoolManagerId:"3aeb3009-07ca-48ae-b0c8-78b9f8f96bd9"})-[:MANAGER_OF_SCHOOL]->()<-[:CLASS_OF_SCHOOL]-()<-[:STUDENT_OF_CLASS]-()-[:ANSWERED]->(sa:StudentAnswer)
where sa.createdAt >= datetime("2020-01-01") and sa.createdAt <= datetime("2020-11-20")
return {questionsAnswered: count(sa), timeSpent: sum(sa.timeSpent)}
This query takes almost 16K ms. This is the PROFILE for the query:
In MySQL, the corresponding query is something like this:
select count(*) as questionsAnswered, sum(qa.timeSpent) as timeSpent from school_managers.users as sm left join schools.schools_members as sm on sm.user_id = sm.id left join schools.schools_classes as sc on sc.school_id = sm.school_id left join schools.schools_classes_members as scm on scm.school_class_id = sc.id left join quizzes.quizzes_attempts as qa on qa.user_id = scm.user_id where sm.id = 1 and qa.created_at >= "2020-01-01" and qa.created_at <= "2020-11-20";
It's much uglier, but much faster too, it runs in about 600 ms.
I already searched a lot online and know that the slowness is aggravated by the aggregation on a node property in sum(sa.timeSpent)
. If I don't access this node property, the query time decreases by half (which is still slow).
I little more info on my data:
- :SchoolManager ~ 200 nodes
- :School ~ 200 nodes
- :SchoolClass ~ 1K nodes
- :Student ~ 20K nodes
- :StudentAnswer ~ 6M
I have indexes on :SchoolManager.schoolManagerId and :StudentAnswer.createdAt.
I'm using Neo4j Aura instances with 2GB RAM and 1 CPU. My browser and server configs are:
Is there something that I can do to improve the performance of this Neo4j query? Or for this kind of aggregated data on a property MySQL will always be a better choice?