Simple query performance evaluation

pska752 · July 23, 2021, 4:59am

Hi Neo4j community,

I am working on evaluating the performance of a query using the recommendations dataset included in the Sandbox.

I tried to emulate a multi-label match by adding a combined label as follows:

MATCH (n:Actor:Director) SET n:ActorAndDirector RETURN COUNT(n)

So this way I had a label for actors who are directors as well.

Then I performed the simple query

PROFILE MATCH (n:ActorAndDirector) WHERE EXISTS(n.name) AND EXISTS(n.bornIn) RETURN n

which results in 442 nodes.

It goes through 9545 total db hits, and I measured the runtime (in milliseconds) for 10 repeats as follows:

298 67 50 111 35 108 34 35 27 30

I know that runtime is not the best measurement and that the cache needs to warm up, so I am happy to get suggestions on better metrics, since the db hits will always be the same. What I found especially confusing is the following.
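One way to make the ten timings above easier to compare is to discard the warm-up run and report the median rather than individual values. The following is only a minimal sketch in Python: the helper name `summarize` and the choice to treat exactly one run as warm-up are assumptions, not something Neo4j prescribes.

```python
from statistics import mean, median

# Runtimes in milliseconds for the ten repeats reported above.
runtimes_ms = [298, 67, 50, 111, 35, 108, 34, 35, 27, 30]

def summarize(runs, warmup=1):
    """Summarize timings after discarding the first `warmup` runs.

    `warmup=1` is an assumption: only the first run is treated as cache warm-up.
    """
    steady = runs[warmup:]
    return {
        "min": min(steady),
        "median": median(steady),
        "mean": mean(steady),
        "max": max(steady),
    }

summary = summarize(runtimes_ms)
print(summary)
```

The median is less sensitive than the mean to outliers such as the 111 ms and 108 ms runs, which is why it is used here as the headline number.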

To see how the same query performs involving a lot more nodes I did the following

UNWIND range(1,44200) AS vertices
CREATE (n:ActorAndDirector {name:vertices, bornIn:vertices})

to create 100 times as many vertices, giving them the names 1, 2, 3, etc. and the same values for the bornIn property. These are not meaningful properties; they are only there for the sake of the performance test. Then I executed

PROFILE MATCH (n:ActorAndDirector) WHERE EXISTS(n.name) AND EXISTS(n.bornIn) RETURN n

again, this time with 44642 nodes as the result and 407345 total db hits. However, the runtimes of the first ten executions were

28 5 5 9 6 6 5 7 5 6

in milliseconds. Again some warming up and fluctuation in performance, but way faster than before on a much larger (x101) vertex set.
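To put the two runs side by side, the reported figures can be normalised per returned node. This is a rough sketch using only the numbers quoted in this post; the labels `original` and `enlarged`, and treating the first run of each series as warm-up, are assumptions made for illustration.

```python
from statistics import median

# Figures copied from the post: result size, total db hits, and the ten runtimes (ms).
runs = {
    "original": {"nodes": 442, "db_hits": 9545,
                 "ms": [298, 67, 50, 111, 35, 108, 34, 35, 27, 30]},
    "enlarged": {"nodes": 44642, "db_hits": 407345,
                 "ms": [28, 5, 5, 9, 6, 6, 5, 7, 5, 6]},
}

results = {}
for name, run in runs.items():
    steady_ms = median(run["ms"][1:])  # assumption: the first run is warm-up
    results[name] = {
        "hits_per_node": run["db_hits"] / run["nodes"],
        "median_ms": steady_ms,
        "us_per_node": steady_ms * 1000 / run["nodes"],  # microseconds per node
    }
    print(name, results[name])
```

The db hits per node also differ between the two runs, so the synthetic vertices are not a like-for-like copy of the original ones in terms of work per node.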

Even weirder, after deleting all these new vertices and running the query on the original 442 vertices, the query was slower again, closer to its performance the first time.

I am sure that this might not be the best way to measure performance; I am just trying to get started somewhere, but I was surprised by the results and cannot make any sense of them.

I have looked here in the community forum and on the internet, but could not find anything that could explain this.

Thank you very much for any helpful input, it is much appreciated.

Best,
Philipp

vamshimadineni123 · July 23, 2021, 12:49pm

I tried this in the movie database and here are my results:
-> Initially - 253 total db hits, 76 ms
-> After adding 44200 nodes - 44229 total db hits, 112 ms
-> Deleted all 44200 nodes
-> Finally - 29 total db hits, 27 ms
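The db hit counts above can be compared against the number of nodes that were added. This is only a small arithmetic sketch on the figures in this reply; the variable names are hypothetical, and it does not explain why the counts differ.

```python
# db hit counts reported in this reply, plus the number of nodes that were added.
initial_hits = 253
after_insert_hits = 44229
final_hits = 29
added_nodes = 44200

# Extra db hits caused by the added nodes, measured against each baseline.
extra_vs_final = after_insert_hits - final_hits
extra_vs_initial = after_insert_hits - initial_hits

# Gap between the first run and the run after deletion.
initial_minus_final = initial_hits - final_hits

print("extra hits vs final baseline:", extra_vs_final)
print("extra hits vs initial baseline:", extra_vs_initial)
print("initial minus final:", initial_minus_final)
print("hits per added node (vs final):", extra_vs_final / added_nodes)
```

Measured against the final run, the added nodes account for one db hit each, while the first run used more db hits than the final run on what should be the same data.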

pska752 · July 25, 2021, 11:05pm

Hello Vamshimadineni123,

Thank you so much for the swift reply.

This looks more like what I would have imagined it to look like for me.

Although I am still wondering why it is faster after deleting (cache warmed up?) and especially why it has fewer db hits than the first run.

I am still trying to get my head around this.

Thank you,
Philipp
