Simple query performance evaluation

Hi Neo4j community,

I am working on evaluating the performance of a query using the recommendations dataset included in the Sandbox.

I tried to emulate a multi label by doing the following:

MATCH (n:Actor:Director) SET n:ActorAndDirector RETURN COUNT(n)

So this way I had a label for actors who are directors as well.

Then I performed the simple query

PROFILE MATCH (n:ActorAndDirector) WHERE EXISTS(n.name) AND EXISTS(n.bornIn) RETURN n

which results in 442 nodes.

It goes through 9545 total db hits, and I measured the runtime (in milliseconds) for 10 repeats as follows

298 67 50 111 35 108 34 35 27 30

I know that runtime is not the best measurement and that the cache needs to warm up, so I am happy to get suggestions on better metrics, since the db hits will always be the same. What I found especially confusing is the following.
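As a rough sketch of how the ten timings above could be summarized less noisily, the Python snippet below drops the first run as warm-up and reports the median, minimum and maximum of the rest. The helper name summarize and the choice of discarding exactly one run are my own assumptions, not an established Neo4j benchmarking procedure.

```python
from statistics import median

# Runtimes in milliseconds, copied from the ten runs reported above.
runs_ms = [298, 67, 50, 111, 35, 108, 34, 35, 27, 30]


def summarize(runs, warmup=1):
    """Drop the first `warmup` runs and return (median, min, max) of the rest.

    `warmup=1` is an assumption for illustration; more runs may need discarding.
    """
    steady = runs[warmup:]
    return median(steady), min(steady), max(steady)


med, lo, hi = summarize(runs_ms)
print(f"median={med} ms, min={lo} ms, max={hi} ms")
```

The median is less sensitive to single outliers such as the 111 ms and 108 ms runs than a plain average would be.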

To see how the same query performs involving a lot more nodes I did the following

UNWIND range(1,44200) AS vertices
CREATE (n:ActorAndDirector {name:vertices, bornIn:vertices})

to create 100 times as many vertices, giving them the names 1, 2, 3, etc. and the bornIn attributes 1, 2, 3, etc. These are not meaningful properties; they are just there for the sake of the performance test. Then I executed

PROFILE MATCH (n:ActorAndDirector) WHERE EXISTS(n.name) AND EXISTS(n.bornIn) RETURN n

again, this time with 44642 nodes as the result and 407345 total db hits. However, the runtimes of the first ten executions were

28 5 5 9 6 6 5 7 5 6

in milliseconds. Again there is some warming up and fluctuation in performance, but it is way faster than before on a much larger (x101) vertex set.
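To put the two PROFILE results on a common scale, here is a small Python sketch that divides the total db hits by the number of returned nodes for each run. The figures are the ones reported above; the dictionary layout and the helper hits_per_node are only illustrative.

```python
# Figures copied from the two PROFILE runs above.
small = {"nodes": 442, "db_hits": 9545}
large = {"nodes": 44642, "db_hits": 407345}


def hits_per_node(run):
    """Return the total db hits divided by the number of returned nodes."""
    return run["db_hits"] / run["nodes"]


for name, run in (("small", small), ("large", large)):
    print(f"{name}: {hits_per_node(run):.1f} db hits per returned node")
```

This gives roughly 21.6 db hits per node for the original set and roughly 9.1 for the enlarged one, so the db hits do not scale one-to-one with the node count either. It does not by itself explain the difference in runtime.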

Even weirder, after deleting all these new vertices and running the query on the original 442 vertices, the query was slower again, closer to its performance the first time.

I am sure that this is not the best way to measure performance. I am trying to get started somewhere, but I was surprised by the results and cannot make any sense of them.

I have looked here in the community forum and on the internet, but could not find anything that could explain this.

Thank you very much for any helpful input, it is much appreciated.

Best,
Philipp

I tried this in the movie database and here are my results:
-> Initially: 253 total db hits, 76 ms
-> After adding 44200 nodes: 44229 total db hits, 112 ms
-> Deleted all 44200 nodes
-> Finally: 29 total db hits, 27 ms

Hello Vamshimadineni123,

Thank you so much for the swift reply.

This looks more like what I would have expected to see myself.

Although I am still wondering why it is faster after deleting (cache warmed up?) and especially why it has fewer db hits than the first run.

I am still trying to get my head around this.

Thank you,
Philipp