Simple count query taking too long to execute

Rogie · February 17, 2020, 5:59pm

I have a db with 20 million nodes and 50 million edges. I'm trying to execute a query of the form:

Match (u:User)-[:MENTIONED]->(:Keyword {keyword:"baseball"}) 
Where u.active="True"
Return Count(u)

I have around 10 million users and only 10 keywords in the db. Approximately 50,000 results are returned

Is it normal for the above query to take around 10 minutes to finish? How can I speed this up?I have constraints on the users forcing them to be unique.

anthapu · February 17, 2020, 6:40pm

Can you please paste the schema here?

Most likely reason is that you might not have an index on User (active) property and Keyword(keyword) property.

If you do not have index on either of them it will cause label scan.

Best way to check is to do

EXPLAIN 
Match (u:User)-[:MENTIONED]->(:Keyword {keyword:"baseball"}) 
Where u.active="True"
Return Count(u)

It will tell you if there are any label scans happening.

rvanbruggen · February 17, 2020, 7:24pm

Have you got indexes on the u.active property and the keyword.keyword property?

intouch_vivek · February 17, 2020, 8:12pm

Along with Rik's recommendation. You can try
Match (u:User:active:'True'})-[:MENTIONED]->(:Keyword {keyword:"baseball"})
Return Count(u)

Rogie · February 17, 2020, 9:44pm

Thank you for the suggestions.

I have just done
CREATE INDEX ON :Keyword(keyword)
and
CREATE INDEX ON :User(active)

I also ran the query as modified by Vivek and this seems to have sped things up.

This is my first usage of indexes and brings up some questions:

How do I properly gauge the time of query executions? There seems to be quite a bit variance if run the same query over and over again. Why is this?
Before I created the indexes the first step in the query was a NodeByLabelScan on :Keyword and now the first step is NodeIndexSeek n :Keyword. What exactly is going on here?
What is the drawback of creating indexes? Why not just create an index on every property? I don't really know what's going on under the hood when I create an index, but I guess they will use up more space? What are some guidelines as to when it is appropriate to create indexes?

Thanks!

anthapu · February 17, 2020, 9:52pm

Hi,
First time when you run, the query planner needs to create an execution plan. So it can take more time. Second time onwards when you execute the query, the query plan is taken from query cache. So it might seem faster. Any variance after that could be based on the system load.

NodeByLabelScan means for the property used in query there is no index. So, it has to look at all the nodes with the given label to find the one you need. NodeIndexSeek means it is using the index to find the node that matches your query.

Indexes take storage and memory. If you create indexes for all the properties, then it might need more memory to hold the indexes themselves, which is needed to do other work like db traversals etc.

Topic		Replies	Views
My count() query is too slow Cypher cypher , counts	6	1450	June 18, 2021
Dreadfully Long Query Times Cypher performance , cypher	5	408	October 2, 2020
Is this query speed normal? Neo4j Graph Platform	5	602	October 23, 2020
Returning count much faster than the nodes Cypher performance , cypher	2	444	April 7, 2020
Using indexed nodes and simple queries my cypher queries are still taking around 500ms. Can this be further optimized? Looking for advice Cypher performance , cypher	5	420	December 18, 2020

Get Certified in June!

Simple count query taking too long to execute

Related topics