Using indexed nodes and simple queries my cypher queries are still taking around 500ms. Can this be further optimized? Looking for advice

Here is the simple query I am running that serves as the base for more complex queries. I am trying to optimize these queries to run faster, but am hitting a wall when even this simple query is running at 500ms. I dont understand how to make this query any more bare bones, but perhaps there is something I am missing or another way I can write it? Here is the query:

explain MATCH (o:Order)-[:CONTAINS_PRODUCT]->(p) 
WHERE o.accountGUID = 'xxxxxxxxxxxxxxx' AND p.id IN ['4753230364775'] AND o.createdAt >= 1576775048 
return o, p LIMIT 10

With results as

Started streaming 10 records after 502 ms and completed after 515 ms.

Or when profiling around 600ms.

I understand it is the p.id check that is taking the longest as it has to sort through all products attached to each order to find orders that contain that product until it hits the limit, but am not sure how to make that any faster. For example running

MATCH (o:Order)-[:CONTAINS_PRODUCT]->(p) 
WHERE o.accountGUID = 'xxxxxxxxxxxxxxx' AND o.createdAt >= 1576775048 
return o, p LIMIT 10

Completes in less than 1ms


Any advice would be appreciated.

One clever thing you can do, is move LIMIT before the RETURN statement. Otherwise, the query creates a huge list, only to trim it to 10 entries. (This isn't obvious.)

See: 5 Secrets to More Effective Neo4j 2.2 Query Tuning - Neo4j Graph Data Platform

1 Like

What label is on p? :Product? If so, is there a unique constraint on :Product(id)? If so we may be able to optimize this a bit:

MATCH (o:Order), (p:Product)
WHERE o.accountGUID = 'xxxxxxxxxxxxxxx' AND p.id IN ['4753230364775'] AND o.createdAt >= 1576775048 
MATCH (o:Order)-[:CONTAINS_PRODUCT]->(p1) 
WHERE p1 = p
RETURN o, p
LIMIT 10

This is assuming :Product(id) is unique. It should pre-match the nodes, then perform the expansion, and instead of having to do property checking on p to see if it's the right product, it will compare the nodes themselves, which is more efficient (it's basically comparing the internal graph ids, which should be fast).

You could also try adding a composite index on :Order(accountGUID, createdAt).

1 Like

Thank you Andrew that is very helpful! Yes (p) is :Product and has a uniqueness constraint on id.

Using your advice and the other user who commented on limiting WITH statement I was able to significantly speed up my queries.

However I am facing another confusing issue dealing with indexed searching i was hoping you could help with?

My queries for :Session nodes are taking extremely long, like 25 seconds or more, despite the fact they they are indexed. When I use the explain or Profile planners I can see that these are taking so long because they are scanning by label rather than using the indexes I have. Could you possibly explain why this is happening?

Here is the query:

MATCH (n:Session) where n.accountGUID = 'xxxxxxxxxxxxxxx' 
RETURN n LIMIT 25

It takes 25+ seconds
I have attached a picture of the EXPLAIN plan.

Using :schema in the cypher terminal i can see that is is Indexed:

ON :Session(accountGUID, profileGUID, sessionKey) ONLINE (for uniqueness constraint)

And here are the constaints:

ON ( session:Session ) ASSERT (session.accountGUID, session.profileGUID, session.sessionKey) IS NODE KEY

Does this mean I have to provide all 3, accountGUID, profileGUID, and sessionKey? All my other nodes are Indexed with accountGUID and ID, but i can use one or the other to search by index, so im wondering what the case is here.

Very much appreciate the help!

You're correct, when there is a composite index (on its own or because of a node key constraint) all indexed properties must be present for the index to be leveraged. You would have to create an index or unique constraint on :Session(accountGUID) in order to use the index when only accountGUID is used for lookup.

1 Like

Thanks so much! Appreciate the help!