Cypher query extremely slow

Hi everyone. We have the following Cypher query, with a few filters, that suggests users for the current user to follow.

MATCH (user:User)
WHERE user.id = $userId
MATCH (suggestedUser:User)
WHERE suggestedUser.id <> $userId
MATCH (suggestedUser)
WHERE NOT (user)-[:FOLLOWS]->(suggestedUser)
OPTIONAL MATCH (suggestedUser)-[r:BELONGS_TO]->(o:Organisation)
WHERE o.id = $orgId
WITH DISTINCT suggestedUser
OPTIONAL MATCH (suggestedUser)-[r*1..2] -> (user)
WITH suggestedUser, count(r) as c
RETURN suggestedUser.id, c
ORDER BY c desc
SKIP 0
LIMIT 20;

The idea is to suggest users relevant to the current user and rank them by the number of relationships they have to the current user within 2 hops. On a graph of about 50,000 users, even with a limit of 20 users, the query takes about 3 seconds.

We have hosted it on Neo4j Aura with the following instance configuration:
Memory: 1GB
Storage: 2GB
CPU: 1
Neo4j version: 5**
Region: UK (europe-west2), GCP

I don't understand why a simple fetch of 20 users matching the above conditions has such high latency. Or is there something non-performant in the query?

@gunjan.tank.01

  1. What version of Neo4j?

  2. Any indexes in place?

  3. Can you provide the results of an EXPLAIN on said query?

There are a couple of points to note about your query:

  • what is ‘%s’ in your second match? Assuming it is Users, you are basically searching for all Users except the one user you have. Then you match on these users again to add a constraint. This could be done in one match.

  • you have a search to get each suggested user’s organization, but you don’t use this information.

The following should work. It finds all users that are related to the given user and that the given user does not already follow, and returns them ranked by the number of relationships they have with the given user.

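// Look up the current user by id
MATCH (user:User {id: $userId})
// Candidates: users connected to the current user within 1 or 2 hops
MATCH (suggestedUser:User)-[*1..2]->(user)
// Neo4j 5 uses the EXISTS { } subquery form for pattern checks
WHERE NOT EXISTS { (user)-[:FOLLOWS]->(suggestedUser) }
// One row per path, so count(*) is the number of connections
WITH user, suggestedUser, count(*) AS rank
RETURN suggestedUser.id, rank
ORDER BY rank DESC
SKIP 0
LIMIT 20;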

@glilienfield Thank you for the response. It's users. Sorry about that. We do use the organisation info. We have a bunch of filters where we want to:

  1. Suggest users to the current user belonging to the same org if the filter is selected.
  2. Suggest users to the current user he/she is already following.

That's why we use %s, which is set according to the filter: it is either empty, which makes the match a compulsory criterion, or the keyword OPTIONAL, which makes it an optional match. That way we can use the same query for both cases.

Is there something that can still be optimised? We are new to Neo4j and would like to use it for a lot of use cases in our product. Therefore, any help will be greatly appreciated.

@dana_canzano Thank you for the response.

  1. We use version 5.
  2. We have two full-text search indexes for search. I thought lookup indexes are present by default and we don't have to explicitly create them. The id of the user node is the unique identifier.
  3. The results of running explain:

@gunjan.tank.01

"I thought lookup indexes are present by default and we don't have to explicitly create them"

Not sure where this was determined, but you need to explicitly create indexes. Yes, there is a prebuilt label index on every database, i.e. if your database has 100 million nodes and 20k are labelled :User, then a MATCH (n:User) ... will only look at the 20k nodes with the :User label. But if you have MATCH (n:User) WHERE n.id = $userId ... and you have no index on :User(id), we will need to examine all 20k :User nodes. However, if you have an index on :User(id), then said query will utilize the index and be significantly faster. This is no different than a traditional RDBMS, i.e. a SELECT name FROM Users WHERE Users.id = $userId is going to be a lot faster when there is an index on Users(id).

So... I might suggest creating indexes on

:User(id)
:Organisation(id)

see Indexes for search performance - Cypher Manual
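
To check what the planner is doing today, a quick sketch along these lines may help. It assumes $userId is supplied as a parameter, and the operator names mentioned below are the ones Neo4j 5 typically reports in its plans.

// List the indexes that currently exist in the database
SHOW INDEXES YIELD name, type, labelsOrTypes, properties;

// Show the plan for the user lookup without running the query
EXPLAIN
MATCH (user:User)
WHERE user.id = $userId
RETURN user.id;

Without an index on :User(id), the plan should show a NodeByLabelScan followed by a Filter. Once the index exists, it should show a NodeIndexSeek instead.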

Cypher is not really a scripting language. It is difficult to have conditional workflows. I suggest separate optimized queries for each scenario, unless they are very similar and can be merged efficiently.

You should add an index on the id property for the User label, so looking up your given user is fast. The rest of the query is pattern matching.

How do you want to use the organization, so I can add it?

@dana_canzano I understand. Did you mean creating a range index on the user and org id?
Do we have to make explicit changes to Cypher queries to utilise these indexes?

CREATE INDEX range_index_user_id FOR (n:User) ON (n.id)

@gunjan.tank.01

Yes, create an index on :User(id) and :Organisation(id), i.e. 2 indexes.

The Cypher planner will use an index automatically if one is defined; see https://neo4j.com/docs/cypher-manual/current/execution-plans/operators/#query-plan-node-index-scan
This is no different than most traditional RDBMS.
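
For reference, a minimal sketch of the two statements. The index names are made up for illustration, and IF NOT EXISTS is only there to make the statements safe to re-run.

// Range index for looking up users by id (hypothetical index name)
CREATE INDEX user_id_index IF NOT EXISTS FOR (n:User) ON (n.id);
// Range index for looking up organisations by id (hypothetical index name)
CREATE INDEX organisation_id_index IF NOT EXISTS FOR (n:Organisation) ON (n.id);

Since the id is said to be the unique identifier of a user, a uniqueness constraint could be used in place of the plain :User(id) index, because it is backed by an index as well. This assumes the existing data really has no duplicate ids, and it replaces the plain index rather than sitting next to it:

// Alternative to the first statement above (hypothetical constraint name)
CREATE CONSTRAINT user_id_unique IF NOT EXISTS FOR (n:User) REQUIRE n.id IS UNIQUE;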

Right. Thank you !
I will add the indexes. We need to match users belonging to a particular org.

So, you want the suggested users to also belong to a certain organization?
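
If so, one way to do it as a separate query, in line with the earlier suggestion to keep one optimized query per scenario, might look like this. It is only a sketch: it assumes the BELONGS_TO relationship and the $orgId parameter from the original query, and it uses the EXISTS { } subquery syntax of Neo4j 5.

// Look up the current user by id (fast once :User(id) is indexed)
MATCH (user:User {id: $userId})
// Candidates: users connected to the current user within 1 or 2 hops
MATCH (suggestedUser:User)-[*1..2]->(user)
WHERE suggestedUser <> user
  // Skip users the current user already follows
  AND NOT EXISTS { (user)-[:FOLLOWS]->(suggestedUser) }
  // Keep only candidates that belong to the selected organisation
  AND EXISTS { (suggestedUser)-[:BELONGS_TO]->(:Organisation {id: $orgId}) }
// One row per path, so count(*) is the number of connections
WITH suggestedUser, count(*) AS rank
RETURN suggestedUser.id, rank
ORDER BY rank DESC
LIMIT 20;

When the organisation filter is not selected, the same query without the last EXISTS condition would cover that case, so no %s substitution is needed.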