Issue with Cypher ' IN ' filter

performance
cypher
where

(Phanindhar Bodla) #1

I have 70k nodes with label User

Match (user:User) return collect(user) // in under 700ms returns all 70 k users

Match (user:User), (india:Country{id:12}) where user.id in india.user_ids_list return collect(user) // taking 25 seconds to return 57k users in India , india.user_ids_list has ids of all Indian users

I was expecting a better runtime for second Query. Since it tries to fetches only a part of total users. Can someone please explain why this is happening and suggest on what can I do to tune this query.


(Michael Hunger) #2

user_ids_list this is a horrible long list that is not good to store in Neo4j

You should use a relationship between the two.
And if you can't the at least store it the other way round store country_id in :User not users in Country.
You should have an index on :User(id) and :User(country_id).

MATCH (user:User) where user.country_id = $country_id
RETURN user;

You should also really use collect if you have other aggregations, otherwise just use return user to make it stream and use less memory.

This is what you could do but it's not recommended.

Match (india:Country{id:12})
UNWIND india.user_ids_list as id 
MATCH (user:User) where user.id = id
RETURN user;

You can look at the cost of each operation with PROFILE then you see what's expensive.