I have a question regarding the optimization of the following query.
MATCH (u:User { userId:"dOnBdddDpwcma3xlCje1"})-[:SPONSORED*..3]->(child:User)
WHERE child.achievedRank < 1000
AND child.isMvp = True
AND child.userId IN ["d2JuOnwwwqsdXZpdDpwcmasddxlOjE0MD444Tkz", "dXJ33uByZ4dDpwcm9maWaalOjE0E1", "1XDdduOnBydadsq21dDpwcm9maWxlOjDA0NTk1"]
RETURN child.userId
The problem is with the following line:
AND child.userId IN ["d2JuOnwwwqsdXZpdDpwcmasddxlOjE0MD444Tkz", "dXJ33uByZ4dDpwcm9maWaalOjE0E1", "1XDdduOnBydadsq21dDpwcm9maWxlOjDA0NTk1"]
I want to filter with a list of users. But when i put a lot of userIds in the IN clause it is extremely slow.
In my case i need to use a filter of about 10000 users but it takes like 15-20 seconds.
I would really appreciate any suggestion that would help me optimize this query.
The Neo4j used is version 3.4.0 community edition
Here is the Execution Plan when trying to filter with about 10k userIds
I am fairly new to Neo4J and can't claim I know the internals, but what you are doing here is a lot of string comparisons which is not very efficient. If you can hash the user ids in advance (i.e. the list of ids in the WHERE ... IN... construction does not change every time) then it would be a constant time lookup for each uid.
Although creating a composite index would help a little, it will not solve the bigger problem, which is filtering by 10k users. Do you have any suggestions about that?
Thanks for suggesting that, Ukirik! If i understood properly, your solution would work if the list does not change. The list of usersId changes, it is not the same every time. Correct me if I am wrong.
PROFILE
WITH ["d2JuOnwwwqsdXZpdDpwcmasddxlOjE0MD444Tkz", "dXJ33uByZ4dDpwcm9maWaalOjE0E1", "1XDdduOnBydadsq21dDpwcm9maWxlOjDA0NTk1"] AS childList
MATCH (u:User { userId:"dOnBdddDpwcma3xlCje1"})-[:SPONSORED*..3]->(child:User{isMvp:True)
WITH child, childList
WHERE child.achievedRank < 1000
AND child.userId IN childList
RETURN child.userId
I changed some things by removing the filter for isMvp and getting all childs that are sponsored - 2000000 levels downstairs so i would receive more result to see exactly is it faster or not.
MATCH (u:User { userId:"headUserId1232131"})-[:SPONSORED*..2000000]->(child:User)
WHERE child.achievedRank < 1000
AND child.userId IN ["id1","id2" ... "id1000"] //Here we have 1000 userIds
RETURN child.userId
PROFILE
WITH ["id1","id2"..."id10000"] as childList // Here we have the same 1000 users from the upper query
MATCH (u:User { userId:"headUserId142111"})-[:SPONSORED*..2000000]->(child:User)
WITH child, childList
WHERE child.achievedRank < 1000
AND child.userId IN childList
RETURN child.userId
I don't see any improvement with the speed of the query, i`d say is pretty much the same if not even a little slower, but thanks for suggestion. Do you have any other idea?