Slow query response time - simple friends of friends cypher query


(Mark) #1

I am using NEO4j 3.5 to store and query relationships between people. I have nodes with the label "User" and relationships with the label of "Friends". I am able to get the friends of friends, but the query is taking too long. It currently returns in 4sec to 6sec. This is not a high transactional neo4j database and the server has plenty of CPU and memory available. The loads on the server are under 3 and there are 8 cores. This is running on an AWS EC2 instance. There are roughly 250,000 nodes in the database and the total database size is under 750mb.

Here is the query that I am currently using:

MATCH (user:User {user_id:1145})-[:FRIENDS*3]->(fof:User)
WHERE NOT (user:User)-[:FRIENDS]->(fof:User)
RETURN count(distinct fof.user_id)

This cypher query returns a count of 69,704, which is correct. I am running the query in the NEO4j browser.

What optimizations can be made either to the cypher query or the NEO4j database engine to return the results faster?

Profile Plan

+-----------------------+----------------+--------+---------+-----------+-----------------------------+------------------+--------------------------------------------+
| Operator              | Estimated Rows | Rows   | DB Hits | Cache H/M | Identifiers                 | Ordered by       | Other                                      |
+-----------------------+----------------+--------+---------+-----------+-----------------------------+------------------+--------------------------------------------+
| +ProduceResults       |              0 |      1 |       0 |       0/0 | count(distinct fof.user_id) |                  | 0.0                                        |
| |                     +----------------+--------+---------+-----------+-----------------------------+------------------+--------------------------------------------+
| +EagerAggregation     |              0 |      1 |  326421 |       0/0 | count(distinct fof.user_id) |                  | 0.0                                        |
| |                     +----------------+--------+---------+-----------+-----------------------------+------------------+--------------------------------------------+
| +AntiSemiApply        |              0 | 256717 |       0 |       0/0 | anon[33], fof, user         | user.user_id ASC | 0.0                                        |
| |\                    +----------------+--------+---------+-----------+-----------------------------+------------------+--------------------------------------------+
| | +Expand(Into)       |              0 |      0 | 8006149 |       0/0 |   REL80, fof, user          |                  | 0.0; (user)-[  REL80:FRIENDS]->(fof)       |
| | |                   +----------------+--------+---------+-----------+-----------------------------+------------------+--------------------------------------------+
| | +Filter             |              1 | 260120 |  520240 |       0/0 | fof, user                   |                  | 0.0; fof:User                              |
| | |                   +----------------+--------+---------+-----------+-----------------------------+------------------+--------------------------------------------+
| | +Argument           |              1 | 260120 |       0 |       0/0 | fof, user                   |                  | 0.0                                        |
| |                     +----------------+--------+---------+-----------+-----------------------------+------------------+--------------------------------------------+
| +Filter               |              0 | 260120 |  260120 |       0/0 | anon[33], fof, user         | user.user_id ASC | 0.0; fof:User                              |
| |                     +----------------+--------+---------+-----------+-----------------------------+------------------+--------------------------------------------+
| +VarLengthExpand(All) |              0 | 260120 |  267999 |       0/0 | anon[33], fof, user         | user.user_id ASC | 0.0; (user)-[anon[33]:FRIENDS*3..3]->(fof) |
| |                     +----------------+--------+---------+-----------+-----------------------------+------------------+--------------------------------------------+
| +NodeIndexSeek        |              1 |      1 |       3 |       0/0 | user                        | user.user_id ASC | 0.0; :User(user_id)                        |
+-----------------------+----------------+--------+---------+-----------+-----------------------------+------------------+--------------------------------------------+

(Andrew Bowman) #2

Note that since you're using :FRIENDS*3 you're not getting friends-of-friends, you're getting friends-of-friends-of-friends, which is going to be a more expensive query. For the rest of this I'll assume that you do really want friends-of-friends-of-friends.

In this case it would be helpful to match to friends first, collect them (for later use), unwind back to rows, and continue the match to friends of friends of friends.

At that point, you can get the distinct set of friends of friends before continuing, which should cut down on redundant operations. You can use a NOT IN check to ensure none of the friends of friends are in the friends node list you collected earlier:

MATCH (user:User {user_id:1145})
WITH user, [(user)-[:FRIENDS]->(friend) | friend] as friends
UNWIND friends as friend
MATCH path = (friend)-[:FRIENDS*2]->(fofof)
WHERE none(node in nodes(path) WHERE node = user) and NOT fofof in friends
RETURN count(DISTINCT fofof)

Note also that if we want distinct counts, there's no need to project out and count unique properties of those nodes, just count the distinct nodes themselves.


(Michael Hunger) #3

There is a better version of var-length expand that works on distinct in-between nodes.

PROFILE
MATCH (user:User {user_id:1145})-[:FRIENDS*3]->(fof)
WITH distinct user, fof
WHERE NOT (user)-[:FRIENDS]->(fof)
RETURN count(distinct fof.user_id)

or this

explain MATCH (user:User {user_id:1145})-[:FRIENDS]->(f)-[:FRIENDS]->(f2)
WITH distinct user, f2
MATCH (f2)-[:FRIENDS]->(fof)
WHERE NOT (user)-[:FRIENDS]->(fof)
RETURN count(distinct fof.user_id)

Unfortunately Triadic Completion only works up to level 2

https://graphaware.com/neo4j/2015/10/20/faster-recommendation-graphaware-reco-neo4j-triadic.html


(Mark) #4

Thanks for the different query examples. This helped me tune the query and get the response time that was needed!