I ran into a very long execution time when searching for a person's 5th-degree friends. With 20k people in the database, the query takes approximately 30 seconds to execute. I expected this type of query to be exactly what Neo4j is good at. The plan shows that a Cartesian product is being built. Is that really the optimal approach, or has something gone wrong?
Compared with Postgres, Neo4j performs poorly: about 30 seconds versus 0.1 seconds, with both databases filled with the same data.
Neo4j version: 5.7.0
Query:
MATCH (p1:Person {id: 6})-[:FOLLOWS*5]->(followed:Person)
RETURN DISTINCT followed.id AS id
Explain:
Planner COST
Runtime SLOTTED
Runtime version 5.7
+------------------------+----+-------------------------------------------------+----------------+
| Operator               | Id | Details                                         | Estimated Rows |
+------------------------+----+-------------------------------------------------+----------------+
| +ProduceResults        |  0 | id                                              |        1288238 |
| |                      +----+-------------------------------------------------+----------------+
| +Distinct              |  1 | cache[followed.id] AS id                        |        1288238 |
| |                      +----+-------------------------------------------------+----------------+
| +VarLengthExpand(Into) |  2 | (p1)-[anon_0:FOLLOWS*5..5]->(followed)          |        1356040 |
| |                      +----+-------------------------------------------------+----------------+
| +CartesianProduct      |  3 |                                                 |          20100 |
| |\                     +----+-------------------------------------------------+----------------+
| | +CacheProperties     |  4 | cache[followed.id]                              |          20100 |
| | |                    +----+-------------------------------------------------+----------------+
| | +NodeByLabelScan     |  5 | followed:Person                                 |          20100 |
| |                      +----+-------------------------------------------------+----------------+
| +NodeIndexSeek         |  6 | RANGE INDEX p1:Person(id) WHERE id = $autoint_0 |              1 |
+------------------------+----+-------------------------------------------------+----------------+
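A note on the numbers in the plan: VarLengthExpand is estimated at about 1.36 million rows even though there are only 20,100 Person nodes. That would be consistent with the pattern being matched path by path, with DISTINCT applied only afterwards. The Python sketch below is not Neo4j code. It uses a made-up layered toy graph and a hypothetical helper, count_paths_and_endpoints, to show how the number of 5-hop paths can dwarf the number of distinct end nodes. Whether this is what actually makes the real query slow is an assumption on my part.

```python
def count_paths_and_endpoints(follows, start, depth):
    """Enumerate every directed path of exactly `depth` hops from `start`,
    never reusing a relationship within one path, and return
    (number_of_paths, set_of_distinct_end_nodes)."""
    paths = 0
    endpoints = set()

    def walk(node, used_edges, hops_left):
        nonlocal paths
        if hops_left == 0:
            paths += 1
            endpoints.add(node)
            return
        for nxt in follows.get(node, ()):
            edge = (node, nxt)
            if edge in used_edges:
                continue  # a relationship may appear only once per path
            walk(nxt, used_edges | {edge}, hops_left - 1)

    walk(start, frozenset(), depth)
    return paths, endpoints


# Hypothetical toy graph: "start" follows 3 people, each of whom follows the
# 3 people in the next layer, and so on for 4 layers; the last layer follows "end".
follows = {}
previous = ["start"]
for layer in range(1, 5):
    current = [f"L{layer}_{i}" for i in range(3)]
    for person in previous:
        follows[person] = list(current)
    previous = current
for person in previous:
    follows[person] = ["end"]

paths, endpoints = count_paths_and_endpoints(follows, "start", 5)
print(paths, len(endpoints))  # prints: 81 1
```

In this toy graph, 81 separate paths all lead to a single distinct person, so most of the enumeration is thrown away by the final de-duplication step.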
Thanks!