Performance inconsistencies when fetching mutual relationships

davidrapin · November 15, 2018, 3:53pm

Hi there,

My original use-case is fetching all mutual relationships between two groups of nodes.
One of the groups of nodes contains a supernode (i.e. a node with a very high degree).

Let's call n the group of nodes that contains only regular nodes and sn the group of nodes that contains a supernode.

My cypher query is the following:

MATCH (sn)-[r]-(n)
WHERE ID(sn) in [1, 999] // node 999 has high degree (1163)
AND ID(n) in [2, 3] // nodes 1, 2 and 3 have a low degree (3)
RETURN r

I'm currently using Neo4j 3.3.0 to benchmark this query, and I noticed something very odd: the performance of the query depends on the names of the variables I am using for sn, n and r.

More particularly, the performances depend of the alphabetical order of the variables.
I have tried changing other things in the query:

order of the ID(x) in [] statements
switching (sn)-[r]-(n) for (sn)-[r]-(n)

But after lots of testing, the only change that has a consistent impact on the performance of the query is the alphabetical order of the variable names used for sn, n and r.

To test my query, I have the following process:

I rename sn, n and r to a different name (sn => v1, n => v2, r => v3) and profile my query.
then, I change the alphabetical order (e.g. sn => v2, n => v3, r => v1) and profile again
etc.

Here are the results for all possible permutations:

sn => v1, r => v2, n => v3: FAST
sn => v1, n => v2, r => v3: FAST
r => v1, sn => v2, n => v3: SLOW (i.e. at least of DB hit for each relationship of supernode 999)
r => v1, n => v2, sn => v3: FAST
n => v1, r => v2, sn => v3: SLOW (i.e. at least of DB hit for each relationship of supernode 999)
n => v1, sn => v2, r => v3: SLOW (i.e. at least of DB hit for each relationship of supernode 999)

Profiling for each query:

profile match (v1)-[v2]-(v3) where id(v1) in [1, 999] and id(v3) in [2, 3] return v2

plan(1).png181×538 17.2 KB
profile match (v1)-[v3]-(v2) where id(v1) in [1, 999] and id(v2) in [2, 3] return v3
upload limit
profile match (v2)-[v1]-(v3) where id(v2) in [1, 999] and id(v3) in [2, 3] return v1
profile match (v3)-[v1]-(v2) where id(v3) in [1, 99] and id(v2) in [2, 3] return v1
upload limit
profile match (v3)-[v2]-(v1) where id(v3) in [1, 999] and id(v1) in [2, 3] return v2
upload limit
profile match (v2)-[v3]-(v1) where id(v2) in [1, 999] and id(v1) in [2, 3] return v3
upload limit

Having to worry about the alphabetical ordering of variable names in a query makes it quite difficult to try and make queries fast

On the other hand, I might be on the wrong track, and I'm open to suggestions for alternative queries to get the same result in a more fast and stable way.

Best,

David

michael.hunger · November 16, 2018, 3:02pm

It's about the ordering it uses for expand into.

I think that was partially fixed in 3.4

In general it tries to go from the non-dense node side, but if both nodes have > 50 rels, both count as dense.

If you know which group is the one with the supernode then match the other group first
and then the 2nd group via expand.

It would also be good to use a direction + rel-type if possible.

MATCH (n)
WHERE ID(n) in [2, 3] // nodes 1, 2 and 3 have a low degree (3)
MATCH (sn)-[r]-(n)
WHERE ID(sn) in [1, 999] // node 999 has high degree (1163)
RETURN r

davidrapin · November 22, 2018, 3:16pm

Thank you Michael!
We will try this workaround.
Were you able to confirm that this was fixed in Neo4j v3.4?
Best,
David

michael.hunger · November 23, 2018, 12:32am

Best test it yourself with your specific dataset + query. And let us know if it persists.

Topic		Replies	Views
How to speed up ORDER BY Cypher	10	1244	July 2, 2020
Enforcing relationships priority when using var-length matches Cypher cypher	0	172	November 1, 2021
Default ordering of nodes and relationships Neo4j Graph Platform	7	6656	August 11, 2019
If data no change, order of result is fixed? Cypher cypher	1	478	September 3, 2020
Slowness in execution of Cypher query when we use multiple relationship and inner queries Cypher performance , cypher	13	151	April 10, 2025

Performance inconsistencies when fetching mutual relationships

Related topics