Responding to this again, because the solution I’d found turns out not to be good enough. The virtual relationships I’d added to the database can accidentally connect and therefore show more data than is actually connected. I’ll expand the slightly obfuscated data structure a bit.
I’ve got two types of node I want to show: the :Origin that was initially selected, and the :Connection that I want to show only when there’s a connection. That connection runs through :Sets that consist of :Elements which in turn connect to :OtherElements that are part of :OtherSets. And that other set connects to the :Connection. (Yes, Set/Element are slightly different from OtherSet/OtherElement; I don’t think it’s relevant, but I’m including it just in case. There are a few hundred thousand OtherElements; far more than Elements.)
It’s possible that not all elements connect to other elements of the same set. Consider this case:
(o:Origin)-->(a:Connection)-->(b:Connection)
They’re only connected through these underlying sets and elements, and it’s the elements from a that connect to b don’t actually connect to o, even if there are other elements from a that do connect to o and not to b. In that case, I don’t want to see b. This is why my solution with the [:VIRTUAL_FLOW] doesn’t work. I’ll have to follow the actual elements.
One thing I’ve done is take this one step at a time. If (b:Connection) connects to (c:Connection), for example, I don’t fetch c until the user decides to expand b. That makes the query a lot simpler, but still very expensive. Imagine if we expand these Connections a few more hops, I still want to know if the next hop still connects to Origin through all these layers of OtherElements. Which means I need to use a [:FLOW*0..] relation, and that apparently explodes and eats up all the memory.
So I’ve been trying a couple of different approaches:
MATCH (o:Origin {id: $origin})→(:Set)<-[:PART_OF_SET]-(:Element)<--(:OtherElement)<-[:FLOW*0..]-(connectingElement:OtherElement)-[:PART_OF_SET]->(:OtherSet)-->(expandingConnection:Connection {id: $connection})
MATCH (connectingElement)<-[:FLOW]-(:OtherElement)-[:PART_OF_SET]->(:OtherSet)-->(newConnections:Connection)
RETURN newConnections
This is super slow and eats up all memory. I can limit the depth to [Flow*0..3] and that will make it much faster, but limit how far I can go, and I don’t want that.
So now I’m trying to do this a step at a time, dropping all elements I don’t need:
MATCH (expandingConnection:OAR {id: $connection})<--(:OtherSet)<-[:PART_OF_SET]-(connectingElement:OtherElement)
WITH expandingConnection, connectingElement
MATCH ANY (:Origin { id: $origin})--(:Set)<-[:PART_OF_SET]-(:Element)<--(:OtherElement)<-[:FLOW*0..5]-(connectingElement)
WITH expandingConnection, connectingElement
MATCH (connectingElement)<-[:FLOW]-(:OtherElement)-[:PART_OF_SET]->(:OtherSet)
-->(newConnections:Connection)
RETURN newConnections
But this is still surprisingly slow. I’m using the specific connectingElements here. I can’t imagine their number explodes beyond the number there actually are. So why is the variable length relationship still so slow? The system knows the begin and end of this variable length relationship. I’m using ANY because I only want to verify that the connection exist (I don’t even need the shortest connection, just any). And still this is slow.
I’ve ran this through the profiler of course, and the ANY shortest path gives 1262 rows (what does that mean here?) and after that 2044 rows to the OtherSet the connectingElement belongs to. Why that many? It shouldn’t even be looking at the OtherSet at this point, because I only needed that to find the connectingElements, which I got before I started on the path.
After that, the profiler goes through a lot of steps I don’t understand, each with 2440 rows, when all we should be doing at this point is just make that one final hop. I clearly don’t understand the profiler output.
So what are ways to improve this? And what does the profiler output really mean?
The whole query took over a minute.
I’ll include the plan output:
Cypher 5
Planner COST
Runtime SLOTTED
Runtime version 2025.07
+-------------------------------------+----+-----------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+
| Operator | Id | Details | Estimated Rows | Rows | DB Hits | Memory (Bytes) | Page Cache Hits/Misses |
+-------------------------------------+----+-----------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+
| +ProduceResults | 0 | origin, newConnection | 0 | 2440 | 41480 | 0 | 0/0 |
| | +----+-----------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+
| +Filter | 1 | newConnection:Connection | 0 | 2440 | 2440 | | 0/0 |
| | +----+-----------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+
| +Expand(All) | 2 | (anon_13)-->(newConnection) | 0 | 2440 | 10642 | | 0/0 |
| | +----+-----------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+
| +Filter | 3 | anon_13:OtherSet | 0 | 2440 | 2440 | | 0/0 |
| | +----+-----------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+
| +Expand(All) | 4 | (anon_11)-[:PART_OF_SET]->(anon_13) | 0 | 2440 | 32738 | | 0/0 |
| | +----+-----------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+
| +Filter | 5 | anon_11:OtherElement | 0 | 2440 | 2440 | | 0/0 |
| | +----+-----------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+
| +Expand(All) | 6 | (be)<-[:FLOW]-(anon_11) | 0 | 2440 | 3578 | | 0/0 |
| | +----+-----------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+
| +NodeHashJoin | 7 | anon_1 | 0 | 159 | 0 | 4728 | 0/0 |
| |\ +----+-----------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+
| | +Expand(All) | 8 | (be)-[:PART_OF_SET]->(anon_1) | 11 | 2044 | 20465 | | 0/0 |
| | | +----+-----------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+
| | +StatefulShortestPath(All, Trail) | 9 | ANY 1 (u)-[`anon_3`]-(`anon_4`)<-[`anon_5`]-(`anon_6`)<-[`anon_7`]-(`anon_8`)<-[`anon_9`*0..5]-(connectingElement)| 4| 1262 | 53542 | 4207071 | 0/0 |
| | | | | expanding from: u | | | | | |
| | | | | inlined predicates: anon_37:UC_LINKED_TO_IDS | | | | | |
| | | | | anon_38:IDS | | | | | |
| | | | | anon_39:PART_OF_SET | | | | | |
| | | | | anon_40:IDE | | | | | |
| | | | | anon_41:BDL_GDL_MAPPING | | | | | |
| | | | | anon_42:BdlElement | | | | | |
| | | | | anon_44:BDL_FLOW | | | | | |
| | | | | be:BdlElement | | | | | |
| | | +----+-----------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+
| | +NodeIndexSeek | 10 | RANGE INDEX o:Origin(id) WHERE id = $autostring_1 | 1 | 1 | 2 | | 0/0 |
| | +----+-----------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+
| +Filter | 11 | anon_1:OtherSet | 0 | 18 | 18 | | 0/0 |
| | +----+-----------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+
| +Expand(All) | 12 | (a1)<--(anon_1) | 0 | 18 | 51 | | 0/0 |
| | +----+-----------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+
| +NodeIndexSeek | 13 | RANGE INDEX expand:Connection(id) WHERE id = $autostring_0, cache[a1.id] | 1 | 1 | 2 | | 0/0 |
+-------------------------------------+----+-----------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+
Total database accesses: 169838, total allocated memory: 4211823
(I hope I anonymised it all correctly. In the original query, I do specify all my relationship types; I just didn’t want to invent properly anonymised ones for this.)
Note that the total number of newConnections found here was 6. That is not the issue. But it’s ballooning in the middle.