Hi folks, I am building a Graph API where I can get nodes and edges subsequently in the same request.
I am having a bit of trouble trying to optimize this query (please ignore literal values as I replace them with parameters on the application). I am getting out of ideas on how to make them better and use less memory (main issue).
Basically my query get lists of a specific type o node from another one, then orders by insertion_time and then I use subqueries to get some more inner nodes. We have a lot of types of queries, but I will examplify with one of them.
Basically I am using a combination of COMBINE and DISTINCT and then limiting the nodes so I can prepare them for the following query, because it seems better to filter them before going to the sub queries.
Here is the query:
MATCH (root_node:Entity:Business {source_id:"0101-0003-0006-00000001"})-[lvl_0_main_relationship_can_read:CAN_READ]-> (lvl_0:Entity:Membership)
WITH lvl_0,
root_node
ORDER BY lvl_0.insertion_time DESC
WHERE (lvl_0.status IN [ "ACTIVE",
"DISABLED",
"PENDING" ] )
AND (lvl_0.insertion_time <= "2025-02-27T03:37:12.1748779Z")
WITH COLLECT(DISTINCT(lvl_0))[0..20] as lvl_0_collected,
root_node
UNWIND lvl_0_collected AS lvl_0
CALL {
WITH lvl_0,
root_node
OPTIONAL MATCH (root_node)-[:CAN_READ]->(lvl_0)-[lvl_0_child_0_main_relationship_can_read:CAN_READ]-> (lvl_0_child_0:Entity:AssetGroup)
WITH lvl_0_child_0,
lvl_0,
root_node
ORDER BY lvl_0_child_0.insertion_time desc
WHERE (lvl_0_child_0.status IN [ "ACTIVE" ] )
WITH collect(distinct(lvl_0_child_0))[0..20] as lvl_0_child_0_collected,
lvl_0,
root_node
UNWIND lvl_0_child_0_collected as lvl_0_child_0
CALL {
WITH lvl_0_child_0
OPTIONAL MATCH (lvl_0_child_0)-[:BELONGS_TO]-> (lvl_0_child_0_child_0:Tenant)
WITH lvl_0_child_0_child_0,
lvl_0_child_0
WITH collect(distinct(lvl_0_child_0_child_0))[0..1] as lvl_0_child_0_child_0_collected,
lvl_0_child_0
UNWIND lvl_0_child_0_child_0_collected as lvl_0_child_0_child_0
RETURN COLLECT(DISTINCT({node: lvl_0_child_0_child_0,
children:[]})) as lvl_0_child_0_child_0_collected }
RETURN COLLECT(DISTINCT({node: lvl_0_child_0,
children:[{edge: "tenants",
nodes: lvl_0_child_0_child_0_collected}]})) as lvl_0_child_0_collected
}
CALL {
WITH lvl_0
OPTIONAL MATCH (lvl_0)-[:BELONGS_TO]-> (lvl_0_child_1:Tenant)
WITH DISTINCT lvl_0_child_1
LIMIT 1
RETURN COLLECT({node: lvl_0_child_1,
children:[]}) as lvl_0_child_1_collected
}
RETURN COLLECT({node: lvl_0,
children: [{edge: 'edge1',
nodes: lvl_0_child_0_collected},
{edge: 'tenants',
nodes: lvl_0_child_1_collected}] })
What I would like some help is to understand if the issue is the "COLLECT/DISTINCT" that I am using, and if there is some better way to sort the results, filter and then run the subquery. This pattern seems to do the least dbhits I could find, but I think that it is not the most memory-optimized.
My db has a pattern of indexes for {Label} on source_id and for Entity on insertion_time.
But I think the main issue is the too long profile that it generates. I Would appreciate some ideas to make it more optimal, maintaining the response. I am running out of ideias, so I would really appreciate any insight.
If there is any missing info, please let me know
Explain: