SDN - Cascading Queries while retrieving a Subgraph

AlexanderIv · May 22, 2023, 9:47pm

I was doing some prototyping with Spring Data Neo4j and noticed that the application issued many cascading queries while retrieving a very small subgraph via the findById() method.

2023-05-23T00:19:59.247+03:00 DEBUG 24044 --- [nio-8080-exec-4] org.springframework.data.neo4j.cypher    : Executing:
MATCH (testNode:`TEST_NODE`) 
WHERE testNode.nodeId = $__id__ WITH collect(id(testNode)) AS __sn__ 
RETURN __sn__

2023-05-23T00:19:59.268+03:00 DEBUG 24044 --- [nio-8080-exec-4] org.springframework.data.neo4j.cypher    : Executing:
MATCH (testNode:`TEST_NODE`) 
WHERE testNode.nodeId = $__id__ 
OPTIONAL MATCH (testNode)-[__sr__:`HAS_CHILD`]->(__srn__:`TEST_NODE`) 
WITH collect(id(testNode)) AS __sn__, collect(id(__srn__)) AS __srn__, collect(id(__sr__)) AS __sr__ 
RETURN __sn__, __srn__, __sr__

2023-05-23T00:19:59.275+03:00 DEBUG 24044 --- [nio-8080-exec-4] org.springframework.data.neo4j.cypher    : Executing:
MATCH (testNode:`TEST_NODE`) 
WHERE id(testNode) IN $__ids__ 
OPTIONAL MATCH (testNode)-[__sr__:`HAS_CHILD`]->(__srn__:`TEST_NODE`) 
WITH collect(id(testNode)) AS __sn__, collect(id(__srn__)) AS __srn__, collect(id(__sr__)) AS __sr__ 
RETURN __sn__, __srn__, __sr__

... many more queries

In most of the use cases I expect the subgraph to contain 10 - 30 nodes (definitely under 100), so to read them in one go I've added a method with a custom query:

    @Query("""
        MATCH (root:TEST_NODE {nodeId:$id})
        CALL apoc.path.subgraphAll(root, {
            relationshipFilter: 'HAS_CHILD'
        }) YIELD nodes, relationships
        RETURN root, collect(DISTINCT nodes), collect(relationships);
        """)
    TestNode getSubgraph(String id);

And it works as expected.

But I'd like to know what the benefits of issuing cascading queries are, and under which conditions (rough number of nodes, cardinality, hops from the root) they pay off.

glilienfield · May 22, 2023, 10:32pm

I don't have intimate knowledge of the inner workings of SDN, but I can share my thoughts. SDN uses pure Cypher, so it needs an algorithm that can fetch a root entity and all its descendants (at any depth) and reconstruct the nested Java object. Iterating is the approach I would use too, since it adapts to any graph structure.

Your implementation with apoc.path should find all the nodes and relationships of the graph more efficiently because it is a traversal algorithm that executes on the server. That being said, your query returns the root entity plus collections of all the nodes and relationships. SDN will need to sort through these collections to map them back to the Java object. I don't think this overhead exists with the SDN approach, since it knows how the nodes relate to the Java objects as it iteratively traverses the graph.
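To make the mapping step concrete, here is a minimal sketch in plain Java of turning a flat list of nodes and a flat list of relationships back into a nested object. It is not SDN's actual mapper. The names SubgraphAssembler, NodeRow, RelRow and TreeNode are hypothetical stand-ins, and it assumes each relationship row carries the ids of its start and end node.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SubgraphAssembler {
    // Hypothetical stand-ins for the rows a subgraph query returns; not SDN types.
    record NodeRow(String nodeId) {}
    record RelRow(String parentId, String childId) {}

    // Hypothetical domain object, loosely modelled on the TestNode in this thread.
    static final class TreeNode {
        final String nodeId;
        final List<TreeNode> children = new ArrayList<>();

        TreeNode(String nodeId) {
            this.nodeId = nodeId;
        }
    }

    // Two passes: create one object per node, then wire up the HAS_CHILD edges.
    static TreeNode assemble(String rootId, List<NodeRow> nodes, List<RelRow> rels) {
        Map<String, TreeNode> byId = new HashMap<>();
        for (NodeRow n : nodes) {
            byId.putIfAbsent(n.nodeId(), new TreeNode(n.nodeId()));
        }
        for (RelRow r : rels) {
            TreeNode parent = byId.get(r.parentId());
            TreeNode child = byId.get(r.childId());
            // Skip relationships whose endpoints were not part of the result.
            if (parent != null && child != null) {
                parent.children.add(child);
            }
        }
        return byId.get(rootId); // null if the root was not among the nodes
    }
}
```

Under these assumptions the work is one pass over the nodes and one over the relationships, so the cost grows linearly with the size of the subgraph.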

I'm not sure either approach has a speed advantage for small graphs. Maybe a test on large graphs would be interesting. For me, the ease of use of the generated repository methods is why I am using it. I would rather implement custom queries only for those capabilities not provided out of the box.

Interesting topic.

AlexanderIv · May 23, 2023, 12:37pm

SDN will need to sort through these collections to map them back to the java object

Thanks, that's a good point.

Though I guess it wouldn't be a heavy cost for 30 nodes.

I'm a bit concerned about the network. When an application is not hosted on-prem, you can't assume that every part of it will physically end up in the same data center. For that reason, sending more than ten requests to the database to serve one user request sounds a bit scary.

glilienfield · May 23, 2023, 12:48pm

I doubt they are independent queries executed in separate transactions. If using the driver, you can use a transaction function. Within it, you can have as many queries as you need. They will be executed in the same transaction and all will be submitted to the server together.

I used the driver extensively for my first microservice because I didn’t want to manage entire graphs from the root entity (like SDN). I needed to edit segments independently. It also is easy to use.

michael_simons1 · May 23, 2023, 1:19pm

Hey there

All queries will be executed in one transaction.
That transaction is managed by Spring's transactional framework in both the imperative and reactive flows.
No retries are used.
If you want or need some, use Spring Retry or Resilience4j. (Why: because if we used the driver's built-in mechanism, we would basically say "goodbye Spring transaction management, I'm doing my own thing here", which would lead to nice surprises when you want Spring to roll back your transaction or propagate it further.)
We do multiple queries for circular mappings, think (a1:A) -> [:KNOWS] -> (a2:A). While you can certainly express those queries for one hop without additional helpers like APOC, these queries blow up with multiple, unbounded hops if you have a bunch of nodes in between a1 and a2. We need to have them in one result though, because we fully support immutable entities (that is, building a local Java subgraph with immutable objects).
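The hop-by-hop idea can be sketched in plain Java. This is only an illustration of level-by-level loading with a visited set, not SDN's actual algorithm. CascadingLoader and LoadResult are hypothetical names, and the in-memory edges map stands in for the database, with each loop iteration playing the role of one "WHERE id(n) IN $__ids__" query from the log above.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class CascadingLoader {
    // Every node id reached, plus how many simulated query rounds it took.
    record LoadResult(Set<String> visited, int rounds) {}

    // 'edges' is an in-memory stand-in for the database: node id -> related node ids.
    static LoadResult load(String rootId, Map<String, List<String>> edges) {
        Set<String> visited = new LinkedHashSet<>();
        visited.add(rootId);
        List<String> frontier = List.of(rootId);
        int rounds = 0;
        while (!frontier.isEmpty()) {
            rounds++; // one simulated query for the whole current frontier
            List<String> next = new ArrayList<>();
            for (String id : frontier) {
                for (String target : edges.getOrDefault(id, List.of())) {
                    // Only ids not seen before go into the next round,
                    // so a cycle cannot make the loop run forever.
                    if (visited.add(target)) {
                        next.add(target);
                    }
                }
            }
            frontier = next;
        }
        return new LoadResult(visited, rounds);
    }
}
```

In this sketch the number of rounds follows the depth of the graph rather than the number of paths through it, and the last round is the one that finds no new nodes. For a cycle a -> b -> c -> a starting at a, it takes three rounds and visits each node once.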

I hope that answers your question to some extent.

AlexanderIv · May 23, 2023, 4:04pm

Thank you for sharing your thoughts and experience.

AlexanderIv · May 23, 2023, 4:06pm

Makes sense, thanks for the explanations.
