I was doing some prototyping with Spring Data Neo4j and noticed that the application issued lots of cascading queries while retrieving a very small subgraph via the findById() method:
2023-05-23T00:19:59.247+03:00 DEBUG 24044 --- [nio-8080-exec-4] org.springframework.data.neo4j.cypher : Executing:
MATCH (testNode:`TEST_NODE`)
WHERE testNode.nodeId = $__id__ WITH collect(id(testNode)) AS __sn__
RETURN __sn__
2023-05-23T00:19:59.268+03:00 DEBUG 24044 --- [nio-8080-exec-4] org.springframework.data.neo4j.cypher : Executing:
MATCH (testNode:`TEST_NODE`)
WHERE testNode.nodeId = $__id__
OPTIONAL MATCH (testNode)-[__sr__:`HAS_CHILD`]->(__srn__:`TEST_NODE`)
WITH collect(id(testNode)) AS __sn__, collect(id(__srn__)) AS __srn__, collect(id(__sr__)) AS __sr__
RETURN __sn__, __srn__, __sr__
2023-05-23T00:19:59.275+03:00 DEBUG 24044 --- [nio-8080-exec-4] org.springframework.data.neo4j.cypher : Executing:
MATCH (testNode:`TEST_NODE`)
WHERE id(testNode) IN $__ids__
OPTIONAL MATCH (testNode)-[__sr__:`HAS_CHILD`]->(__srn__:`TEST_NODE`)
WITH collect(id(testNode)) AS __sn__, collect(id(__srn__)) AS __srn__, collect(id(__sr__)) AS __sr__
RETURN __sn__, __srn__, __sr__
... many more queries
In most use cases I expect the subgraph to contain 10 to 30 nodes (definitely under 100), so to read them in one go I added a method with a custom query:
@Query("""
    MATCH (root:TEST_NODE {nodeId: $id})
    CALL apoc.path.subgraphAll(root, {
        relationshipFilter: 'HAS_CHILD'
    }) YIELD nodes, relationships
    RETURN root, collect(DISTINCT nodes), collect(relationships);
    """)
TestNode getSubgraph(String id);
And it works as expected.
But I'd like to know what the benefits of issuing cascading queries are, and under which conditions (rough number of nodes, cardinality, hops from the root) they pay off.
I don't have intimate knowledge of the inner workings of SDN, but I can offer my thoughts. SDN uses pure Cypher, so it needs an algorithm that can take a root entity and all its descendants (at any depth) and reconstruct the nested Java object. An iterative approach is what I would use too, since it adapts to any graph structure.
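To make the iterative idea concrete, here is a minimal plain-Java sketch of level-by-level loading. It is not SDN's implementation: the graph is a hypothetical in-memory map from a node id to its HAS_CHILD target ids, and each pass of the loop stands in for one database round trip, like the WHERE id(testNode) IN $__ids__ queries in the log. A visited set keeps cycles from looping forever.

```java
import java.util.*;

public class IterativeLoad {

    /** Result of the simulated load: every reachable node id plus the number of "queries" issued. */
    record LoadResult(Set<String> nodeIds, int queries) {}

    /**
     * Loads the subgraph reachable from rootId one level at a time.
     * Each iteration of the while loop models one round trip that fetches
     * the outgoing HAS_CHILD relationships of the current frontier.
     */
    static LoadResult load(Map<String, List<String>> hasChild, String rootId) {
        Set<String> visited = new LinkedHashSet<>();
        Set<String> frontier = new LinkedHashSet<>(List.of(rootId));
        int queries = 0;
        while (!frontier.isEmpty()) {
            queries++; // one simulated round trip per level
            visited.addAll(frontier);
            Set<String> next = new LinkedHashSet<>();
            for (String id : frontier) {
                for (String child : hasChild.getOrDefault(id, List.of())) {
                    if (!visited.contains(child)) {
                        next.add(child); // only unseen ids, so a cycle terminates
                    }
                }
            }
            frontier = next;
        }
        return new LoadResult(visited, queries);
    }

    public static void main(String[] args) {
        // root -> a, b ; a -> c ; c -> root (a cycle back to the root)
        Map<String, List<String>> graph = Map.of(
                "root", List.of("a", "b"),
                "a", List.of("c"),
                "c", List.of("root"));
        LoadResult r = load(graph, "root");
        System.out.println(r.nodeIds() + " in " + r.queries() + " queries");
    }
}
```

In this model the number of round trips follows the number of levels in the subgraph (three for the example: root, then a and b, then c), not the number of nodes, which is one way to reason about when the iterative approach stays cheap.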
Your implementation with apoc.path should find all the nodes and relationships of the graph more efficiently, because it is a traversal algorithm that executes on the server. That said, your query returns the root entity plus collections of all the nodes and relationships, and SDN will need to sort through these collections to map them back to the Java object. I don't think this overhead exists with the SDN approach, since it knows how the nodes relate to the Java objects as it iteratively traverses the graph.
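A rough sketch of that extra mapping work, assuming the custom query's result has been flattened into a list of nodes and a list of HAS_CHILD relationships. The Node, Rel and TestNode records are hypothetical stand-ins, not SDN types. Indexing the relationships by start node takes one pass and building the tree takes another, so for a tree-shaped subgraph the cost is roughly linear in nodes plus relationships.

```java
import java.util.*;

public class FlatToNested {

    /** Hypothetical flat row types, roughly what apoc.path.subgraphAll yields. */
    record Node(String nodeId) {}
    record Rel(String startId, String endId) {} // one HAS_CHILD relationship

    /** Immutable domain object with its children already resolved. */
    record TestNode(String nodeId, List<TestNode> children) {}

    static TestNode toTree(String rootId, List<Node> nodes, List<Rel> rels) {
        // Index outgoing HAS_CHILD relationships by start node: one pass over rels.
        Map<String, List<String>> childrenOf = new HashMap<>();
        for (Rel r : rels) {
            childrenOf.computeIfAbsent(r.startId(), k -> new ArrayList<>()).add(r.endId());
        }
        // Only map children that actually came back in the node list.
        Set<String> known = new HashSet<>();
        for (Node n : nodes) {
            known.add(n.nodeId());
        }
        return build(rootId, childrenOf, known, new HashSet<>());
    }

    /** Builds children before their parent, so each record is complete when constructed. */
    private static TestNode build(String id, Map<String, List<String>> childrenOf,
                                  Set<String> known, Set<String> onPath) {
        if (!onPath.add(id)) {
            throw new IllegalStateException("cycle at " + id + ": cannot build an immutable tree");
        }
        List<TestNode> kids = new ArrayList<>();
        for (String childId : childrenOf.getOrDefault(id, List.of())) {
            if (known.contains(childId)) {
                kids.add(build(childId, childrenOf, known, onPath));
            }
        }
        onPath.remove(id); // leaving this node's branch
        return new TestNode(id, List.copyOf(kids));
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(new Node("root"), new Node("a"), new Node("b"), new Node("c"));
        List<Rel> rels = List.of(new Rel("root", "a"), new Rel("root", "b"), new Rel("a", "c"));
        System.out.println(toTree("root", nodes, rels));
    }
}
```

Because every child is built before its parent, a cycle in the flat data cannot be turned into immutable records this way, which is why the sketch rejects one instead of looping.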
I'm not sure either approach has a speed advantage for small graphs; a test on large graphs would be interesting. For me, the ease of use of the generated repository methods is why I use SDN, and I would rather write custom queries only for the capabilities not provided out of the box.
Interesting topic.
SDN will need to sort through these collections to map them back to the java object
Thanks, that's a good point.
Though I guess it wouldn't be a heavy cost for 30 nodes.
I'm a bit concerned about the network. When an application is not hosted on-prem, you can't assume that every part of it will physically end up in the same data center. For that reason, sending more than ten requests to the database to serve one user request sounds a bit scary.
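A back-of-the-envelope model of that concern, assuming each query has to wait for the previous result (the later queries in the log appear to take their $__ids__ from the step before) and ignoring server-side execution time: with n sequential round trips and a network round-trip time RTT, the time spent waiting on the network is roughly

$$
T_{\text{net}} \approx n \cdot \mathrm{RTT}
$$

For example, with an assumed cross-region RTT of 20 ms, ten sequential queries spend about 10 × 20 = 200 ms on the network, against about 20 ms for a single query; with an assumed 0.5 ms RTT inside one data center, the same ten queries cost about 5 ms.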
I doubt they are independent queries executed in separate transactions. If using the driver, you can use a transaction function. Within it, you can have as many queries as you need. They will be executed in the same transaction and all will be submitted to the server together.
I used the driver extensively for my first microservice because I didn’t want to manage entire graphs from the root entity (as SDN does). I needed to edit segments independently. It is also easy to use.
Hey there
- All queries will be executed in one transaction
- That transaction is managed by Spring's transaction management, in both the imperative and the reactive flows
- No retries are used
- If you want or need retries, use Spring Retry or Resilience4j. (Why? Because using the driver's built-in retry mechanism would basically mean saying "goodbye, Spring transaction management, I'm doing my own thing here", which will lead to nice surprises when you want Spring to roll back your transaction or propagate it further.)
- We do multiple queries for circular mappings; think (a1:A)-[:KNOWS]->(a2:A). While you can certainly express those queries for one hop without additional helpers like APOC, they blow up with multiple, unbounded hops if you have a bunch of nodes in between a1 and a2. We need everything in one result, though, because we fully support immutable entities (that is, building a local Java subgraph out of immutable objects).
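One way to see the blow-up mentioned in the last point: a variable-length pattern such as (a1:A)-[:KNOWS*]->(a2:A) produces one row per matching path, and the number of paths can grow exponentially with the number of hops even when the node count grows only linearly. The sketch below counts paths in a hypothetical layered graph (width intermediate nodes per layer, every node linked to every node in the next layer); it illustrates the growth and is not one of SDN's queries.

```java
public class PathBlowUp {

    /**
     * Counts a1-to-a2 paths in a layered DAG: a1 -> layer 1 -> ... -> layer L -> a2,
     * where every node in one layer has a KNOWS edge to every node in the next layer.
     * Dynamic programming: current[i] = number of distinct paths from a1 to node i of this layer.
     */
    static long countPaths(int layers, int width) {
        long[] current = new long[width];
        java.util.Arrays.fill(current, 1L); // each layer-1 node is reached by one edge from a1
        for (int layer = 2; layer <= layers; layer++) {
            long sum = 0;
            for (long c : current) {
                sum += c;
            }
            long[] next = new long[width];
            java.util.Arrays.fill(next, sum); // every node here is reachable from every node before
            current = next;
        }
        long total = 0;
        for (long c : current) {
            total += c; // each last-layer node has one edge to a2
        }
        return total;
    }

    public static void main(String[] args) {
        int width = 3;
        for (int layers = 1; layers <= 5; layers++) {
            int nodes = layers * width + 2; // intermediate nodes plus a1 and a2
            System.out.printf("%d intermediate layers, %d nodes, %d paths%n",
                    layers, nodes, countPaths(layers, width));
        }
    }
}
```

With a width of 3, five intermediate layers mean only 17 nodes but 3^5 = 243 paths, and therefore 243 rows for a single unbounded pattern.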
I hope that answers your question to some extent.
Thank you for sharing your thoughts and experience.
Makes sense, thanks for explanations.