What is the time complexity of a query that finds all the children of a node

amarnadh1104 · March 19, 2023, 4:06pm

Hi

I have data in a tree structure where nodes are related to each other with parent/child relation. The application would query the data in 2 different patterns.

Give all the parents of a node.
Give all the children of a node till leaf node.

I am evaluating neo4j for this usecase and would like to know the time complexity that neo4j can offer for each of the above queries and is there any technique(like indexing) that can reduce the default time complexity.

Thanks

vincent.rupp · March 20, 2023, 1:12am

Hello amarnadh!

Neo4j stores relationships in linked lists (of pointers) based on relationship type. Imagine an Actor node has a single ACTED_IN relationship. Starting at this Actor node, cypher will go to that relationship type, and follow the pointer to the other node.

If a different Actor node has 12,345 ACTED_IN relationships, cypher will go to that relationship type, and track every one of those relationships.

So to your first question:
a) O(n) if you are tracking one relationship type back no further.

b) I believe this depends on your specific case. If you have 1 node with 2 children, and each of those children has 2 children, and so on, you are looking at exponential complexity based on the length of the path to reach the leaf node.

That being said, if you had a parent node with children nodes that have children nodes that have .. 10 layers deep, and each node has on average 5 children, you would have just under 10M paths to traverse.

Indexing helps when looking at properties of nodes (especially when finding the starting point in a query), but doesn't help for graph traversal.

A general principle for data models is to create something that prevents graph traversal exploding. If you think about the airline example from the documentation, AirportDay nodes were an efficient way to find flights. If they had used Airport nodes, a query would have to look at every flight for that airport for all time, which is orders of magnitude slower.

So look first at your data model and see if you can come up with something clever or useful that can make your searches focused (fewer relationships from each node, fewer layers, varying relationship types).

Hope that helps, and happy graphing!
V.Rupp

amarnadh1104 · March 20, 2023, 3:15pm

Hi Vincent, thanks for the detailed explanation.
I was reading this blog in which it was described that the response time taken to traverse the ancestors take constant time even with change in depth. if the time complexity is O(n), how would that be possible.

vincent.rupp · March 21, 2023, 11:33am

Interesting article here.

The first thing I thought about was their recursive joins if using flat tables, like start with a record in the plant data table, join the table with its ancestor information, join plant data table to those ancestor plants, join the table with the ancestor information back, and so on. It's interesting that the performance degrades sharply at depth 10. I am curious if that reflects something about their software environment.

Because of that performance breakdown with SQL on Oracle Exadata, the scale makes it very hard to tell what's happening with Neo4j. If they started with a single plant and traced back 15 levels of ancestry and only ended up with 30k total nodes, that's still performance better measured in milliseconds. So I suspect that green line is going up slowly, but it's just impossible to make out.

A line under that image in the article says

We can now process arbitrary-depth trees at what approximates a completely scale-free performance.

Approximating scale-free performance is something I've seen in my work with graphs having nodes counting between 5 and 9 orders of magnitude; most of my efficient graph traversal queries are too fast to really care about improving performance, and when something takes longer than a few seconds, I realize I messed up in writing it.

I guess my final thought is that O() notation is great for understanding an algorithm in general, but performance is so dependent on values of n, i.e. an O(n^2) algorithm may perform better than O(n), for small values of n. So it's still all very situational.

Thanks for passing along the article, and I hope my comments are useful!
-V

Topic		Replies	Views
Why is this query too slow? Cypher	5	560	October 2, 2023
Performance of Graph DB Neo4j Graph Platform	8	523	August 1, 2023
Optimizing simple queries for very large graph DB Cypher performance	12	1399	February 29, 2024
How to make multi level path traversal faster Cypher apoc , performance , configuration , cypher	8	5176	December 4, 2019
Help with children / parents query Neo4j Graph Platform migrated	7	787	December 7, 2022

July Summer Fun!

What is the time complexity of a query that finds all the children of a node

Related topics