What is the time complexity of a query that finds all the children of a node

Hi

I have data in a tree structure where nodes are related to each other with parent/child relation. The application would query the data in 2 different patterns.

  1. Give all the parents of a node.
  2. Give all the children of a node till leaf node.

I am evaluating neo4j for this usecase and would like to know the time complexity that neo4j can offer for each of the above queries and is there any technique(like indexing) that can reduce the default time complexity.

Thanks

Hello amarnadh!

Neo4j stores relationships in linked lists (of pointers) based on relationship type. Imagine an Actor node has a single ACTED_IN relationship. Starting at this Actor node, cypher will go to that relationship type, and follow the pointer to the other node.

If a different Actor node has 12,345 ACTED_IN relationships, cypher will go to that relationship type, and track every one of those relationships.

So to your first question:
a) O(n) if you are tracking one relationship type back no further.

b) I believe this depends on your specific case. If you have 1 node with 2 children, and each of those children has 2 children, and so on, you are looking at exponential complexity based on the length of the path to reach the leaf node.

That being said, if you had a parent node with children nodes that have children nodes that have .. 10 layers deep, and each node has on average 5 children, you would have just under 10M paths to traverse.

Indexing helps when looking at properties of nodes (especially when finding the starting point in a query), but doesn't help for graph traversal.

A general principle for data models is to create something that prevents graph traversal exploding. If you think about the airline example from the documentation, AirportDay nodes were an efficient way to find flights. If they had used Airport nodes, a query would have to look at every flight for that airport for all time, which is orders of magnitude slower.

So look first at your data model and see if you can come up with something clever or useful that can make your searches focused (fewer relationships from each node, fewer layers, varying relationship types).

Hope that helps, and happy graphing!
V.Rupp

Hi Vincent, thanks for the detailed explanation.
I was reading this blog in which it was described that the response time taken to traverse the ancestors take constant time even with change in depth. if the time complexity is O(n), how would that be possible.

Interesting article here.

The first thing I thought about was their recursive joins if using flat tables, like start with a record in the plant data table, join the table with its ancestor information, join plant data table to those ancestor plants, join the table with the ancestor information back, and so on. It's interesting that the performance degrades sharply at depth 10. I am curious if that reflects something about their software environment.

Because of that performance breakdown with SQL on Oracle Exadata, the scale makes it very hard to tell what's happening with Neo4j. If they started with a single plant and traced back 15 levels of ancestry and only ended up with 30k total nodes, that's still performance better measured in milliseconds. So I suspect that green line is going up slowly, but it's just impossible to make out.

A line under that image in the article says

We can now process arbitrary-depth trees at what approximates a completely scale-free performance.

Approximating scale-free performance is something I've seen in my work with graphs having nodes counting between 5 and 9 orders of magnitude; most of my efficient graph traversal queries are too fast to really care about improving performance, and when something takes longer than a few seconds, I realize I messed up in writing it. :slight_smile:

I guess my final thought is that O() notation is great for understanding an algorithm in general, but performance is so dependent on values of n, i.e. an O(n^2) algorithm may perform better than O(n), for small values of n. So it's still all very situational.

Thanks for passing along the article, and I hope my comments are useful!
-V

1 Like