Resources to understand implications of graph structure on query perf?

Since graph dbs use pointers between nodes and relationships instead of join tables, your performance for queries involving multiple traversals is proportional to the data that you traverse and filter, and not the total data (such as the number of nodes of a particular label or the number of relationships of a particular type).
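To make that concrete, here is a minimal Python sketch of the idea. It is a toy model, not how Neo4j stores data: a plain dict stands in for pointer-chasing, and the names `build` and `two_hop` are made up for illustration. It counts the relationships touched by a two-hop traversal from one start node, with and without a large amount of unrelated data in the graph.

```python
def build(extra_nodes):
    """Toy graph: a small neighborhood around 'alice' plus unrelated filler nodes."""
    graph = {
        "alice": ["bob", "carol"],
        "bob": ["dave"],
        "carol": ["dave", "erin"],
    }
    # Filler chain that is never reachable from 'alice'.
    for i in range(extra_nodes):
        graph[f"n{i}"] = [f"n{i + 1}"]
    return graph


def two_hop(graph, start):
    """Return (distinct two-hop neighbors, number of relationships touched)."""
    touched = 0
    found = set()
    for first in graph.get(start, []):
        touched += 1
        for second in graph.get(first, []):
            touched += 1
            found.add(second)
    return found, touched


small = two_hop(build(10), "alice")
large = two_hop(build(100_000), "alice")
print(small[1], large[1])  # same amount of work for both graph sizes
```

The work done depends only on the neighborhood that was traversed, so the filler nodes cost nothing at query time.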

Query tuning is then all about touching the smallest portion of the graph needed to get the desired results, and about writing queries that keep cardinality low so that operations are not applied redundantly and multiplicatively.
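The multiplicative effect can be sketched with a toy row model in Python. This is only an illustration under simple assumptions (each expansion runs once per incoming row, and the helper names `expand`, `naive` and `aggregated` are hypothetical); it does not reproduce the Cypher planner. One person has 3 friends and 4 liked movies. Expanding to movies while there is still one row per friend repeats the movie expansion for every friend, while collapsing the friends into a single row first does it once.

```python
def expand(rows, neighbors):
    """Expand every incoming row to each neighbor; return (new_rows, work_done)."""
    out = [row + (n,) for row in rows for n in neighbors]
    return out, len(out)


def naive(friends, movies):
    """Expand to friends, then expand each resulting row to all movies."""
    rows, w1 = expand([("alice",)], friends)
    rows, w2 = expand(rows, movies)
    return rows, w1 + w2


def aggregated(friends, movies):
    """Expand to friends, collapse back to one row, then expand to movies."""
    rows, w1 = expand([("alice",)], friends)
    rows = [("alice", tuple(r[1] for r in rows))]  # one row holding all friends
    rows, w2 = expand(rows, movies)
    return rows, w1 + w2


friends = ["bob", "carol", "dave"]
movies = ["m1", "m2", "m3", "m4"]

naive_rows, naive_work = naive(friends, movies)
agg_rows, agg_work = aggregated(friends, movies)
print(len(naive_rows), naive_work)  # 12 rows, 15 expansions
print(len(agg_rows), agg_work)      # 4 rows, 7 expansions
```

With larger fan-outs the gap grows as a product rather than a sum, which is why reducing the row count before the next expansion matters.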

Tuning usually consists of:

  1. Ensuring you have an index or unique constraint for starting nodes in your graph so your initial lookups are quick. The schema sections in the docs should have you covered here.

  2. Ensuring your model favors narrow selection through specific relationship types over generic relationships followed by filtering (on the node at the other end, or on properties of the relationship or that node). Often there's no way around expanding and filtering, but when specific relationship types can do the filtering for you based on type alone, you can often see a considerable performance boost.
    Max De Marzi is one of our specialists for modeling and tuning for performance. He talks a bit about this kind of modeling in this blog entry, but you'll probably want to scan through all of his blog posts to look for modeling topics that match up to your needs.

  3. Understanding how Cypher works (especially when it comes to cardinality in Cypher queries) so you can avoid pitfalls, make modeling decisions that pair well with expected Cypher queries, and profile to troubleshoot/optimize queries. I highly recommend reading through our knowledge base article on cardinality in Cypher queries, and going through other Cypher articles in the knowledge base.
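As a rough illustration of point 2, the following Python sketch compares a generic relationship filtered by a property against a specific relationship type. Everything here is hypothetical: the `EVENT` / `EVENT_2023` names, the `year` property, and the assumption that a node's relationships can be reached by type without inspecting relationships of other types. It is a toy model of the trade-off, not Neo4j's storage format.

```python
def generic_lookup(rels, wanted_year):
    """Generic type plus property filter: every relationship is touched."""
    touched = 0
    hits = []
    for rel_type, props, target in rels:
        touched += 1
        if rel_type == "EVENT" and props["year"] == wanted_year:
            hits.append(target)
    return hits, touched


def typed_lookup(rels_by_type, rel_type):
    """Specific type: only relationships of that type are touched."""
    hits = list(rels_by_type.get(rel_type, []))
    return hits, len(hits)


years = range(2014, 2024)  # 10 years, 100 events per year on one node

# Model A: one generic EVENT type with a 'year' property.
generic = [("EVENT", {"year": y}, f"e{y}_{i}") for y in years for i in range(100)]

# Model B: one relationship type per year.
typed = {f"EVENT_{y}": [f"e{y}_{i}" for i in range(100)] for y in years}

generic_hits, generic_touched = generic_lookup(generic, 2023)
typed_hits, typed_touched = typed_lookup(typed, "EVENT_2023")
print(generic_touched, typed_touched)  # 1000 vs 100 relationships touched
```

Both models return the same 100 events, but the typed model gets there by touching a tenth of the relationships. The cost is a larger set of relationship types to manage, so profiling your real queries is the way to decide whether it pays off.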

While graph dbs give you a lot of flexibility in how you connect and model your data, that flexibility means there are many more options for how to model it, and some modeling decisions will be less straightforward than others. With experience you can begin to get a feel for modeling smells, which should push you toward refactoring your model. In some cases it will take query profiling to reveal weaknesses in your model.

Some other resources to help you out here:

Links in our top-level post in our Modeling section of the community site
The Modeling section of the community site
Modeling Designs blog post (links up with Max De Marzi's blog posts)
Data modeling pitfalls
