Introducing the Neo4j Database Analyzer

(Kees Vegter)

A tool to get a quick understanding of the data structures in your Neo4j Database.

Over the years of working with Neo4j, I created small tools to help me understand what kind of data a Neo4j database contains. I wanted to know the label and relationship counts in the database and which properties exist, so that I could give a good estimate of how a database will grow over time.

With the availability of Neo4j Desktop, I created a Neo4j Desktop App whose first tool, Analyze Database, is based on the small tools I used before. While doing this I added two more tools: Live Count, which counts all the Nodes per Label and Relationships per Relationship Type every n seconds and plots the results in a timeline chart, and Model, which lets you explore the database schema. Model is especially useful when the normal “call db.schema()” gives you a hairball structure in the Neo4j Browser. I created this Neo4j Desktop App with the valuable help of Michael Hunger.

Installing the Neo4j Database Analyzer in Neo4j Desktop (1.1.10 or later) is easy:

Open the “Graph Applications” sidebar and paste the URL
https://neo.jfrog.io/neo/api/npm/npm/neo4j-db-analyzer
into the “Install Graph Application” field, then press “Install”.

Select a Project and press ‘+ Add Application’ in the applications list. Choose the “Neo4j Db Analyzer” here to add it to your Project.

The following sections explain each tool in more depth.

Analyze Database

When you press “Analyze Database”, the database structures are counted. While the tool is analyzing the database, the “Summary” tab lists all the steps the tool is performing. When the analysis is finished, this listing moves to the “Log” tab and the results of the counts are displayed in the “Summary” tab.

Default Counts

With the default settings this tool will execute counts while using the count store (database statistics), which means that the queries are not expensive for the database. The following counts are executed while using the count store:

  • Nodes
  • Relationships
  • Labels
  • Relationship Types
  • Outgoing relationship types per label
  • Incoming relationship types per label

It is best to start with the default settings.
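To make the idea of count-store counts concrete, here is a minimal Python sketch that builds Cypher count queries for the items listed above. The exact queries the tool issues are not shown in this article, so the query strings and the helper names (quote, count_store_queries) are assumptions for illustration. Counts of these simple shapes (all nodes, one label, one relationship type, or one relationship type with a label on one end) are generally the kind Neo4j can answer from its count store; running a query with EXPLAIN is one way to check this on your version.

```python
def quote(name):
    """Backtick-quote a label or relationship type name for use in Cypher."""
    return "`" + name.replace("`", "``") + "`"


def count_store_queries(labels, rel_types):
    """Build simple Cypher count queries, keyed by a descriptive name.

    Hypothetical helper: these strings are illustrative and are not taken
    from the Neo4j Database Analyzer source code.
    """
    queries = {
        "nodes": "MATCH (n) RETURN count(n) AS count",
        "relationships": "MATCH ()-[r]->() RETURN count(r) AS count",
    }
    # Nodes per label.
    for label in labels:
        queries["label " + label] = (
            f"MATCH (n:{quote(label)}) RETURN count(n) AS count"
        )
    # Relationships per relationship type.
    for rel_type in rel_types:
        queries["type " + rel_type] = (
            f"MATCH ()-[r:{quote(rel_type)}]->() RETURN count(r) AS count"
        )
    # Outgoing and incoming relationship types per label.
    for label in labels:
        for rel_type in rel_types:
            l, t = quote(label), quote(rel_type)
            queries[f"out {label} {rel_type}"] = (
                f"MATCH (:{l})-[r:{t}]->() RETURN count(r) AS count"
            )
            queries[f"in {label} {rel_type}"] = (
                f"MATCH ()-[r:{t}]->(:{l}) RETURN count(r) AS count"
            )
    return queries


if __name__ == "__main__":
    for name, query in count_store_queries(["Person"], ["KNOWS"]).items():
        print(name, "->", query)
```

The sketch only produces query strings; sending them to a database is left to whichever driver you use.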

Analyzing Properties and Label Combinations

When you want to analyze Node or Relationship properties or Label Combinations, the count store cannot be used, which means that the query load on the database will be heavier. Therefore you have to specify in the Label Filter and Relationship Type Filter which Labels and Relationship Types you want to analyze.

Be careful when analyzing properties or label combinations on very big databases. Don’t do it on a database powering production workloads; use a backup or a read-replica/follower instead.

The following information is gathered when you analyze Properties and Label Combinations:

  • Label Combinations
    Label combinations will be found and counted.
  • Label Properties
    Label property combinations and counts, plus a list of all the different properties found and their data types. It also shows whether a property has an index.
  • Relationship Type Properties
    Relationship Type property combinations and counts, plus a list of all the relationship properties found and their data types.

Sampling

When the number of nodes or the number of relationships is above a configurable threshold, sampling is used to limit the load on the server. Press the “Sampling” button to edit the threshold values.

Note that when sampling is used, the found properties and label combinations are an estimate.
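The article does not describe how the tool samples, so the following Python sketch only illustrates the general idea under stated assumptions: when the number of nodes exceeds the threshold, inspect a random subset of threshold nodes and scale the counts back up. The function name property_combinations and the in-memory list of property dictionaries are hypothetical stand-ins for data read from the database.

```python
import random
from collections import Counter


def property_combinations(nodes, threshold, seed=None):
    """Count property-key combinations, sampling above the threshold.

    nodes:     list of dicts, one per node, mapping property name to value
               (a hypothetical in-memory stand-in for database reads).
    threshold: maximum number of nodes to inspect; must be positive.
    Returns (counts, sampled), where counts maps a sorted tuple of property
    keys to an exact or estimated count, and sampled says which of the two.
    """
    total = len(nodes)
    sampled = total > threshold
    if sampled:
        # Assumption: uniform random sampling without replacement.
        subset = random.Random(seed).sample(nodes, threshold)
        scale = total / threshold
    else:
        subset = nodes
        scale = 1.0
    counts = Counter(tuple(sorted(node.keys())) for node in subset)
    # Scaling turns sample counts into estimates for the full set.
    return {combo: n * scale for combo, n in counts.items()}, sampled


if __name__ == "__main__":
    people = [{"name": "n%d" % i} for i in range(900)]
    people += [{"name": "m%d" % i, "age": i} for i in range(100)]
    estimate, sampled = property_combinations(people, threshold=200, seed=42)
    print("sampled:", sampled)
    for combo, count in estimate.items():
        print(combo, "~", round(count))
```

With sampling active, rare combinations may be missed entirely or over-represented, which is why the results are estimates.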

Example load

To give an impression of how the tool performs, I analyzed a database with 46M Nodes, 61M Relationships, 101 Labels, 124 Relationship Types and 18 Label Combinations (created with my faker-based dataset generator).

The default count, without property analysis or Label Combination checks, took 2 seconds. The analysis with all Label/RelationshipType Properties and Label Combinations took ~15 minutes.

Label Details

In this tab you can see all the details of a Label by clicking on the ‘Label’ row. That row also contains the count of the Nodes with this specific label.

Label Combinations

In this tab a tile is shown per label combination with the count of it.

Relationship Details

In this tab a bar is shown for every Relationship Type with the Relationship count. The detail section is shown when you click on the bar, but only if Relationship properties have been analyzed. The details show the property list and the possible property combinations.

Indexes, Constraints and Log tabs

For convenience the Indexes and Constraints of the database are listed here. The Log tab contains the log of the analysis, which is shown in the Summary tab while the analysis is running.

Live Count

In this tool the nodes per selected label and the relationships per selected relationship type are counted every 10 seconds (default). Note that these queries use the database statistics, so they are very ‘light’ for the database. By default the first Label of the label list and the first Relationship Type of the Relationship Type list are selected. While counting, you can add or remove labels or relationship types from the ‘count’. You will see these changes in the ‘next’ count. This tool counts structures in the database ‘Live’; however, if you want to monitor the database you can use the Neo4j Desktop App Halin.
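The polling behaviour described above can be sketched in a few lines of Python. This is not the tool's implementation; live_count, count_fn and the other parameter names are hypothetical. The count function is passed in, so the sketch runs without a database, and the selection is re-read on every round so that changes show up in the next count.

```python
import time


def live_count(count_fn, selected, interval_seconds=10, rounds=3,
               sleep=time.sleep):
    """Poll counts for the selected labels or relationship types.

    count_fn: callable taking a name and returning its current count
              (in practice this would run a cheap count query).
    selected: mutable list of names; it is re-read every round, so
              additions and removals take effect in the next count.
    Returns a list of snapshots (dicts of name -> count), one per round.
    """
    timeline = []
    for i in range(rounds):
        # Copy the selection so edits made during a round do not disturb it.
        snapshot = {name: count_fn(name) for name in list(selected)}
        timeline.append(snapshot)
        if i < rounds - 1:
            sleep(interval_seconds)
    return timeline


if __name__ == "__main__":
    fake_counts = {"Person": 5, "Movie": 2}  # stand-in for database counts
    for point in live_count(fake_counts.get, ["Person"],
                            interval_seconds=1, rounds=2):
        print(point)
```

Each snapshot corresponds to one point on the timeline chart.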

Model

This tab makes it possible to ‘walk’ over your database model even when there are a lot of Labels and Relationship Types. The visualization only contains data once the Database has been analyzed.

The database model starts with an empty canvas. You can start exploring the Model by selecting a Label via the “Labels Filter” or by pressing “Show All”. When the model complexity is too high, clicking on “Show All” gives a warning that showing the complete model will probably fail. In that case it is better to use the “Labels Filter” to start your model exploration. The complexity of the model is calculated as follows:

ModelComplexity = (Label Count + RelationshipType Count) * (Relationship Count / Node Count)
When the ModelComplexity is above 400, then "Show All" will give a warning.
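As a worked example, the formula can be applied to the database from the “Example load” section (101 Labels, 124 Relationship Types, 61M Relationships, 46M Nodes). The function names below and the guard for an empty database are assumptions added for illustration; only the formula and the threshold of 400 come from the text above.

```python
WARNING_THRESHOLD = 400  # from the text: above this, "Show All" warns


def model_complexity(label_count, rel_type_count, rel_count, node_count):
    """(Label Count + RelationshipType Count) * (Relationship Count / Node Count)."""
    if node_count == 0:
        # Assumption: treat an empty database as having zero complexity.
        return 0.0
    return (label_count + rel_type_count) * (rel_count / node_count)


def show_all_warns(label_count, rel_type_count, rel_count, node_count):
    """True when the complexity is above the warning threshold."""
    complexity = model_complexity(
        label_count, rel_type_count, rel_count, node_count
    )
    return complexity > WARNING_THRESHOLD


if __name__ == "__main__":
    value = model_complexity(101, 124, 61_000_000, 46_000_000)
    print(round(value, 1))  # 298.4
    print(show_all_warns(101, 124, 61_000_000, 46_000_000))  # False
```

For that database the result is (101 + 124) * (61 / 46), roughly 298.4, which is below 400, so “Show All” would not warn.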

Show All

For smaller schemas this option will be the fastest way to get a quick overview of the database model.

When a Node is selected it turns blue, and the properties of the Node are shown on the right. These include the Node Count of the Label and the Incoming and Outgoing Relationship Types with their Relationship counts. When the properties of this Label have been analyzed, you will also see a property list with property types here.

Context Menu

With the context menu on a selected “Label” Node you can add the incoming and outgoing relationship types to the visualisation including the connected “Label” Nodes. It is also possible to remove a Relationship Type or a “Label” Node from the canvas.

Note that when you ‘Clear’ a Label or Relationship Type the Connected Nodes will remain on the canvas.

Links

The source code for the Neo4j Db Analyzer is on GitHub at kvegter/dbreportapp. You can read the documentation and report issues there.

If you have questions regarding your Neo4j database, you can always head to the #help-cypher channel on the Neo4j Users Slack or to the Neo4j community.


Introducing the Neo4j Database Analyzer was originally published in neo4j on Medium, where people are continuing the conversation by highlighting and responding to this story.
