Neo4j 5.18.1: massive performance hit compared to 4.4.32

Hi !

I would like to upgrade my app from Neo4j 4.4.32 -> 5.18.1.

However, I noticed a huge performance impact when I run the same script over the 5.x series.

4.4.32

$ time script.py
...
40,03s user 21,68s system 123% cpu 49,984 total

5.18.1

$ time script.py
...
58,73s user 39,59s system 16% cpu 9:40,43 total

The performance impact is absolutely massive : +1060%

I'm simply inserting data in a single transaction with multiple MERGE operations.

This is a blocker for me, as I can't consider upgrading to Neo4j 5.x.

I can't share the specifics of the code, and there isn't anything suspicious in the logs.
Furthermore, the problem isn't related to a single request being slow: the app performs very well at the beginning, and performance then degrades rapidly over time.

I've monitored the RAM consumption of Neo4j instance, and it didn't look abnormal.
The CPU usage however, was always 100%, so there might be a lead in that direction.

Next step is to bisect and see if it's reproducible for the entire 5.x series.

EDIT: Tested with Neo4J 5.1 same results: 8:46,43 total :frowning:

  • Neo4j 5.18.1 (Docker image)
  • stack: Python app -> Neomodel 5.2.1 with neo4j 5.15
  • no plugins

@Wenzel

I'm simply inserting data in a single transaction with multiple MERGE operations.

is there an index on the label/property upon which the MERGE is run against? and was said index on 4.4.32 ?

Hi @dana_canzano

thanks very much for your reply !

There is an index created at the beginning, using

for label, unique_prop_list in constraints.items():
    for unique_prop in unique_prop_list:
        session.run(f"CREATE CONSTRAINT ON (n:{label}) ASSERT n.{unique_prop} IS UNIQUE")

In each case, I instantiated the database with docker compose and ran my script, which took care of creating the indexes and indexing the data.

What was your idea ?

@Wenzel

what was your idea?

A MERGE statement on a node is effectively an "update or create" of that node. In order to update, it first needs to check whether the node already exists. For example, if you have

merge (n:Person {id:1}) set n.status='active';

this would first need to check whether there are any :Person nodes with id=1. If you have 100 million :Person nodes and no index, then we examine each :Person node one by one to determine whether it has a property named id with a value of 1. If you have an index on :Person(id), we consult the index instead, and thus don't need to examine each of the 100 million :Person nodes one by one.

If we find :Person nodes with id=1, then we set the status property of each to active.
If no :Person nodes exist with id=1, then we create a :Person node with id=1 and status=active.
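
To make the cost difference concrete, here is a toy model in plain Python. It is only an illustration of scan versus index lookup, not how Neo4j actually stores or finds nodes, and all function names are made up for the example. It assumes ids are unique, so the scan stops at the first match.

```python
# Toy model only: this is NOT how Neo4j stores or looks up nodes.

def merge_with_scan(nodes, node_id):
    """MERGE without an index: compare against existing nodes one by one."""
    comparisons = 0
    for node in nodes:
        comparisons += 1
        if node["id"] == node_id:
            node["status"] = "active"  # node found: update it
            return comparisons
    # No match after scanning everything: create the node.
    nodes.append({"id": node_id, "status": "active"})
    return comparisons


def merge_with_index(nodes, index, node_id):
    """MERGE with an index: a single lookup, whatever the number of nodes."""
    node = index.get(node_id)
    if node is None:
        node = {"id": node_id}
        nodes.append(node)
        index[node_id] = node
    node["status"] = "active"
    return 1


def total_comparisons(n):
    """Count the lookups needed to MERGE n distinct ids with each strategy."""
    scanned, scan_cost = [], 0
    indexed, index, index_cost = [], {}, 0
    for node_id in range(n):
        scan_cost += merge_with_scan(scanned, node_id)
        index_cost += merge_with_index(indexed, index, node_id)
    return scan_cost, index_cost
```

In this model, inserting n distinct nodes without an index costs about n²/2 comparisons in total, versus n lookups with one. That would be consistent with a load that starts fast and gets slower as the graph grows.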

is there an index on the label/property upon which the MERGE is run against? and was said index on 4.4.32 ?

i.e. if you had 200 indexes under 4.4.32 do you have 200 indexes under 5.18.1 ?
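
One way to check this from the application side is to list the indexes and compare them with what you expect. The following is a minimal sketch, assuming `session` is an open session from the neo4j Python driver and that `SHOW INDEXES` exposes the `labelsOrTypes`, `properties` and `state` columns on your version. The function name and the shape of `expected` are hypothetical.

```python
def missing_indexes(session, expected):
    """Return the (label, property) pairs from `expected` with no ONLINE index.

    `session` is assumed to be an open neo4j driver session; `expected` is a
    hypothetical iterable of (label, property) pairs.
    """
    covered = set()
    result = session.run(
        "SHOW INDEXES YIELD labelsOrTypes, properties, state "
        "WHERE state = 'ONLINE' "
        "RETURN labelsOrTypes, properties"
    )
    for record in result:
        # Token lookup indexes report no labels or properties.
        labels = record["labelsOrTypes"] or []
        props = record["properties"] or []
        # Only single-property indexes are considered in this sketch.
        if len(props) == 1:
            for label in labels:
                covered.add((label, props[0]))
    return sorted(set(expected) - covered)
```

Running it against both the 4.4.32 and the 5.18.1 instance with the same `expected` list should return an empty list on both if the indexes match.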

Hi @dana_canzano,

thank you for pushing the idea of checking whether the indexes were actually created.

It turns out that I had the following code block:

for label, unique_prop_list in constraints.items():
    for unique_prop in unique_prop_list:
        try:
            self._log.debug("Graph: creating unique constraint on %s:%s", label, unique_prop)
            session.run(f"CREATE CONSTRAINT FOR (n:{label}) REQUIRE n.{unique_prop} IS UNIQUE")
        except ClientError as e:
            if e.code == "Neo.ClientError.Schema.EquivalentSchemaRuleAlreadyExists":
                continue

It was written against the Neo4j 4.0 syntax, where [IF NOT EXISTS] wasn't available, hence the try/except.

And I forgot to reraise the exception !

        except ClientError as e:
            if e.code == "Neo.ClientError.Schema.EquivalentSchemaRuleAlreadyExists":
                continue
            raise e  # forgot this bit

Additionally, the constraint syntax changed from Neo4j 4.x to Neo4j 5.x, leading to a Cypher error:

neo4j.exceptions.CypherSyntaxError: {code: Neo.ClientError.Statement.SyntaxError} {message: Invalid constraint syntax, ON and ASSERT should not be used. Replace ON with FOR and ASSERT with REQUIRE. (line 1, column 1 (offset: 0))
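
For anyone hitting the same problem, here is a minimal sketch of a constraint setup for Neo4j 5.x that cannot fail silently. It assumes `session` is an open session from the neo4j Python driver and that `constraints` maps a label to a list of property names, as in the snippets above. The function name is hypothetical. It relies on `IF NOT EXISTS` instead of catching the "already exists" error, so there is no try/except left to swallow anything.

```python
def ensure_unique_constraints(session, constraints):
    """Create one uniqueness constraint per (label, property) pair.

    Assumes `session` is an open neo4j driver session and `constraints`
    maps a label to a list of property names. Labels and properties are
    interpolated into the query text, so they must come from trusted code,
    never from user input.
    """
    for label, unique_props in constraints.items():
        for prop in unique_props:
            query = (
                f"CREATE CONSTRAINT IF NOT EXISTS "
                f"FOR (n:{label}) REQUIRE n.{prop} IS UNIQUE"
            )
            # No try/except here: any failure, syntax errors included,
            # propagates to the caller instead of being skipped.
            session.run(query).consume()
```

Calling `ensure_unique_constraints(session, {"Person": ["id"]})` twice should be harmless, since the second call finds the constraint already in place.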

In conclusion, my issue was an absence of indexes due to a Cypher query that was silently failing.

Once that has been fixed, perf with 5.18.1 is even better !
45,634 total

Thank you very much for your help on this @dana_canzano and sorry for this oversight ! :rocket:
