Issue with extremely large block.big_values.db on version 5.26.0

  • Neo4j version: 5.26.0, Desktop version: 1.6.1

Hello!

I recently raised an issue where my database, which has heavily edited edges, was growing far larger than expected. The recommended resolution was a database upgrade, but the issue seems to have reappeared on this new version.

Once again, my database ballooned from 6 GB to 130 GB. After running neo4j-admin database copy, it returned to 6 GB.
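For reference, this is roughly the invocation I mean (a sketch; neo4j and neo4jclean are just placeholder source/target names, and the source database has to be stopped before copying):

neo4j-admin database copy neo4j neo4jclean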

Is this a side effect of upgrading from 5.24? Is this amount of fragmentation common? Pinging @dana_canzano since they were very helpful previously.

Otherwise, I feel like this is a persistent bug that was not fixed in 5.26.

Some sysinfo commands (for the same database, before and after a copy):

@joshualawson8

Do you have details from the file system, i.e. data/databases/ and data/transactions/, describing where all the space is being used? Are we still seeing a very large data/databases/<databaseName>/block.big_values.db, for example many GB in size?

One of the things neo4j-admin database copy does is effectively delete the current data/transactions/<databaseName>/ (this is expected), so if that path is 10 GB before running, you should expect it to be significantly smaller after running neo4j-admin database copy, i.e. in the MB range. Now, I don't suspect you had 100 GB+ of txn logs prior to running neo4j-admin database copy, but without a before/after of the filesystem under data/ this is not easy to understand.
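Something like the following, captured before and after the copy, would make the breakdown clear (a sketch for Linux/macOS; a disk-usage screenshot of the same paths is just as good on Windows):

du -sh data/databases/<databaseName>/*
du -sh data/transactions/<databaseName>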

Yeah, it's all in block.big_values.db. This is the old database: 97.6% of the neo4jclean db is being taken up by these values. Here is a WinDirStat image of the whole structure:

Here is a picture of my /data/transactions dir (most files are around 250 MB).

Is it possible this was fixed after 5.26.0? That's my current version.

@joshualawson8

Yes, the fix was definitely in 5.26.0 and later.
So this looks like a new manifestation.
Are you able to provide details of the Cypher ingestion, i.e.:

  • do you populate a significant # of nodes/rels with properties and then delete and recreate said nodes/rels?
  • do you populate a significant # of nodes/rels with properties and then update said nodes'/rels' properties?
  • do you populate a significant # of nodes/rels with properties and then add more properties to said nodes/rels?

Just trying to see if we can get a well-defined repro case.

Yes, of course. Here are snippets of the Cypher queries I've been running for creation. Perhaps these are the issue?

Create/merge

MERGE (a:Artist {spotifyId: $spotifyId})
ON CREATE SET
    a.name = $name,
    a.image = $image,
    a.popularity = $popularity,
    a.crawlStatus = 'uncrawled'

Create/merge edge

MERGE (a1:Artist {spotifyId: $spotifyId1})
MERGE (a2:Artist {spotifyId: $spotifyId2})
MERGE (a1)-[r:COLLABORATED_WITH]-(a2)
ON CREATE SET
    r.songUris = $songUris,
    r.songNames = $songNames,
    r.albumUris = $albumUris,
    r.images = $images
ON MATCH SET
    r.songUris = r.songUris + $songUris,
    r.songNames = r.songNames + $songNames,
    r.albumUris = r.albumUris + $albumUris,
    r.images = r.images + $images
RETURN r
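
If it helps, here is a quick check I can run to see how large those list properties have grown (a sketch; the directed pattern is just to avoid counting each relationship twice):

MATCH (:Artist)-[r:COLLABORATED_WITH]->(:Artist)
RETURN size(r.songUris)  AS songUris,
       size(r.songNames) AS songNames,
       size(r.albumUris) AS albumUris,
       size(r.images)    AS images
ORDER BY songUris DESC
LIMIT 10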

Indices I'm creating:

CREATE INDEX IF NOT EXISTS
FOR (a:Artist)
ON (a.name);

CREATE INDEX IF NOT EXISTS
FOR (a:Artist)
ON (a.spotifyId);
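
Since the MERGEs are keyed on spotifyId, I've also been considering a uniqueness constraint instead of the plain spotifyId index (a sketch; artist_spotify_id is just an example name, and the constraint creates its own backing index):

CREATE CONSTRAINT artist_spotify_id IF NOT EXISTS
FOR (a:Artist)
REQUIRE a.spotifyId IS UNIQUE;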