Issue with extremely large block.big_values.db on version 5.26.0

  • Neo4j version: 5.26.0, Desktop version: 1.6.1

Hello!

I recently raised an issue where my database, which has heavily edited edges, was growing far larger than expected. The recommended resolution was a database upgrade, but the issue seems to have reappeared on this new version.

Once again, my database ballooned from 6 GB to 130 GB. After running neo4j-admin database copy, it returned to 6 GB.
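For reference, this is roughly the invocation I mean (a sketch; neo4j and neo4jclean are just placeholder source/target names, and the source database has to be stopped before copying):

neo4j-admin database copy neo4j neo4jclean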

Is this a side effect of upgrading from 5.24? Is this amount of fragmentation common? Pinging @dana_canzano since they were very helpful previously.

Otherwise, I feel like this is a persistent bug that was not fixed in 5.26.

Some sysinfo commands (for the same database, before and after a copy):

@joshualawson8

Do you have details from the file system, i.e. data/databases/ and data/transactions/, describing where all the space is being used? Are we still seeing a very large data/databases/<databaseName>/block.big_values.db, for example many GB in size?

One of the things neo4j-admin database copy does is effectively delete the current data/transactions/<databaseName>/ (this is expected), so if that path is 10 GB before running, you should expect it to be significantly smaller after running neo4j-admin database copy, i.e. in the MB range. Now, I don't suspect you had 100 GB+ of txn logs prior to running neo4j-admin database copy, but without a before/after of the filesystem under data/ this is not easy to understand.
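Something like the following, captured before and after the copy, would make the breakdown clear (a sketch for Linux/macOS; a disk-usage screenshot of the same paths is just as good on Windows):

du -sh data/databases/<databaseName>/*
du -sh data/transactions/<databaseName>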

Yeah, it's all in block.big_values.db. This is the old database: 97.6% of the neo4jclean db is being taken up by these values. Here is a WinDirStat image of the whole structure:

Here is a picture of my /data/transactions dir (most files are around 250 MB).

Is it possible this was fixed after 5.26.0? That's my current version.

@joshualawson8

Yes, the fix was definitely in 5.26.0 and later.
So this looks like a new manifestation.
Are you able to provide details of the Cypher ingestion, i.e.:

  • do you populate a significant # of nodes/rels with properties and then delete and recreate said nodes/rels?
  • do you populate a significant # of nodes/rels with properties and then update said nodes'/rels' properties?
  • do you populate a significant # of nodes/rels with properties and then add more properties to said nodes/rels?

Just trying to see if we can get a well-defined repro case.

Yes, of course. Here are snippets of the Cypher queries I've been running for creation. Perhaps these are the issue?

Create/merge

MERGE (a:Artist {spotifyId: $spotifyId})
ON CREATE SET
    a.name = $name,
    a.image = $image,
    a.popularity = $popularity,
    a.crawlStatus = 'uncrawled'

Create/merge edge

MERGE (a1:Artist {spotifyId: $spotifyId1})
MERGE (a2:Artist {spotifyId: $spotifyId2})
MERGE (a1)-[r:COLLABORATED_WITH]-(a2)
ON CREATE SET
    r.songUris = $songUris,
    r.songNames = $songNames,
    r.albumUris = $albumUris,
    r.images = $images
ON MATCH SET
    r.songUris = r.songUris + $songUris,
    r.songNames = r.songNames + $songNames,
    r.albumUris = r.albumUris + $albumUris,
    r.images = r.images + $images
RETURN r
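
If it helps, here is a quick check I can run to see how large those list properties have grown (a sketch; the directed pattern is just to avoid counting each relationship twice):

MATCH (:Artist)-[r:COLLABORATED_WITH]->(:Artist)
RETURN size(r.songUris)  AS songUris,
       size(r.songNames) AS songNames,
       size(r.albumUris) AS albumUris,
       size(r.images)    AS images
ORDER BY songUris DESC
LIMIT 10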

Indices I'm creating:

CREATE INDEX IF NOT EXISTS
FOR (a:Artist)
ON (a.name);

CREATE INDEX IF NOT EXISTS
FOR (a:Artist)
ON (a.spotifyId);
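
Since the MERGEs are keyed on spotifyId, I've also been considering a uniqueness constraint instead of the plain spotifyId index (a sketch; artist_spotify_id is just an example name, and the constraint creates its own backing index):

CREATE CONSTRAINT artist_spotify_id IF NOT EXISTS
FOR (a:Artist)
REQUIRE a.spotifyId IS UNIQUE;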