Neo4j backup fails

Hi,

System details:

  • Neo4j Enterprise 2025.03.0
  • Single node
  • Azure Kubernetes Services
  • 2 databases named “historic” and “planned”

In one of our environments we’ve started seeing occasional Neo4j backup failures on one of the databases. The backup is executed by a Kubernetes CronJob deployed with the Helm chart from neo4j/neo4j-admin.

Error from client:

com.neo4j.causalclustering.net.app.common.error.InboundInactivityException: Request timed out after 600000 ms.

neo admin logs.txt (199.9 KB)

debug.log:

2025-10-21 02:23:13.035+0000 WARN  [c.n.c.p.i.ServerChannelInitializer] [backup-server] Exception in outbound for channel: [id: 0x88d86429, L:/10.1.0.227:6362 ! R:/10.1.0.72:34778] without printing cause for io.netty.channel.StacklessClosedChannelException
2025-10-21 02:23:13.035+0000 WARN  [c.n.c.p.i.ServerChannelInitializer] [backup-server] Exception in outbound for channel: [id: 0x88d86429, L:/10.1.0.227:6362 ! R:/10.1.0.72:34778]
java.nio.channels.ClosedChannelException: null
	at io.netty.handler.stream.ChunkedWriteHandler.discard(ChunkedWriteHandler.java:192) ~[netty-handler-4.1.119.Final.jar:4.1.119.Final]
	at io.netty.handler.stream.ChunkedWriteHandler.doFlush(ChunkedWriteHandler.java:213) ~[netty-handler-4.1.119.Final.jar:4.1.119.Final]
	at io.netty.handler.stream.ChunkedWriteHandler.channelInactive(ChunkedWriteHandler.java:151) ~[netty-handler-4.1.119.Final.jar:4.1.119.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:303) ~[netty-transport-4.1.119.Final.jar:4.1.119.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281) ~[netty-transport-4.1.119.Final.jar:4.1.119.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:274) ~[netty-transport-4.1.119.Final.jar:4.1.119.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:412) ~[netty-codec-4.1.119.Final.jar:4.1.119.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:377) ~[netty-codec-4.1.119.Final.jar:4.1.119.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:303) ~[netty-transport-4.1.119.Final.jar:4.1.119.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281) ~[netty-transport-4.1.119.Final.jar:4.1.119.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:274) ~[netty-transport-4.1.119.Final.jar:4.1.119.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1352) ~[netty-transport-4.1.119.Final.jar:4.1.119.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:301) ~[netty-transport-4.1.119.Final.jar:4.1.119.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281) ~[netty-transport-4.1.119.Final.jar:4.1.119.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:850) ~[netty-transport-4.1.119.Final.jar:4.1.119.Final]
	at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:811) ~[netty-transport-4.1.119.Final.jar:4.1.119.Final]
	at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173) ~[netty-common-4.1.119.Final.jar:4.1.119.Final]
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166) ~[netty-common-4.1.119.Final.jar:4.1.119.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) ~[netty-common-4.1.119.Final.jar:4.1.119.Final]
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:405) ~[netty-transport-classes-epoll-4.1.119.Final.jar:4.1.119.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:998) ~[netty-common-4.1.119.Final.jar:4.1.119.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.119.Final.jar:4.1.119.Final]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572) ~[?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:317) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.119.Final.jar:4.1.119.Final]
	at java.lang.Thread.run(Thread.java:1583) [?:?]

debug.txt (2.7 MB)

What is causing the timeout? Is it simply a network issue?

I’ve tried increasing dbms.cluster.network.client_inactivity_timeout.
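For what it’s worth, 600000 ms in the error is exactly 10 minutes, which appears to match the default inactivity window, so it may be worth double-checking that the new value is actually being picked up by the server (not just the client). A minimal neo4j.conf fragment, with the 30m value chosen purely for illustration:

```properties
# neo4j.conf — widen the inactivity window on the server serving the backup.
# 30m is an illustrative value; tune to how long your largest store takes to stream.
dbms.cluster.network.client_inactivity_timeout=30m
```

If the setting is applied via the Helm chart, confirm it lands in the server’s effective config (e.g. via the startup lines in debug.log) rather than only in the backup client’s environment.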

Any help would be appreciated.

Thanks.

Interesting case. I hope others will chime in as well. In the meantime, one thing caught my attention: it looks like you are using the aligned and standard store formats. Is there a reason for this?
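If you want to confirm what each database is using, the store format is reported per database; if I remember correctly it shows up in the extended output of SHOW DATABASES (the exact column set may vary by version):

```cypher
// Inspect the store format of every database on the DBMS.
// The `store` field reads something like "record-aligned-1.1" or "block-…".
SHOW DATABASES YIELD name, store, currentStatus
RETURN name, store, currentStatus;
```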

Have you taken these databases through an upgrade path from Neo4j 4 that did not complete correctly?

Rather than chase down some esoteric issue and wait several days, I would create the backups one by one, make sure they get migrated to the block format, and then restore them into a clean DBMS.
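As a rough sketch of that sequence using neo4j-admin (paths and the backup source address are placeholders; check the exact flags against the manual for your 2025.x release, and note that migrate runs against an offline store):

```shell
# 1. Back up each database individually to a scratch location.
neo4j-admin database backup --from=my-neo4j-host:6362 --to-path=/backups historic
neo4j-admin database backup --from=my-neo4j-host:6362 --to-path=/backups planned

# 2. Restore each backup into the clean DBMS (run while that database does not exist / is offline).
neo4j-admin database restore --from-path=/backups/historic historic
neo4j-admin database restore --from-path=/backups/planned planned

# 3. Migrate the restored stores to the block format before bringing them online.
neo4j-admin database migrate --to-format=block historic
neo4j-admin database migrate --to-format=block planned

# 4. Register the databases on the new DBMS with CREATE DATABASE via Cypher, then verify.
```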

I am not satisfied with not finding the root cause of the issue. But if I were in this situation, I would definitely try to get into a stable state by making sure my DBMS and store are in the best possible shape.

Agreed, I spotted that in the logs too and will get them upgraded to the block format. Yes, we’ve come from version 4… maybe even 3.