Neo4j driver sometimes hangs on FetchSize greater than 500

  • Neo4j version: 4.0.4 Enterprise
  • Driver: .NET Bolt driver 4.1.0
  • Heap Initial Size: 512 MB
  • Heap Max Size: 6 GB
  • Pagecache Size: 2 GB

I am noticing that the Neo4j driver for .NET sometimes hangs when reading a large result set.
I'm running a query that returns ~48000 records with fetchSize set to 1000. While reading the result set from the server, record by record, the process suddenly hangs after reading 15000, 20000 or 30000 records from the result cursor. There is no clear pattern or threshold for when it hangs.
After the 30-second timeout is reached, the connection is closed with the following error message: "Unable to read message from server bolt://localhost:7687/, connection will be terminated.
System.IO.IOException: Unexpected end of stream, read returned 0. RemainingMessageDataSize = 8719, MessageCount = 0"

Full stack trace:
Neo4jLogger => [bolt-9] Unable to read message from server bolt://localhost:7687/, connection will be terminated.
System.IO.IOException: Unexpected end of stream, read returned 0. RemainingMessageDataSize = 8719, MessageCount = 0
at Neo4j.Driver.Internal.IO.ChunkReader.ProcessStream(Stream outputMessageStream) in C:\test-project\NeoGraphTools\Neo4j.Driver\Internal\IO\ChunkReader.cs:line 154
at Neo4j.Driver.Internal.IO.ChunkReader.ReadNextMessagesAsync(Stream outputMessageStream) in C:\test-project\NeoGraphTools\Neo4j.Driver\Internal\IO\ChunkReader.cs:line 205
at Neo4j.Driver.Internal.IO.MessageReader.ReadAsync(IResponsePipeline pipeline) in C:\test-project\NeoGraphTools\Neo4j.Driver\Internal\IO\MessageReader.cs:line 64
at Neo4j.Driver.Internal.Connector.SocketClient.ReceiveOneAsync(IResponsePipeline responsePipeline) in C:\test-project\NeoGraphTools\Neo4j.Driver\Internal\Connector\SocketClient.cs:line 112
warn: Neo4jQueryConsoleBasic.Program[0]
Neo4jLogger => [bolt-9] Unable to send message to server bolt://localhost:7687/, connection will be terminated.
System.IO.IOException: Unable to write data to the transport connection: Cannot access a disposed object.
Object name: 'System.Net.Sockets.Socket'..
---> System.ObjectDisposedException: Cannot access a disposed object.
Object name: 'System.Net.Sockets.Socket'.
at System.Net.Sockets.Socket.BeginSend(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags, SocketError& errorCode, AsyncCallback callback, Object state)
at System.Net.Sockets.Socket.BeginSend(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags, AsyncCallback callback, Object state)
at System.Net.Sockets.Socket.SendAsyncApm(ReadOnlyMemory`1 buffer, SocketFlags socketFlags)
at System.Net.Sockets.Socket.SendAsyncForNetworkStream(ReadOnlyMemory`1 buffer, SocketFlags socketFlags, CancellationToken cancellationToken)
at System.Net.Sockets.NetworkStream.WriteAsync(Byte[] buffer, Int32 offset, Int32 size, CancellationToken cancellationToken)
--- End of inner exception stack trace ---
at System.Net.Sockets.NetworkStream.WriteAsync(Byte[] buffer, Int32 offset, Int32 size, CancellationToken cancellationToken)
at System.IO.MemoryStream.CopyToAsync(Stream destination, Int32 bufferSize, CancellationToken cancellationToken)
at System.IO.Stream.CopyToAsync(Stream destination)
at Neo4j.Driver.Internal.IO.ChunkWriter.SendAsync() in C:\test-project\NeoGraphTools\Neo4j.Driver\Internal\IO\ChunkWriter.cs:line 179
at Neo4j.Driver.Internal.IO.MessageWriter.FlushAsync() in C:\test-project\NeoGraphTools\Neo4j.Driver\Internal\IO\MessageWriter.cs:line 68
at Neo4j.Driver.Internal.Connector.SocketClient.SendAsync(IEnumerable`1 messages) in C:\test-project\NeoGraphTools\Neo4j.Driver\Internal\Connector\SocketClient.cs:line 90

Can you please advise?

Hello!

Would you be able to put the code you're using up here as well, so we can see what's going on?

All the best

Chris

Hi Chris,

Thanks for your quick response. First and foremost, I've attached the source code. The driver connects to a local graph and the password has been removed. I would like to add to my initial message: after the 30-second timeout is reached, the connection is closed with the error message listed above. The retry policy then kicks in, and after a second attempt, which also times out, the transaction throws the exception "Unable to read..."

The code has been run with various values for fetchSize parameter and the results are as follows:

  • fetchSize=250 - the run finished successfully most of the time, with all 48k rows retrieved
  • fetchSize=500 - the run hangs (at random) after retrieving 15000 or 23000 records
  • fetchSize=1000 - the run hangs after retrieving 24000 records
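
For readers trying to reproduce the comparison above, a minimal sketch of such a probe might look like the following. It assumes the 4.x Neo4j.Driver async API. The class and method names (FetchSizeProbe, CountRecordsAsync) and the progress interval are illustrative only and are not taken from the attached source code.

```csharp
using System;
using System.Threading.Tasks;
using Neo4j.Driver;

public static class FetchSizeProbe
{
    // Streams every record of `query` and returns how many were read.
    // `fetchSize` is the number of records the driver requests per batch.
    public static async Task<long> CountRecordsAsync(IDriver driver, string query, long fetchSize)
    {
        // Session-level fetch size, set the same way as elsewhere in this thread.
        var session = driver.AsyncSession(o => o.WithFetchSize(fetchSize));
        try
        {
            return await session.ReadTransactionAsync(async tx =>
            {
                var cursor = await tx.RunAsync(query);
                long count = 0;

                // FetchAsync advances the cursor one record at a time and
                // returns false once the result stream is exhausted.
                while (await cursor.FetchAsync())
                {
                    count++;
                    if (count % 1000 == 0)
                    {
                        // Progress output shows roughly where a hang occurs.
                        Console.WriteLine($"read {count} records");
                    }
                }

                return count;
            });
        }
        finally
        {
            await session.CloseAsync();
        }
    }
}
```

Calling CountRecordsAsync with 250, 500 and 1000 against the same query would give one run per configuration listed above.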

Unfortunately I'm not able to attach the PNG screenshots for the above fetchSize configurations; I'm getting the message "Sorry, new users can not upload images."

All the best,
Bogdan

sourcecode.txt (3.3 KB)

Hey Bogdan,

Thanks for the source. Whilst I look at it, a couple of questions:

  1. Have you always had fetchSize set? What happens if you don't set it at all?
  2. Can you run that query in the browser? Does it time out there as well?
  3. What about Cypher Shell? Same thing?

All the best

Chris

Hi Chris,

  1. Yes, fetchSize is always set. If I don't set it (line 38 in the source code), the default fetchSize of 1000 is used. No general rule describes the behavior: the retrieval usually hangs after 3k records, although two executions hung at 14k records.

  2. The same query runs with no issues in Neo4j Browser, but only the first 1000 records are retrieved and displayed. I believe this is the default Neo4j Browser behavior. Besides the first 1k records, the output displays the following summary: "Started streaming 48177 records in less than 1 ms and completed after 1 ms, displaying first 1000 rows." No timeouts encountered.

  3. The same query runs with no issues in Cypher Shell, and all 48k records are returned and displayed in the output. No timeouts encountered.

Please note that if the transaction timeout is not set in .NET code (line 64 in source code), the process hangs indefinitely until the process is killed.
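
The transaction timeout mentioned above is presumably passed through the transaction configuration. A hedged sketch, assuming the 4.x async API, is shown below. The helper name ReadWithTimeoutAsync is hypothetical, and the attached source may set the timeout differently.

```csharp
using System;
using System.Threading.Tasks;
using Neo4j.Driver;

public static class TimeoutSketch
{
    // Runs `work` in a managed read transaction with an explicit timeout,
    // so that a stalled read eventually fails instead of hanging forever.
    public static Task<T> ReadWithTimeoutAsync<T>(
        IAsyncSession session,
        Func<IAsyncTransaction, Task<T>> work,
        TimeSpan timeout)
    {
        return session.ReadTransactionAsync(
            work,
            txConfig => txConfig.WithTimeout(timeout));
    }
}
```

With TimeSpan.FromSeconds(30) as the timeout this would correspond to the 30-second limit described in the first post. Because managed transactions are retried, a timed-out attempt may be run again before the exception finally surfaces.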

All the best,
Bogdan

Hi Chris - Did you have a chance to look over the source code? Do you have any updates?

Thanks! All the best,
Bogdan

Hi Bogdan,
I'm currently in the process of testing a potential fix for this. Once confirmed that it works, I will release a minor version of the 4.1 driver and let you know.

Thanks
Andy

Thanks @AndyHeap-NeoTec - @rotarubogdan89 - We'll keep you posted!

@rotarubogdan89 - Release 4.1.1 has now been made available on NuGet which should fix this issue for you.

Thanks
Andy

@AndyHeap-NeoTec - I have updated the package reference to version 4.1.1 and the odd behavior described initially is not reproducible anymore. Hence, the issue is fixed.

However, it seems that it always uses the default fetchSize (1000). I'm using the session config builder to override the default fetchSize, but the value no longer seems to change. Can you please check whether this feature has been broken, or am I doing something wrong?

var session = _driver.AsyncSession(configBuilder => { configBuilder.WithFetchSize(_fetchSize); });

Thanks! All the best,
Bogdan

@rotarubogdan89 - The fetch size code has not been impacted by the bug fix, so it is possible that you have found a separate issue that needs addressing. I will investigate. Would you be able to try running with the 4.0.3 .NET driver to see if the behaviour is any different?

@AndyHeap-NeoTec - I double checked and it was a misunderstanding on my side. The fetch size set on session level works as expected.
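
For reference, fetch size can be configured in two places in the 4.x .NET driver: once on the driver, as a default for all sessions, and again per session. The sketch below assumes that a value set on the session is expected to take precedence over the driver-wide default. The class and method names are illustrative, and the value 500 is only an example.

```csharp
using Neo4j.Driver;

public static class FetchSizeConfig
{
    // Driver-wide default: sessions opened without their own setting
    // would use this fetch size (500 here is an arbitrary example).
    public static IDriver CreateDriver(string uri, string user, string password)
    {
        return GraphDatabase.Driver(
            uri,
            AuthTokens.Basic(user, password),
            o => o.WithFetchSize(500));
    }

    // Session-level override, matching the call shown earlier in the thread.
    public static IAsyncSession OpenSession(IDriver driver, long fetchSize)
    {
        return driver.AsyncSession(o => o.WithFetchSize(fetchSize));
    }
}
```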

Thanks a lot for your help and quick turnaround. If you agree we can consider this issue closed.

All the best,
Bogdan

Thanks for the update. I agree that we can consider this issue now closed.