Bolt driver for Python "ValueError: filedescriptor out of range in select()"

Hi all,

I would like to ask about a Neo4j file descriptor error.
Under some load, the Python program fails with "ValueError: filedescriptor out of range in select()".
The environment is as follows.

  • Server: Amazon AWS
  • OS: Amazon Linux 2
  • Neo4j version: Enterprise 3.5.8
  • Bolt driver: 1.7 for Python
  • Python: 3.7.4 and we use multiprocessing

The error occurs under some load, as shown in the logs below.
I changed LimitNOFILE from 1024 to 60000.
I also changed the following settings in neo4j.conf.

  • dbms.memory.heap.initial_size=23g
  • dbms.memory.heap.max_size=23g
  • dbms.memory.pagecache.size=27400m
  • dbms.connector.bolt.thread_pool_min_size=500
  • dbms.connector.bolt.thread_pool_max_size=10000
  • dbms.connector.bolt.thread_pool_keep_alive=15m

I usually start Neo4j from the console, e.g. "sudo /bin/neo4j start".
I still get the error.
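
A note on the LimitNOFILE change: if it was set in a systemd unit file, it applies only to processes that systemd starts from that unit, and it concerns the Neo4j server rather than the Python client that raises the ValueError. A minimal sketch for checking which limit the Python process actually runs with and how many descriptors it currently holds, assuming Linux (it reads /proc/self/fd); the helper names are made up for illustration:

```python
import os
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count():
    """Count descriptors currently open in this process (Linux only)."""
    # Listing the directory briefly opens one descriptor of its own,
    # so the result is one higher than the steady-state count.
    return len(os.listdir("/proc/self/fd"))


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print("soft limit:", soft, "hard limit:", hard)
    print("open descriptors:", open_fd_count())
```

Logging open_fd_count() from the worker processes while the load runs should show whether the count keeps climbing toward 1024.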

Could you tell me what to do?

debug.log

2020-02-20 06:28:15.721+0000 ERROR [o.n.b.t.TransportSelectionHandler] Fatal error occurred when initialising pipeline: [id: 0x6fd93f84, L:/172.31.41.6:7687 ! R:/150.249.195.153:51244] javax.net.ssl.SSLException: Received fatal alert: certificate_unknown
io.netty.handler.codec.DecoderException: javax.net.ssl.SSLException: Received fatal alert: certificate_unknown
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:472)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
        at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
        at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:799)
        at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:433)
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:330)
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909)
        at java.lang.Thread.run(Thread.java:748)
Caused by: javax.net.ssl.SSLException: Received fatal alert: certificate_unknown
        at sun.security.ssl.Alerts.getSSLException(Alerts.java:208)
        at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1647)
        at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1615)
        at sun.security.ssl.SSLEngineImpl.recvAlert(SSLEngineImpl.java:1781)
        at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:1070)
        at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:896)
        at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:766)
        at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624)
        at io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:295)
        at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1301)
        at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1203)
        at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1247)
        at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:502)
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:441)
        ... 17 more
2020-02-20 06:28:15.721+0000 ERROR [o.n.b.t.TransportSelectionHandler] Fatal error occurred when initialising pipeline: [id: 0x6fd93f84, L:/172.31.41.6:7687 ! R:/150.249.195.153:51244] javax.net.ssl.SSLException: Received fatal alert: certificate_unknown
io.netty.handler.codec.DecoderException: javax.net.ssl.SSLException: Received fatal alert: certificate_unknown
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:472)
        at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:405)

Python program's log

2020-02-20 06:28:27,725 INFO
Process Process-1083:
Traceback (most recent call last):
  File "/usr/lib64/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib64/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ec2-user/graph/program.py", line 3514, in input_part
    Es1 = make_data(0, 0, UsId, OpId, 0, 0)
  File "/home/ec2-user/graph/program.py", line 123, in make_data
    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", driver_pass))
  File "/home/ec2-user/.local/lib/python3.7/site-packages/neo4j/__init__.py", line 116, in driver
    return Driver(uri, **config)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/neo4j/__init__.py", line 157, in __new__
    return subclass(uri, **config)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/neo4j/__init__.py", line 231, in __new__
    pool.release(pool.acquire())
  File "/home/ec2-user/.local/lib/python3.7/site-packages/neobolt/direct.py", line 719, in acquire
    return self.acquire_direct(self.address)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/neobolt/direct.py", line 612, in acquire_direct
    connection = self.connector(address, error_handler=self.connection_error_handler)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/neo4j/__init__.py", line 228, in connector
    return connect(address, **dict(config, **kwargs))
  File "/home/ec2-user/.local/lib/python3.7/site-packages/neobolt/direct.py", line 976, in connect
    raise last_error
  File "/home/ec2-user/.local/lib/python3.7/site-packages/neobolt/direct.py", line 968, in connect
    connection = _handshake(s, address, der_encoded_server_certificate, **config)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/neobolt/direct.py", line 902, in _handshake
    ready_to_read, _, _ = select((s,), (), (), 1)
ValueError: filedescriptor out of range in select()
no label.wait...
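
The ValueError itself is raised by Python's select(), not by Neo4j: select() can only watch descriptor numbers below FD_SETSIZE, which is 1024 on typical Linux builds, however high the open-file limit is set. Once a process already holds that many open descriptors, a new socket gets a number that select() refuses. A minimal sketch that reproduces the range check without Neo4j; select_accepts is a hypothetical helper name, and the 1024 value is an assumption about the platform:

```python
import select


def select_accepts(fd):
    """Return True if select() accepts this descriptor number.

    False means select() rejected the number itself (negative, or at or
    above FD_SETSIZE) before looking at the descriptor.
    """
    try:
        select.select([fd], [], [], 0)
    except ValueError:
        # Raised for numbers select() cannot represent at all.
        return False
    except OSError:
        # The number is in range but the descriptor is not open (EBADF),
        # so the range check itself passed.
        return True
    return True


if __name__ == "__main__":
    for fd in (0, 1023, 1024):
        print(fd, select_accepts(fd))
```

Raising the open-file limit does not move this boundary. It only lets the process reach descriptor numbers that select() cannot handle.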

This seems to be related to certificates and encryption. Are you using valid certificates on the server?

Can you disable encryption and see if it works first?

Hi Anthapu,

Thanks for your reply!
I am using two servers.
The other server logs the following when "ValueError: filedescriptor out of range in select()" occurs.
Sorry, I should have included this log earlier.

2020-02-28 16:09:26.008+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Checkpoint triggered by "Scheduled checkpoint for time threshold" @ txId: 9951392 checkpoint started...
2020-02-28 16:09:26.010+0000 INFO [o.n.k.i.s.c.CountsTracker] Rotated counts store at transaction 9951392 to [/var/lib/neo4j/data/databases/graph.db/neostore.counts.db.b], from [/var/lib/neo4j/data/databases/graph.db/neostore.counts.db.a].
2020-02-28 16:09:26.328+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Checkpoint triggered by "Scheduled checkpoint for time threshold" @ txId: 9951392 checkpoint completed in 320ms
2020-02-28 16:09:26.328+0000 INFO [o.n.k.i.t.l.p.LogPruningImpl] No log version pruned, last checkpoint was made in version 61
2020-02-28 16:20:06.397+0000 WARN [o.n.b.t.p.HouseKeeper] Fatal error occurred when handling a client connection, remote peer unexpectedly closed connection: [id: 0x35408957, L:/127.0.0.1:7687 - R:/127.0.0.1:49278]
2020-02-28 16:21:06.623+0000 WARN [o.n.b.t.p.HouseKeeper] Fatal error occurred when handling a client connection, remote peer unexpectedly closed connection: [id: 0x54ca9da9, L:/127.0.0.1:7687 - R:/127.0.0.1:49294]

Then I noticed that Zhen mentions "some function that goes over all connections on server every 4h and kill all idle connections" in the following thread:
Fatal error occurred when handling a client connection causes crash
Should I implement such a function?
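
Before adding a periodic cleanup job, it may be worth checking whether the client closes what it opens. The traceback shows make_data creating a new driver on every call, and each driver sets up its own connection pool. If those drivers are never closed (an assumption, since program.py is not shown), descriptors would pile up in any long-lived process. A minimal sketch of the always-close pattern; run_with_driver, make_driver and work are hypothetical names, and the only thing assumed of the driver object is a close() method:

```python
from contextlib import closing


def run_with_driver(make_driver, work):
    """Create a driver, pass it to work(), and close it even if work() raises."""
    # closing() calls driver.close() when the block exits, on success or error.
    with closing(make_driver()) as driver:
        return work(driver)
```

With the 1.7 driver this could be called as run_with_driver(lambda: GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", driver_pass)), work). Creating one driver per worker process and reusing it for all queries is usually cheaper than creating one per call.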