More traffic and now getting "Failed to establish connection" message

Neo4j 3.5.12 Community
Python 3.6.9
neo4j-driver==1.7.6
neobolt==1.7.17
Ubuntu Server 18.04
Flask 1.1.2

A web server I run has suddenly experienced a 2000% increase in traffic. Everything was working fine before this increase. Now, after a few hours of heavy traffic (I'm unsure exactly how long, but within 5 hours), I start seeing the following error, and connections fail constantly after that. It looks as though connections are not being closed.

     db = GraphDatabase.driver("bolt://localhost:MYPORTNO", auth=basic_auth(DATABASE_USERNAME, DATABASE_PASSWORD), encrypted=False)
   File "/home/ubuntu/api/env/lib/python3.6/site-packages/neo4j/__init__.py", line 120, in driver
     return Driver(uri, **config)
   File "/home/ubuntu/api/env/lib/python3.6/site-packages/neo4j/__init__.py", line 161, in __new__
     return subclass(uri, **config)
   File "/home/ubuntu/api/env/lib/python3.6/site-packages/neo4j/__init__.py", line 235, in __new__
     pool.release(pool.acquire())
   File "/home/ubuntu/api/env/lib/python3.6/site-packages/neobolt/direct.py", line 715, in acquire
     return self.acquire_direct(self.address)
   File "/home/ubuntu/api/env/lib/python3.6/site-packages/neobolt/direct.py", line 608, in acquire_direct
     connection = self.connector(address, error_handler=self.connection_error_handler)
   File "/home/ubuntu/api/env/lib/python3.6/site-packages/neo4j/__init__.py", line 232, in connector
     return connect(address, **dict(config, **kwargs))
   File "/home/ubuntu/api/env/lib/python3.6/site-packages/neobolt/direct.py", line 972, in connect
     raise last_error
   File "/home/ubuntu/api/env/lib/python3.6/site-packages/neobolt/direct.py", line 962, in connect
     s = _connect(resolved_address, **config)
   File "/home/ubuntu/api/env/lib/python3.6/site-packages/neobolt/direct.py", line 843, in _connect
     raise ServiceUnavailable("Failed to establish connection to {!r} (reason {})".format(resolved_address, error))
neobolt.exceptions.ServiceUnavailable: Failed to establish connection to ('127.0.0.1', MYPORTNO) (reason [Errno 111] Connection refused)

If I restart Apache it works again, so as a band-aid I'm currently doing that from a cron job every few hours, but that only seems to partially work.

I increased the page cache and heap sizes, calculated according to this article.

I increased the Open Files Limit from the default 1024 to 50000.

I doubled the server spec.

Structurally I create a database driver instance in a config file:

Config file:

from neo4j import GraphDatabase, basic_auth

db = None  # module-level driver, created once on first use

def getDB():
    global db
    if db is None:
        db = GraphDatabase.driver("bolt://localhost:XXXXXX", auth=basic_auth(DATABASE_USERNAME, DATABASE_PASSWORD), encrypted=False)

    return db

I can then make queries in my module files as follows. I believe the following syntax properly manages connections being opened and closed due to the with statement, and so shouldn't leave open connections:

Module file:

db = config.getDB()

with db.session() as s:
    with s.begin_transaction() as tx:
        tx.run("the cypher", {the params})

I am also seeing the following error in the logs, although I'm not sure it is directly related.

ValueError: filedescriptor out of range in select()
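
One thing that may be relevant: as far as I understand, Python's select() can only handle file descriptor numbers below FD_SETSIZE (typically 1024 on Linux), so this error would suggest that the process has accumulated a large number of open descriptors, and raising the open files limit would not remove that ceiling. A minimal sketch for checking whether descriptors are piling up, assuming a Unix-like host that exposes them under /proc/self/fd or /dev/fd (count_open_fds is a hypothetical helper name, not part of any library):

```python
import os


def count_open_fds():
    """Return roughly how many file descriptors this process has open.

    Assumes a Unix-like host that exposes per-process descriptors under
    /proc/self/fd (Linux) or /dev/fd.
    """
    for fd_dir in ("/proc/self/fd", "/dev/fd"):
        if os.path.isdir(fd_dir):
            # Listing the directory briefly opens one extra descriptor, so the
            # absolute number is off by one; the trend over time is what matters.
            return len(os.listdir(fd_dir))
    raise OSError("no per-process file descriptor directory found")


if __name__ == "__main__":
    # Logged periodically (e.g. once a minute), a steadily rising number
    # would point at descriptors that are never closed.
    print("open file descriptors:", count_open_fds())
```

Calling this from inside the web process and logging the result would show whether the count climbs over the hours before the failures start.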

Does anyone know what is causing this or how to fix it please? Thanks!

Hello @doug :slight_smile:

Let me share some of my code with you, I hope it will help you :slight_smile:

import logging
import os
import sys

from neo4j import GraphDatabase

# Assumed logger setup; the original snippet did not show where logger comes from.
logger = logging.getLogger(__name__)

def connect_to_instance(logger, instance, login, password):
    """
    Function to connect to the graph database.
    """
    try:
        return GraphDatabase.driver(instance, auth=(login, password), encrypted=False)
    except Exception as e:
        logger.critical("%s :: %s :: CONNECT_TO_INSTANCE :: %s", os.path.basename(__file__), os.getpid(), e)
        sys.exit(1)

def bolt_to_list(result):
    """
    Function to transform a Bolt result into a list of dictionaries.
    """
    return [r.data() for r in result]

def execute_query(ctx, query, **kwargs):
    """
    Function to execute a Cypher query.
    """
    # The session is closed automatically when the with block exits.
    with ctx.session() as ses:
        return bolt_to_list(ses.run(query, kwargs))

# Create the driver once and reuse it for every query.
GRAPH_CTX = connect_to_instance(logger, "bolt://localhost:7687", "neo4j", "test")
execute_query(GRAPH_CTX, "MATCH (n:Item{name:$name}) RETURN n LIMIT 25", name="maxime")

Regards,
Cobra

Thank you for this code. Would you create the GRAPH_CTX driver instance before each query and then close it again? Currently, I create it once on Flask app launch in a config.py file and reference it from other modules. I checked and that does only create one instance of the driver.

I did notice that I wasn't calling close() on the driver object when terminating the Flask app, so I've fixed that. In normal operation, though, the only time that would happen is when I reboot the server during the daily backup, at which point everything starts afresh anyway.
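
One way this could be done, as a minimal sketch rather than the exact code in the app: register the driver's close() with atexit so it runs when the interpreter shuts down normally. The helper name close_driver_at_exit is hypothetical, and the only assumption about the driver is that it has a close() method.

```python
import atexit


def close_driver_at_exit(driver):
    """Register driver.close() to run when the interpreter exits normally.

    Works with any object that has a close() method, for example the
    object returned by GraphDatabase.driver(...).
    """
    atexit.register(driver.close)
    return driver
```

Note that atexit handlers do not run if the process is killed by a signal it does not handle, so this only covers a clean shutdown.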

I note that the driver I'm using is quite old. Could that be it?

Any idea what's causing the error?

Thanks :)

You can create one driver when the Flask API starts and then reuse it for each query.

When you use with ctx.session() you don't need to open or close anything yourself; it's like opening a file with a with statement in Python.
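
To illustrate that point, here is a minimal sketch using a stand-in class rather than the real driver (FakeSession and run_failing_query are hypothetical and only mimic the close-on-exit behaviour). The with block calls close() on exit even when the body raises:

```python
class FakeSession:
    """Hypothetical stand-in for a driver session; not the neo4j API."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs on normal exit and on exceptions alike.
        self.close()
        return False  # do not swallow the exception


def run_failing_query():
    """Simulate a query that raises inside the with block."""
    session = FakeSession()
    try:
        with session:
            raise RuntimeError("query failed")
    except RuntimeError:
        pass
    return session
```

This only covers the session. The driver object itself is a separate resource and still needs its own close() when it is no longer used.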

Yes, your version is old, but I don't know if the latest version (>4.0) will work with your 3.5 database. Maybe you should upgrade your database to the latest version.

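For the error, have you changed this setting in neo4j.conf?
dbms.default_listen_address=0.0.0.0
(That is the 4.x name; on Neo4j 3.5 the same setting is called dbms.connectors.default_listen_address.)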

Don't forget to restart the database if you changed something in the conf file.

Regards,
Cobra

Thanks very much. I'll try these things and see what happens. Much appreciated.

Quick update. I updated the driver to 4.1.1. This is compatible with Neo4j 3.5, but a bit of refactoring of the Python code was required to get it working.

However, I don't think that will have solved it; I just wanted to do it. As I went through the code, though, I noticed a couple of cron jobs that were not closing the database driver. One of them runs every 10 minutes. The server restarts at 3am every day, so by 3pm (12 hours at 6 runs per hour) 72 driver instances would have been opened and never closed, and it only got worse as the day went on.
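
For anyone with a similar setup, a minimal sketch of a job wrapper that always closes the driver, even if the work raises. This is an illustration under assumptions, not the exact fix applied here: run_job, make_driver and work are hypothetical names, and the only thing assumed about the driver is that it has a close() method.

```python
from contextlib import closing


def run_job(make_driver, work):
    """Create a driver, run work(driver), and always close the driver.

    make_driver: zero-argument callable returning an object with close(),
                 e.g. lambda: GraphDatabase.driver(uri, auth=...).
    work:        callable that takes the driver and runs the queries.
    """
    # closing() calls driver.close() when the block exits, whether the
    # work finished normally or raised an exception.
    with closing(make_driver()) as driver:
        return work(driver)
```

A job would then pass in its own driver factory and query function, so no code path can skip the close() call.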

So I am pretty sure that this was the problem. I'll run it for a couple of days and see what happens!

Thanks again for your help.