
Error Starting Gunicorn in Supervisor

I recently ran into an error while trying to run gunicorn under supervisor. A command that worked perfectly when run on its own failed when launched through supervisor. The failure showed up as a repeated error stating that the port was already in use:

2014-03-07 14:25:09 [9235] [ERROR] Connection in use: ('127.0.0.1', 8000)
2014-03-07 14:25:09 [9235] [ERROR] Retrying in 1 second.
2014-03-07 14:25:10 [9235] [ERROR] Connection in use: ('127.0.0.1', 8000)
2014-03-07 14:25:10 [9235] [ERROR] Retrying in 1 second.
2014-03-07 14:25:11 [9235] [ERROR] Connection in use: ('127.0.0.1', 8000)
2014-03-07 14:25:11 [9235] [ERROR] Retrying in 1 second.
2014-03-07 14:25:12 [9235] [ERROR] Connection in use: ('127.0.0.1', 8000)
2014-03-07 14:25:12 [9235] [ERROR] Retrying in 1 second.
2014-03-07 14:25:13 [9235] [ERROR] Connection in use: ('127.0.0.1', 8000)
2014-03-07 14:25:13 [9235] [ERROR] Retrying in 1 second.
2014-03-07 14:25:14 [9235] [ERROR] Can't connect to ('127.0.0.1', 8000)

Searching online turned up several reports of similar issues, but none with an explanation that fit our circumstances or a fix that resolved the issue. Many of the solutions suggested that the user had already started a process listening on the same port. That was not our situation.

I hope that this explanation will help others having the same problem.

We had gunicorn running just fine when invoked from the command line, and had deployed our production stacks that way: launch a gunicorn command, fire up an nginx server to buffer requests, and let it rip. It worked perfectly, but as soon as we put the same command under the control of supervisor, it failed. Exact same command. No other process was running on that port, yet gunicorn claimed the port was busy.

Here's the command we were using:

/srv/virtualenv/bin/gunicorn --config=/srv/configs/gunicorn.conf --daemon wsgi-app:app

The problem was the --daemon flag. When supervisor started gunicorn, the server would start, daemonize, and spin off its worker processes. Daemonizing, however, means the original process forks a background copy of itself and then exits. Supervisor saw that original process exit, concluded that the program had died, and started a new process to replace it. The new process found its TCP port (8000) still held by the daemonized gunicorn, which was running perfectly well, so it reported the error, retried once a second (five times in the log above), and then gave up and died. At that point supervisor started yet another process, and the cycle repeated.
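To make the failure mode concrete, here is a minimal Python sketch (POSIX only, since it uses os.fork) that plays supervisor's role. The embedded server script is a hypothetical, simplified stand-in for `gunicorn --daemon`, not gunicorn's actual code: it binds a port, forks, and lets the original process exit while the forked child keeps listening. It binds before forking purely so the demo is deterministic. The names `DAEMONIZING_SERVER`, `free_port`, and `launch` are illustrative.

```python
import socket
import subprocess
import sys
import textwrap

# Hypothetical, simplified stand-in for "gunicorn --daemon" (not gunicorn's code):
# bind and listen, fork, then let the original process exit while the forked
# child keeps the listening socket open. Binding before the fork keeps the demo
# deterministic. POSIX only, because it calls os.fork().
DAEMONIZING_SERVER = textwrap.dedent("""
    import os, socket, sys, time
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", int(sys.argv[1])))  # OSError if the port is taken
    sock.listen()
    if os.fork() > 0:
        os._exit(0)   # the process supervisor launched exits at once
    os.setsid()       # the forked child detaches and keeps the port
    time.sleep(3)     # stand-in for the server's main loop
""")


def free_port() -> int:
    """Ask the OS for a currently unused TCP port on 127.0.0.1."""
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def launch(port: int) -> int:
    """Start the server the way supervisor would; return the launched process's exit status."""
    result = subprocess.run(
        [sys.executable, "-c", DAEMONIZING_SERVER, str(port)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


if __name__ == "__main__":
    port = free_port()
    first = launch(port)    # returns almost at once, but its child still listens
    restart = launch(port)  # the replacement's bind() fails: port already in use
    print(f"first launch exited {first}; restart exited {restart}")
```

The first launch exits cleanly almost immediately, which is exactly what a supervisor reads as "the program died"; the relaunch then fails because the detached child still owns the port, mirroring the "Connection in use" loop in the log.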

Simply removing the --daemon flag from the gunicorn invocation solved the problem. Other people seeing these symptoms may be triggering the problem in different ways, but the takeaway is that supervisor needs its supervised processes to stay in the foreground. Forking off a subprocess and letting the original process exit makes supervisor think the program has died when it hasn't. If supervisor invokes gunicorn, don't daemonize it. If it invokes a shell script, don't start background tasks and then exit. Keep that process alive; it should exit only when you actually want supervisor to start a fresh copy.
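The same rule applies to wrapper scripts. The sketch below uses Python's subprocess module in place of a shell script so it runs as-is; both wrapper bodies and the one-second `SERVER` command are hypothetical stand-ins. It launches each wrapper the way supervisor would and measures how long the supervised process stays alive: the backgrounding wrapper exits almost at once and leaves its server orphaned, while the foreground wrapper stays alive at least as long as the server does.

```python
import subprocess
import sys
import time

# Hypothetical stand-in for a long-running server: it just sleeps for one second.
SERVER = "import time; time.sleep(1.0)"

# Anti-pattern: start the server in the background, then exit immediately.
BACKGROUNDING_WRAPPER = (
    "import subprocess, sys\n"
    f"subprocess.Popen([sys.executable, '-c', {SERVER!r}])\n"
)

# Fix: stay in the foreground and exit only when the server exits.
FOREGROUND_WRAPPER = (
    "import subprocess, sys\n"
    f"sys.exit(subprocess.run([sys.executable, '-c', {SERVER!r}]).returncode)\n"
)


def supervised_lifetime(wrapper: str) -> float:
    """Launch a wrapper as supervisor would and time how long it stays alive."""
    start = time.monotonic()
    # DEVNULL (not a pipe) so an orphaned server cannot keep run() waiting.
    subprocess.run([sys.executable, "-c", wrapper], stdout=subprocess.DEVNULL)
    return time.monotonic() - start


if __name__ == "__main__":
    for name, wrapper in [("backgrounding", BACKGROUNDING_WRAPPER),
                          ("foreground", FOREGROUND_WRAPPER)]:
        print(f"{name} wrapper stayed alive {supervised_lifetime(wrapper):.2f}s")
```

In an actual shell script, prefixing the final command with `exec` achieves what the foreground wrapper does without the extra process: the shell is replaced by the server, so supervisor tracks the server's PID directly and its stop signal reaches the server rather than a wrapper.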
