I recently ran into an error while trying to run gunicorn under supervisor. A
command that worked perfectly when run on its own failed when launched through
supervisor. The failure showed up as a repeated error stating that the port was
already in use:
2014-03-07 14:25:09  [ERROR] Connection in use: ('127.0.0.1', 8000)
2014-03-07 14:25:09  [ERROR] Retrying in 1 second.
2014-03-07 14:25:10  [ERROR] Connection in use: ('127.0.0.1', 8000)
2014-03-07 14:25:10  [ERROR] Retrying in 1 second.
2014-03-07 14:25:11  [ERROR] Connection in use: ('127.0.0.1', 8000)
2014-03-07 14:25:11  [ERROR] Retrying in 1 second.
2014-03-07 14:25:12  [ERROR] Connection in use: ('127.0.0.1', 8000)
2014-03-07 14:25:12  [ERROR] Retrying in 1 second.
2014-03-07 14:25:13  [ERROR] Connection in use: ('127.0.0.1', 8000)
2014-03-07 14:25:13  [ERROR] Retrying in 1 second.
2014-03-07 14:25:14  [ERROR] Can't connect to ('127.0.0.1', 8000)
Searching online turned up several reports of similar issues, but none with
an explanation that fit our circumstances or a fix that resolved the issue.
Many of the solutions suggested that the user had already started a process
listening on the same port. That was not our situation.
I hope that this explanation will help others having the same problem.
We had gunicorn running just fine when invoked from the command line, and
had deployed our production stacks that way. Launch a gunicorn command, fire
up an nginx server to buffer requests, and let it rip. It worked perfectly,
but as soon as we took the same command and put it under the control of
supervisor, it failed. Exact same command. No other process running on the
same port, but it claimed the port was busy.
Here's the command we were using:
/srv/virtualenv/bin/gunicorn --config=/srv/configs/gunicorn.conf --daemon wsgi-app:app
The problem was the --daemon flag. When supervisor fired up gunicorn, the
server would start, daemonize, and spin off worker processes. In daemonizing,
though, the original process that supervisor had launched would exit.
Supervisor took that exit to mean the program had died, so it would kick off a
new process to replace it. This new process would find its TCP port (8000)
occupied by the (still functioning) daemonized gunicorn process, report the
error, retry once a second for several tries, then die, at which point
supervisor would kick off another process and the cycle would repeat.
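The "Connection in use" message is just the operating system refusing a second
listener on an address that already has one. Here is a minimal Python sketch of
that refusal, assuming a POSIX-like system. The function name
second_bind_error is made up for illustration, and this is not gunicorn's own
code:

```python
import errno
import socket


def second_bind_error(host="127.0.0.1"):
    """Bind a listener, then try to bind the same address a second time.

    Returns the errno raised by the second attempt, or None if it succeeded.
    (Hypothetical helper, for illustration only.)
    """
    # First listener: stands in for the daemonized server that is still running.
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    first.bind((host, 0))  # port 0 asks the OS to pick any free port
    first.listen(1)
    port = first.getsockname()[1]

    # Second listener: stands in for the replacement process supervisor starts.
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        second.bind((host, port))
    except OSError as exc:
        return exc.errno
    finally:
        second.close()
        first.close()
    return None
```

Calling second_bind_error() and looking the result up in errno.errorcode should
give the name of the "address already in use" error, which is the same
condition the restarted gunicorn kept hitting.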
Simply removing the --daemon flag from the gunicorn invocation solved the
problem. Now, it looks like other people who are seeing these symptoms may
be triggering the problem in different ways, but the takeaway is that
supervisor needs the process it launched to stay alive in the foreground.
Forking off subprocesses and letting the parent exit can cause supervisor to
think its subordinates have died when they haven't. If your supervisor invokes
gunicorn, don't daemonize it. If it invokes a shell script, don't start
background tasks and then exit. Keep that process alive. It shouldn't exit
until you actually want supervisor to start another copy of it.
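To see what this looks like from the supervising side, here is a rough Python
sketch, assuming a POSIX system. The launcher script and the function name
launcher_exits_but_worker_lives are hypothetical stand-ins, not anything from
supervisor or gunicorn. The launcher starts a background worker and exits right
away, so whoever is watching the launcher sees a finished process even though
the worker is still running:

```python
import os
import signal
import subprocess
import sys

# Hypothetical launcher: starts a background worker, prints its PID, and exits.
LAUNCHER = """
import subprocess, sys
worker = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(30)"],
    stdin=subprocess.DEVNULL,
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)
print(worker.pid)
"""


def launcher_exits_but_worker_lives():
    """Run the launcher to completion, then check whether its worker survives.

    Returns (launcher_returncode, worker_alive).
    """
    # subprocess.run returns once the launcher itself has exited, which is
    # roughly the point at which a process manager would consider it dead.
    done = subprocess.run(
        [sys.executable, "-c", LAUNCHER],
        capture_output=True,
        text=True,
        check=True,
    )
    worker_pid = int(done.stdout.strip())
    try:
        os.kill(worker_pid, 0)  # signal 0 only checks that the PID exists
        worker_alive = True
    except ProcessLookupError:
        worker_alive = False
    else:
        os.kill(worker_pid, signal.SIGTERM)  # clean up the orphaned worker
    return done.returncode, worker_alive
```

The launcher reports a clean exit while the worker is still alive, which is the
mismatch that leads a process manager to start a second copy.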