Software sometimes falls into a failing state. When that happens we want to restart it so that the service keeps running, and Docker offers the HEALTHCHECK instruction for this purpose. Some containers also depend on other containers; in that case the containers they depend on must be up before they start. We can cover this case in a similar way.
The HEALTHCHECK instruction is especially important when we run containers under an orchestration system like Docker Swarm or Kubernetes, because the orchestrator keeps the service running without downtime or data loss: it starts a new container when one of the running containers becomes unhealthy.
You can find the complete source code here. The target folders for this post are health-check-server and log-server.
This is one of the posts in my Docker learning series. If you want to learn Docker deeply, I highly recommend Learn Docker in a Month of Lunches.
- Start Docker from scratch
- Docker volume
- Bind host directory to Docker container for dev-env
- Communication with other Docker containers
- Run multi Docker containers with compose file
- Container’s dependency check and health check
- Override Docker compose file to have different environments
- Creating a cluster with Docker swarm and handling secrets
- Update and rollback without downtime in swarm mode
- Container optimization
- Visualizing log info with Fluentd, Elasticsearch and Kibana
Health check server
I created a simple HTTP server called health-check-server. It listens on port 80, and when it receives a request to http://localhost/hello/boss it goes into the failing state. The complete code follows. When the last request parameter is BOSS the status becomes failing, and a request to /status then returns HTTP error 500.
import * as restify from "restify";
import { Logger } from "./Logger";

const server = restify.createServer();
const logger = new Logger("restify-server");
let isLastRequestBoss = false;

function respond(
  req: restify.Request,
  res: restify.Response,
  next: restify.Next
) {
  logger.log(`GET request with param [${req.params.name}]`);
  isLastRequestBoss = false;
  if ((req.params.name as string).toUpperCase() === "BOSS") {
    isLastRequestBoss = true;
  }
  res.send('hello ' + req.params.name);
  next();
}

function healthCheck(
  req: restify.Request,
  res: restify.Response,
  next: restify.Next
) {
  res.send(isLastRequestBoss ? 500 : 200);
  next();
}

server.get('/hello/:name', respond);
server.get('/status', healthCheck);
server.head('/hello/:name', respond);

const port = 80;
server.listen(port, function () {
  logger.log(`${server.name} listening at ${server.url}`);
});
How to add HEALTHCHECK function
The Dockerfile looks like this.
FROM yuto/nodejs
EXPOSE 80
ENV LOGGER_API_URL="http://log-server:80/"
CMD node ./lib/server.js
HEALTHCHECK --interval=1s --timeout=5s --start-period=5s --retries=3 \
CMD curl --fail http://localhost/status || exit 1
WORKDIR /app
COPY ./node_modules/ /app/node_modules/
COPY ./dist/ /app/
We can specify the command that checks the container status. HEALTHCHECK expects the following exit codes, which is why || exit 1 is appended: the curl command on its own can return other exit codes (see the example right after the list).
- 0: success
- 1: unhealthy
- 2: reserved – shouldn’t be used
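To see why this matters, here is an illustrative shell session, run from the host after the containers in the next section are up and the server is in its failing state. curl exits with 22 when --fail hits an HTTP error, and the || clause maps any failure to 1.
# curl --fail exits with 22 on an HTTP error, a code HEALTHCHECK doesn't expect
$ sh -c 'curl --fail http://localhost:8003/status'; echo $?
curl: (22) The requested URL returned error: 500 Internal Server Error
22
# || exit 1 normalizes every failure to the exit code HEALTHCHECK understands
$ sh -c 'curl --fail http://localhost:8003/status || exit 1'; echo $?
curl: (22) The requested URL returned error: 500 Internal Server Error
1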
Several options are specified for the health check. Let’s look at what each of them means. These options can also be defined in a docker-compose file, as sketched right after this list.
- interval: how often the health check command runs. The first run happens after this interval has elapsed since the container started
- timeout: if the health check command takes longer than this, the check is treated as a failure
- start-period: failures during this period are not counted toward the retries, which gives the container time to start up
- retries: if the health check command fails this many times in a row, the container status becomes unhealthy
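For reference, here is a minimal sketch of the same options in a docker-compose file (the service and image names just follow this example; start_period requires a reasonably recent compose file format):
services:
  health-check-server:
    image: health-check-server:v1
    healthcheck:
      test: curl --fail http://localhost/status || exit 1
      interval: 1s
      timeout: 5s
      start_period: 5s
      retries: 3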
Check the Health status
Let’s start the containers and send some requests.
# Run these commands if you haven't created the log-server image
cd log-server
# docker image build -t log-server .
npm run dbuild
# docker container run --rm -p 8001:80 --name log-server --network log-test-nat log-server
npm run dstart
# Run these commands in a different window
cd health-check-server
# docker image build -t health-check-server:v1 .
npm run dbuild
# docker container run --rm -p 8003:80 --name health-check-server --network log-test-nat health-check-server:v1
npm run dstart
Let’s check the health status before sending any requests. The health status can be checked with docker inspect <container name>. The result looks like the output below: Status is running and Health.Status is healthy.
$ docker inspect health-check-server
...
"State": {
"Status": "running",
"Running": true,
"Paused": false,
"Restarting": false,
"OOMKilled": false,
"Dead": false,
"Pid": 7257,
"ExitCode": 0,
"Error": "",
"StartedAt": "2020-11-11T19:12:45.3440068Z",
"FinishedAt": "0001-01-01T00:00:00Z",
"Health": {
"Status": "healthy",
Send some requests.
# Check the current status
$ curl --fail http://localhost:8003/status
# Turn the status fail
$ curl http://localhost:8003/hello/BOSS
"hello BOSS"
$ curl --fail http://localhost:8003/status
curl: (22) The requested URL returned error: 500 Internal Server Error
# Turn the status success again
$ curl http://localhost:8003/hello/hey
"hello hey"
$ curl --fail http://localhost:8003/status
Because the health check runs every second the container becomes unhealthy very quickly, but normally the interval is longer. This time Status is still running while Health.Status is unhealthy, because http://localhost/status has returned error code 500 many times. If this container were managed by Docker Swarm it would be replaced with a new container.
$ docker inspect health-check-server
...
"State": {
"Status": "running",
"Running": true,
"Paused": false,
"Restarting": false,
"OOMKilled": false,
"Dead": false,
"Pid": 5120,
"ExitCode": 0,
"Error": "",
"StartedAt": "2020-11-11T18:43:32.6745183Z",
"FinishedAt": "0001-01-01T00:00:00Z",
"Health": {
"Status": "unhealthy",
"FailingStreak": 166,
"Log": [
{
"Start": "2020-11-11T18:53:06.4267859Z",
"End": "2020-11-11T18:53:06.5980225Z",
"ExitCode": 1,
"Output": " % Total % Received % Xferd Average Speed Time Time Time Current\n
Dload Upload Total Spent Left Speed\n\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\ncurl: (22) The requested URL returned error: 500 Internal Server Error\n"
},
By the way, there is an easier way to see the health status.
$ docker ps
CONTAINER ID   IMAGE                    COMMAND                  CREATED         STATUS                   PORTS                  NAMES
ee1e1d782097   health-check-server:v1   "docker-entrypoint.s…"   6 seconds ago   Up 4 seconds (healthy)   0.0.0.0:8003->80/tcp   health-check-server
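If you only need the health status itself, docker inspect can also print just that field with a Go-template format option:
$ docker inspect --format '{{.State.Health.Status}}' health-check-server
healthy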
Dependency check before starting a container
A dependency can be declared in a docker-compose file, but that doesn’t mean the dependency is actually ready to use; it may take a few minutes to become ready for some reason. Since the software in the container can’t do its job until then, it’s necessary to add logic that waits for the dependent container before going into the main process. In an orchestration system the startup order isn’t guaranteed. If we can add a dependency check before the container starts, we can keep the actual application code clean because the check is a separate concern. If the container fails to start because a dependency is not ready, the orchestration system starts a new container; it may fail again, but the container will come up eventually.
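For reference, a declared dependency in a compose file is only a start-order hint, roughly like this sketch (the names just follow this example):
services:
  health-check-server:
    image: health-check-server:v2
    depends_on:
      - log-server   # controls start order only, not readiness
  log-server:
    image: log-server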
I added the dependency check logic in Dockerfile.v2 and it looks like the following.
FROM yuto/nodejs
EXPOSE 80
ENV LOGGER_API_URL="http://log-server:80/"
CMD curl --fail ${LOGGER_API_URL}status && \
node ./lib/server.js
HEALTHCHECK --interval=1s --timeout=5s --start-period=5s --retries=3 \
CMD curl --fail http://localhost/status || exit 1
WORKDIR /app
COPY ./node_modules/ /app/node_modules/
COPY ./dist/ /app/
The point is to add an extra command in front of the command we actually want to run. At startup the container sends a request with curl --fail ${LOGGER_API_URL}status, and if that request fails the container stops.
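If you would rather have the container wait for the dependency instead of exiting, the same check could be wrapped in a retry loop. This is only a rough sketch, not part of the repository, and it waits indefinitely unless you add a limit:
CMD until curl --fail ${LOGGER_API_URL}status; do echo "waiting for log-server"; sleep 2; done && \
    node ./lib/server.js
Failing fast, as Dockerfile.v2 does, keeps the image simpler and leans on the orchestrator to retry.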
docker stop log-server
docker stop health-check-server
cd health-check-server
# docker image build -t health-check-server:v2 -f Dockerfile.v2 .
npm run dbuild2
# docker container run --rm -p 8003:80 --name health-check-server --network log-test-nat health-check-server:v2
$ npm run dstart2
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:01 --:--:-- 0curl: (6) Could not resolve host: log-server
npm ERR! code ELIFECYCLE
npm ERR! errno 6
npm ERR! health-check-server@1.0.0 dstart2: `docker container run --rm -p 8003:80 --name health-check-server --network log-test-nat health-check-server:v2`
npm ERR! Exit status 6
...
It failed to start because log-server wasn’t running. We need to start the container again to get the service up, but it is a good habit to make the error explicit; if the container simply kept running, the root cause could be hard to find in some cases.
Conclusion
HEALTHCHECK and a dependency check may not be necessary unless we need an orchestration system, but having them in place makes the move easy when we do. A simple curl command is used in this example, but any other command works as long as it returns the 0 or 1 exit codes that HEALTHCHECK expects. It could be a DLL, a small script or something else; it’s our choice.
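For example, if an image doesn’t ship curl, a HEALTHCHECK based on a small Node one-liner could do the same job. This is only a sketch, not part of the repository:
HEALTHCHECK --interval=1s --timeout=5s --start-period=5s --retries=3 \
  CMD node -e "require('http').get('http://localhost/status', res => process.exit(res.statusCode === 200 ? 0 : 1)).on('error', () => process.exit(1))"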