If possible, we don't want to stop our services when we update one of the Docker images in a cluster. If the new image fails to start, we should roll back the change quickly. Docker swarm offers both capabilities. What we should remember is that an image has many dependencies.
- Our application has its dependencies
- The dependencies have their dependencies
- SDK that compiles the application
- Application platform where the application runs
- Operating system
Needless to say, it's better to keep our dependencies and operating system up to date for security reasons, but updating the service every month costs more than updating it twice a year. However, if we do update frequently, the service stays much healthier. Let's see how Docker updates services.
You can find the complete source code here
This post is part of a Docker learning series. If you want to learn Docker in depth, I highly recommend Learn Docker in a Month of Lunches.
- Start Docker from scratch
- Docker volume
- Bind host directory to Docker container for dev-env
- Communication with other Docker containers
- Run multi Docker containers with compose file
- Container’s dependency check and health check
- Override Docker compose file to have different environments
- Creating a cluster with Docker swarm and handling secrets
- Update and rollback without downtime in swarm mode
- Container optimization
- Visualizing log info with Fluentd, Elasticsearch and Kibana
Create yml file for swarm from docker-compose file
We will create a single compose file for the docker stack command by merging several files. (Recent versions of docker stack deploy also accept more than one -c option, so merging is a choice rather than a requirement.) I keep the compose files separate because I don't want to repeat the same settings in different files. The compose files look like this.
# docker-compose.yml
version: "3.7"
x-labels: &app-net
  networks:
    - app-net
services:
  test-app:
    image: test-app:v1
    <<: *app-net
  poke-app:
    image: poke-app:v1
    environment:
      - TEST_APP_URL=http://test-app
    <<: *app-net
# docker-compose-pro.yml
version: "3.7"
services:
  test-app:
    ports:
      - target: 80
        mode: host
    deploy:
      mode: global
  poke-app:
    ports:
      - "8888:80"
    deploy:
      replicas: 6
networks:
  app-net:
    name: update-rollback-network
There are two options that I haven't used in previous Docker-related blog posts.
- mode: host in the ports section means that the port is bound directly on the host machine, so ingress is not used for it.
- mode: global in the deploy section means that exactly one container runs on every node.
Going through ingress adds extra work because ingress has to pass each request on to one of the containers. If one container per node is enough, these settings may help to improve performance. I didn't specify the published port because a fixed host port causes a port conflict when the container is updated without downtime: the new container starts while the old one still holds the port.
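Because no published port is specified, Docker picks a free host port for test-app. The following is a minimal sketch of how that port could be looked up from a script. The function name published_port is made up for this example, and the sed pattern assumes the PORTS column has the form *:32774->80/tcp.

```shell
# Print the host port assigned to a running task of a service.
# Assumption: the PORTS column looks like "*:32774->80/tcp".
published_port() {
  stack="$1"      # stack name, e.g. update-rollback
  service="$2"    # full service name, e.g. update-rollback_test-app
  docker stack ps "$stack" \
    --filter "name=$service" \
    --filter "desired-state=running" \
    --format '{{.Ports}}' |
    sed -n 's/^\*:\([0-9][0-9]*\)->.*/\1/p' |
    head -n 1
}
```

Calling published_port update-rollback update-rollback_test-app prints only the port number, which can then be used in a request to localhost.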
Let's create a merged compose file from the two files with the docker-compose config command. I wrote a bat file that creates several stack files.
cd update-rollback
# create multiple stack yaml files.
# one of the commands it runs is:
# docker-compose -f docker-compose.yml -f docker-compose-pro.yml config > stack-v1.yml
./create-stack.bat
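For readers who are not on Windows, here is a rough shell sketch of what such a script might do. It is not the content of create-stack.bat. It assumes that each version has an optional override file named docker-compose-<name>.yml and that the merged result is written to stack-<name>.yml. The function name create_stack_file and the COMPOSE variable are made up for this example.

```shell
# Merge the base file, the production override, and an optional version
# override into a single stack file.
# Assumption: overrides are named docker-compose-<name>.yml.
create_stack_file() {
  name="$1"                              # e.g. v1, v2-1, v3-bad
  compose="${COMPOSE:-docker-compose}"   # hypothetical override hook
  override="docker-compose-$name.yml"
  if [ -f "$override" ]; then
    $compose -f docker-compose.yml -f docker-compose-pro.yml \
      -f "$override" config > "stack-$name.yml"
  else
    # No version override exists (the v1 case in this post).
    $compose -f docker-compose.yml -f docker-compose-pro.yml \
      config > "stack-$name.yml"
  fi
}
```

Running create_stack_file v1 corresponds to the single command shown in the comment above, and create_stack_file v2-1 would add the v2-1 override on top of it.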
Update services with default setting
Let’s start the service.
$ docker stack deploy -c stack-v1.yml update-rollback
Creating network update-rollback-network
Creating service update-rollback_poke-app
Creating service update-rollback_test-app
$ docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
4j9uwgieebye update-rollback_poke-app replicated 6/6 poke-app:v1 *:8888->80/tcp
a7absegp0osq update-rollback_test-app global 1/1 test-app:v1
$ docker stack ps update-rollback
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
zq4vznowm7wz update-rollback_test-app.zwfh3t5x51nmlu0vgnyzn2j9q test-app:v1 docker-desktop Running Running 33 seconds ago *:32774->80/tcp
w6iiaucooh7h update-rollback_poke-app.1 poke-app:v1 docker-desktop Running Running 24 seconds ago
lzeyrrosukih \_ update-rollback_poke-app.1 poke-app:v1 docker-desktop Shutdown Failed 31 seconds ago "task: non-zero exit (6)"
34pb08jluc5o update-rollback_poke-app.2 poke-app:v1 docker-desktop Running Running 27 seconds ago
svosyp7zvi8e \_ update-rollback_poke-app.2 poke-app:v1 docker-desktop Shutdown Failed 33 seconds ago "task: non-zero exit (6)"
I deleted some lines to keep the output short. The 6 replicas of poke-app started only after test-app was up, because poke-app runs a dependency check command on startup. That is why the output shows failed poke-app tasks in the Shutdown state. The necessary services are running correctly now. Let's update the service with the default settings. The compose file that changes the version is simple.
# docker-compose-v2-1.yml
version: "3.7"
services:
  test-app:
    image: test-app:v2
$ docker stack deploy -c stack-v2-1.yml update-rollback
Updating service update-rollback_poke-app (id: 4j9uwgieebyej4cd9gdabcshp)
image poke-app:v1 could not be accessed on a registry to record
its digest. Each node will access poke-app:v1 independently,
possibly leading to different nodes running different
versions of the image.
Updating service update-rollback_test-app (id: a7absegp0osqbrhytb3npvtww)
image test-app:v2 could not be accessed on a registry to record
its digest. Each node will access test-app:v2 independently,
possibly leading to different nodes running different
versions of the image.
$ docker stack ps update-rollback
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
9h6madlgvy7u update-rollback_test-app.zwfh3t5x51nmlu0vgnyzn2j9q test-app:v2 docker-desktop Running Starting 1 second ago
zq4vznowm7wz \_ update-rollback_test-app.zwfh3t5x51nmlu0vgnyzn2j9q test-app:v1 docker-desktop Shutdown Shutdown 1 second ago
m5bf7wix4b08 update-rollback_poke-app.1 poke-app:v1 docker-desktop Running Running 1 second ago
w6iiaucooh7h \_ update-rollback_poke-app.1 poke-app:v1 docker-desktop Shutdown Failed 9 seconds ago "task: non-zero exit (1)"
lzeyrrosukih \_ update-rollback_poke-app.1 poke-app:v1 docker-desktop Shutdown Failed 2 minutes ago "task: non-zero exit (6)"
$ docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
4j9uwgieebye update-rollback_poke-app replicated 6/6 poke-app:v1 *:8888->80/tcp
a7absegp0osq update-rollback_test-app global 1/1 test-app:v2
The test-app was updated, but the service was down during the update: poke-app tried to send a request to test-app while test-app was not ready, which is the new Failed entry with exit code 1. By default, Docker swarm shuts down the old container first and then starts the new one. It's working now, but we wanted to avoid the downtime. To avoid it, we should start the new container first and then shut down the old one.
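One way to see such a gap from the outside is to send a request every second while the update runs. The sketch below assumes that poke-app answers HTTP requests on the root path of its published port 8888, which the post does not state. The function name probe is made up for this example.

```shell
# Request a URL once per second and print a timestamped HTTP status code.
# curl reports "000" when it receives no HTTP response at all.
probe() {
  url="$1"     # e.g. http://localhost:8888/ (the path is an assumption)
  count="$2"   # number of requests to send
  i=0
  while [ "$i" -lt "$count" ]; do
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 2 "$url") || true
    echo "$(date +%T) $code"
    i=$((i + 1))
    sleep 1
  done
}
```

Running probe http://localhost:8888/ 60 in a second terminal during docker stack deploy should show whether any request failed while the containers were being replaced.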
Start new container first
I configured it in docker-compose-v2-2.yml.
version: "3.7"
services:
  test-app:
    image: test-app:v2
    deploy:
      update_config:
        order: start-first
Let’s update it from version 1 again.
# remove the current services
$ docker stack rm update-rollback
$ docker stack deploy -c stack-v1.yml update-rollback
# wait until 6 replicas start
$ docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
j4wf6v1cgbna update-rollback_poke-app replicated 6/6 poke-app:v1 *:8888->80/tcp
o3q6ki34flqk update-rollback_test-app global 1/1 test-app:v1
$ docker stack deploy -c stack-v2-2.yml update-rollback
$ docker stack ps update-rollback
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
lv2itvmlza4t update-rollback_test-app.zwfh3t5x51nmlu0vgnyzn2j9q test-app:v2 docker-desktop Running Running 1 second ago *:32777->80/tcp
cx155jq6eo0q \_ update-rollback_test-app.zwfh3t5x51nmlu0vgnyzn2j9q test-app:v1 docker-desktop Shutdown Running 1 second ago
yzfgenbepg5v update-rollback_poke-app.1 poke-app:v1 docker-desktop Running Running 43 seconds ago
qhwb1m35ofwn \_ update-rollback_poke-app.1 poke-app:v1 docker-desktop Shutdown Failed 49 seconds ago "task: non-zero exit (6)"
shebw2kig9rk update-rollback_poke-app.2 poke-app:v1 docker-desktop Running Running 39 seconds ago
bfzpl2r8f2do \_ update-rollback_poke-app.2 poke-app:v1 docker-desktop Shutdown Failed 47 seconds ago "task: non-zero exit (6)"
rz0d2xjy4wyl update-rollback_poke-app.3 poke-app:v1 docker-desktop Running Running 39 seconds ago
There is no additional shutdown status with exit code 1 this time because the new container started before the old container was shut down. Good!
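The "wait until 6 replicas start" step above was done by eye. As a minimal sketch, it could be scripted by polling docker service ls. The function name wait_for_replicas, the default of 30 attempts, and the 2 second pause are choices made for this example.

```shell
# Poll "docker service ls" until the REPLICAS column equals the expected
# value (for example 6/6) or the attempts run out.
wait_for_replicas() {
  service="$1"          # e.g. update-rollback_poke-app
  expected="$2"         # e.g. 6/6
  attempts="${3:-30}"   # hypothetical default
  while [ "$attempts" -gt 0 ]; do
    current=$(docker service ls --filter "name=$service" \
      --format '{{.Replicas}}' | head -n 1)
    [ "$current" = "$expected" ] && return 0
    attempts=$((attempts - 1))
    sleep 2
  done
  echo "timed out: $service is at ${current:-unknown}" >&2
  return 1
}
```

For example, wait_for_replicas update-rollback_poke-app 6/6 could run between the v1 deploy and the v2 deploy so that the update only begins once every replica is up.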
Update multiple containers step by step
Next, we will update poke-app. If we update all replicas at once, downtime may happen because the new container may not work as expected. It's better to update them step by step, and if a container fails to update we should roll it back. I configured this in docker-compose-v3-bad.yml, where I set HEALTH=BAD in order to make the update fail.
version: "3.7"
services:
  test-app:
    image: test-app:v2
  poke-app:
    image: poke-app:v2
    environment:
      - HEALTH=BAD
    deploy:
      update_config:
        parallelism: 2
        monitor: 60s
        failure_action: rollback
        order: start-first
- parallelism: the number of replicas that are updated at once.
- monitor: how long each updated container is watched for failure. It should be longer than the total time the health check needs to report a result. If the container becomes unhealthy within this time, the action defined in failure_action is triggered.
- failure_action: the action taken when a container becomes unhealthy within the monitor time. The default is pause. continue is another option, but I think it's risky.
- order: which step comes first, stopping the old container or starting the new one. The default is stop-first.
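The same four options can also be passed as flags to docker service update, which may be handy for a one-off experiment. This is only a sketch: the wrapper function update_poke_app is made up for this example, it does not set HEALTH=BAD, and a later docker stack deploy applies whatever the stack file says again.

```shell
# Apply the update policy from the compose file with CLI flags instead.
update_poke_app() {
  docker service update \
    --image poke-app:v2 \
    --update-parallelism 2 \
    --update-monitor 60s \
    --update-failure-action rollback \
    --update-order start-first \
    update-rollback_poke-app
}
```

Each flag maps one to one to a key under update_config, so the compose file stays the better place for settings that should persist.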
Let’s update with the configuration.
$ docker stack deploy -c stack-v3-bad.yml update-rollback
$ docker stack ps update-rollback
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
lv2itvmlza4t update-rollback_test-app.zwfh3t5x51nmlu0vgnyzn2j9q test-app:v2 docker-desktop Running Running 8 minutes ago *:32777->80/tcp
cx155jq6eo0q \_ update-rollback_test-app.zwfh3t5x51nmlu0vgnyzn2j9q test-app:v1 docker-desktop Shutdown Shutdown 7 minutes ago
re3j916rcz2b update-rollback_poke-app.1 poke-app:v2 docker-desktop Shutdown Shutdown 12 seconds ago
yzfgenbepg5v \_ update-rollback_poke-app.1 poke-app:v1 docker-desktop Running Running 8 minutes ago
qhwb1m35ofwn \_ update-rollback_poke-app.1 poke-app:v1 docker-desktop Shutdown Failed 8 minutes ago "task: non-zero exit (6)"
The update failed. The poke-app version 2 task is shut down and version 1 is still running. The rollback state looks like this.
$ docker service inspect --pretty update-rollback_poke-app
ID: nnd66g9ytq4eozu9zq7mi5tg0
Name: update-rollback_poke-app
Labels:
com.docker.stack.image=poke-app:v1
com.docker.stack.namespace=update-rollback
Service Mode: Replicated
Replicas: 6
UpdateStatus:
State: rollback_completed
Started: 2 minutes ago
Message: rollback completed
I deleted the output below Message. We can confirm that the rollback completed correctly. Let's update again, this time with the correct configuration, HEALTH=GOOD, defined in docker-compose-v3-good.yml.
$ docker stack deploy -c stack-v3-good.yml update-rollback
$ docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
nnd66g9ytq4e update-rollback_poke-app replicated 8/6 poke-app:v2 *:8888->80/tcp
xoieulklrhig update-rollback_test-app global 1/1 test-app:v2
$ docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
nnd66g9ytq4e update-rollback_poke-app replicated 6/6 poke-app:v2 *:8888->80/tcp
xoieulklrhig update-rollback_test-app global 1/1 test-app:v2
$ docker stack ps update-rollback | grep Running
lv2itvmlza4t update-rollback_test-app.zwfh3t5x51nmlu0vgnyzn2j9q test-app:v2 docker-desktop Running Running 2 hours ago *:32777->80/tcp
8wkk4cdptztt update-rollback_poke-app.1 poke-app:v2 docker-desktop Running Running 1 second ago
z0jvq5v8a9yj \_ update-rollback_poke-app.1 poke-app:v1 docker-desktop Shutdown Running 1 second ago
0fso1613w5or update-rollback_poke-app.2 poke-app:v2 docker-desktop Running Running 14 seconds ago
694w4odop160 \_ update-rollback_poke-app.2 poke-app:v1 docker-desktop Shutdown Running 14 seconds ago
ej0eyzfd4xkr update-rollback_poke-app.3 poke-app:v2 docker-desktop Running Running 1 second ago
z7lt5ozffjb3 \_ update-rollback_poke-app.3 poke-app:v1 docker-desktop Shutdown Running 1 second ago
qf6rnqjk12a1 update-rollback_poke-app.4 poke-app:v2 docker-desktop Running Running 25 seconds ago
vtkfwqhi11zr update-rollback_poke-app.5 poke-app:v2 docker-desktop Running Running 25 seconds ago
qx3hi66v64y5 update-rollback_poke-app.6 poke-app:v2 docker-desktop Running Running 14 seconds ago
jh3v2f6v92f3 \_ update-rollback_poke-app.6 poke-app:v1 docker-desktop Shutdown Running 14 seconds ago
The number of replicas is 8 at first because of the start-first policy: two new containers start before the old ones stop. After a while the number goes back to 6 and the application version is v2, so the update succeeded this time. The containers were updated at slightly different times because parallelism is 2. If we want a longer pause between batches, we can set the delay option in the update_config section.
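The 8/6 reading follows from simple arithmetic on the two settings used here. The following sketch only does that calculation; the variable names are made up for this example.

```shell
replicas=6
parallelism=2

# Number of update batches, rounded up.
batches=$(( (replicas + parallelism - 1) / parallelism ))

# With start-first, each batch starts its new tasks before the old ones stop,
# so the peak is the normal replica count plus one batch.
peak=$(( replicas + parallelism ))

echo "batches=$batches peak_tasks=$peak"
```

With 6 replicas and parallelism 2 this gives 3 batches and a peak of 8 tasks, which matches the output above. A node needs enough spare resources for that peak.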
Rollback the service by hand
We can also roll back by hand when the service doesn't work as expected even though the health check reports it as healthy.
$ docker service update --rollback update-rollback_poke-app
update-rollback_poke-app
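After a manual rollback, the result can be checked from a script as well. The sketch below assumes that docker service update waits for the service to converge before it returns, which is the default in recent Docker versions, and that the service has an UpdateStatus to report. The function name rollback_and_check is made up for this example.

```shell
# Roll a service back and report the resulting update state.
# Succeeds only if the state is rollback_completed.
rollback_and_check() {
  service="$1"   # e.g. update-rollback_poke-app
  docker service update --rollback "$service" >/dev/null || return 1
  state=$(docker service inspect \
    --format '{{.UpdateStatus.State}}' "$service")
  echo "$state"
  [ "$state" = "rollback_completed" ]
}
```

This reads the same State field that docker service inspect --pretty printed earlier.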
Conclusion
This update mechanism makes our work easier, especially when we have multiple servers, because Docker takes care of many things. The key point is to define the update config so that we avoid downtime. If a server has enough resources to run the new container alongside the old one, the start-first policy is a good choice; if not, it may not be. We should choose the best approach for our situation.