Running Axon Server in Docker and Kubernetes

Key Takeaways

  • Docker and Kubernetes have revolutionized application packaging and deployment, supporting DevOps practices and the drive towards a reduced turnaround.
  • Kubernetes provides a good platform for microservices, but its focus on "small and scalable" provides a few challenges for infrastructure components such as AxonServer.
  • Docker Desktop on Windows has long been a moving target, but with Windows 10 build 2004, WSL 2, and Docker Desktop on WSL 2, we finally have a very capable solution that works just as well for the bash lovers as for PowerShell users.
  • Deploying an AxonServer Enterprise Edition cluster to Kubernetes requires a bit of preparation, but the result is worth it.

Introduction: The Road to Kubernetes

In the previous installment on “Running AxonServer,” we looked at its behavior locally, running it as a “Plain Old Java Application.” We also looked at the configuration properties you can use to adjust the storage locations it uses, and we added access control and TLS. In this installment we continue by looking at the platform we run it on; in particular Docker and Kubernetes.

On Containers and Microservices Architectures

If we’re going to discuss Docker and Kubernetes as platforms for Axon Server, let’s first establish what they bring to the table, and that means we will start by examining containers. Strictly speaking, the term “container” (“(1) ...that which contains.” Yea, duh! “(2) An item in which objects, materials or data can be stored or transported.” Ah, much better.) can also be applied to a Virtual Machine image, but we’ve learned to equate the term with an installed package, which is only partly isolated from the host on which it runs, such that it shares some of its functionality with other packages. I would also highlight them as a compact packaging mechanism for installed applications, for which you do not have to take any additional steps to get them to run. In that sense, they are the ideal distribution format for a predictable environment, no matter how often you deploy them. Combine that with the small scale, and it’s no wonder containers are the platform of choice for microservices. “Function as a Service” platforms, popularized by Amazon with AWS Lambda, will often be built on top of a container platform for those reasons as well.

When Service-Oriented Architectures started to gather steam, we saw the emergence of a middle-layer platform called the Enterprise Service Bus, which grew from a solution for service-exposure, via service composition and adaptation, to a full-blown service-oriented development environment. Thanks to the emphasis on high productivity, service exposures tended to be solved completely within the confines of this intelligent interconnection layer, reducing the pressure to change the design of back-office systems. This in turn allowed applications to continue to grow beyond the point where refactoring was a reasonable approach, and we got what is often referred to as “a ball of mud.” Oversized applications with a thriving interconnection layer that plugs into any available interface cause complications when we need to adapt those applications to a changing world, so development speed is reduced to just a handful of releases per year.

“Breaking down the monolith” is the new motto, as the message is finally driven home that gluttony is also a sin in application land. If we want to be able to change in step with our market, we need to increase our deployment speed, and just tacking on small incremental changes has proven to be a losing game. No, we need to reduce interdependencies, which ultimately also means we need to accept that too much intelligence in the interconnection layer worsens the problem rather than solving it, as it sprinkles business logic all over the architecture and keeps creating new dependencies. Martin Fowler phrased it as “Smart endpoints and dumb pipes”, and as we do this, we increase the autonomy of the application components, and the individual pieces can finally start to shrink. Microservices architecture is a consequence of an increasing drive towards business agility, and woe to those who try to reverse that relationship. Imposing Netflix’s architecture on your organization to kick-start a drive for Agile development can easily destroy your business.

Event-Driven Architectures and Containers

Now I will start by admitting that there is no single definition of “Event-Driven”, but the examples provided by Martin Fowler will do admirably. The underlying concept is that of an “Event”, which is nothing more or less than “an indication that something happened.” This may be a physical event, such as some measurement that was taken or a measurement that changed in a certain way, or it may contain state. The thing is, we’re sending it out without knowing for certain who, if anyone, is going to read it. Ah yes, that may sound a bit dangerous, but even messages in enterprise-grade middleware products have been known to … get lost. Sorry. There are plenty of ways to fix that, but that is not the point here. The point is that we’ve now decoupled the sender from any potential listeners. CQRS and Event Sourcing (see my previous article) provide us with an easy way to support application component autonomy, which allows us to cut that monolith into pieces, and deploy them as separate processes.

So now that we can start decreasing the size of the individual components, we need a platform to support this, and here is where containers start to shine. Being quick to start and reusing common OS functionality without sacrificing isolation, containers are a good match for microservices architectures. Apart from that, because they don’t require a small component size, we can already start using them while we’re still dealing with big chunks. We can even deploy infrastructural components such as AxonServer using containers, especially if the application components needing them are also running on that platform.

Moving to Docker

To start our second part of the journey, we first need to get our test application into a container. The Maven pom for the AxonServer Quicktest (see the repository on GitHub) has a profile for building a Docker image using the Jib plugin, but you’ll have to adjust the configuration so it uses the tag and optional credentials helper you want to use. To make it easy, there is a public image named “axoniq/axonserver-quicktest:4.3-SNAPSHOT”, which contains the application built from the repository mentioned above. If started with the “axonserver” Spring profile, it will use the default settings of the AxonServer connector and try to contact AxonServer on “localhost:8124”. With environment variables we can change the profile and the hostname for AxonServer, so let’s start!
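The environment variables used in the Docker runs in this article follow Spring Boot’s relaxed binding convention: a property name is turned into an environment variable name by replacing dots with underscores, removing dashes, and converting to upper case. As a small sketch of that rule (the helper name “to_env_name” is made up for this illustration, it is not part of any tool):

```shell
# Hypothetical helper: derive the environment variable name that Spring Boot
# binds to a given property (dashes removed, dots -> underscores, upper case).
to_env_name() {
  printf '%s\n' "$1" | tr -d '-' | tr '.' '_' | tr '[:lower:]' '[:upper:]'
}

to_env_name axon.axonserver.servers   # AXON_AXONSERVER_SERVERS
to_env_name spring.profiles.active    # SPRING_PROFILES_ACTIVE
```

This is why passing “AXON_AXONSERVER_SERVERS” to the container has the same effect as setting the “axon.axonserver.servers” property in a properties file.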

First up in Docker

The public image for AxonServer Standard Edition is named “axoniq/axonserver”, and the latest version at the time of writing this article is 4.3.5. Let’s start it and run the tester against it:

$ docker pull axoniq/axonserver
Using default tag: latest
latest: Pulling from axoniq/axonserver
Digest: sha256:a4cc72589b45cd3574b2183ed2dd0ed68c4fa0a70dec52edad26acb2a57bc643
Status: Image is up to date for axoniq/axonserver:latest
docker.io/axoniq/axonserver:latest
$ docker run -d --rm --name axonserver axoniq/axonserver
3ce917771960c5964cb792c0455614e37f64f4a9367a8456551036aef3509075
$ docker logs axonserver | tail -2
2020-06-18 13:49:17.565  INFO 1 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat started on port(s): 8024 (http) with context path ''
2020-06-18 13:49:17.569  INFO 1 --- [           main] io.axoniq.axonserver.AxonServer          : Started AxonServer in 7.779 seconds (JVM running for 8.254)
$ docker run --rm --link axonserver \
> --env AXON_AXONSERVER_SERVERS=axonserver \
> --env SPRING_PROFILES_ACTIVE=axonserver \
> axoniq/axonserver-quicktest:4.3-SNAPSHOT
2020-06-18 13:50:21.463  INFO 1 --- [mandProcessor-0] i.a.testing.quicktester.TestHandler      : handleCommand(): src = "QuickTesterApplication.getRunner", msg = "Hi there!".
2020-06-18 13:50:21.516  INFO 1 --- [.quicktester]-0] i.a.testing.quicktester.TestHandler      : handleEvent(): msg = "QuickTesterApplication.getRunner says: Hi there!".
$ docker stop axonserver
axonserver
$

What you see above starts by explicitly pulling the AxonServer image, to ensure we have the latest version, and then starts it in the background (“-d” or “--detach”) as a container named “axonserver”. The “--rm” option marks this as a container that can be deleted immediately when it stops, which is just a bit of housekeeping that I do to prevent a long list of terminated containers. Those familiar with Docker will have noticed we did not expose any ports, which means I cannot access the REST API or Web UI directly from the host, but any ports opened by the processes in the container will be accessible for other containers on Docker’s virtual network. The response from Docker is the full ID of the running container. After a few seconds, we check the logs to see if everything is ready before we start the quick-test. If you’re too quick you won’t see the “Started AxonServer” line, so give it another try.

For the quicktest app, apart from again using the automatic cleanup option, we also link it to the AxonServer container, which makes a hostname available to the tester with the same name as the link. The two environment variables provide the tester with the AxonServer hostname and enable the “axonserver” Spring profile. You can run it a few times and see the number of events increase, but we did that last time already, so it is not shown. Updating the configuration of an existing container so it exposes ports is unfortunately not possible, so let’s do that in the second run, when we also consider storage. The last command stops the AxonServer container, which implicitly removes it. (This is what the “--rm” was for.)
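If you would rather not retry the log check by hand, a small polling loop can wait for the “Started AxonServer” line. The following is only a sketch: the function name “wait_for_log” and the “WAIT_INTERVAL” variable are made up for this example, and the number of attempts is something you will want to tune.

```shell
# Hypothetical helper: run COMMAND repeatedly until its output contains PATTERN.
# Usage: wait_for_log PATTERN ATTEMPTS COMMAND [ARGS...]
# Returns 0 once the pattern is seen, 1 if all attempts are used up.
wait_for_log() {
  wfl_pattern=$1
  wfl_left=$2
  shift 2
  while [ "$wfl_left" -gt 0 ]; do
    # Check both stdout and stderr of the command.
    if "$@" 2>&1 | grep -q -- "$wfl_pattern"; then
      return 0
    fi
    wfl_left=$((wfl_left - 1))
    sleep "${WAIT_INTERVAL:-1}"   # seconds between attempts, default 1
  done
  return 1
}

# Example (needs a running container named "axonserver"):
#   wait_for_log "Started AxonServer" 30 docker logs axonserver
```

With this in place, the quick-test can be started as soon as the function returns successfully, instead of after a guessed delay.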

Intermezzo: Docker Desktop and Windows 10

If we’re going to run stuff in Docker, we naturally need a fully functional environment, which brings up the possible challenges of not using Linux. For a long time, running Docker on macOS and Windows meant starting a VM with Linux, while the user manipulates this server (named the “docker-machine”) from the host. The integration of the host OS with the docker-machine has steadily improved, but on Windows, there was a complicating factor for users of the Unix command-line toolsets Cygwin and MinGW, the latter of which is often installed as part of Git-for-Windows. Both projects provide a lot of standard Linux tools for Windows, while still allowing you to run Windows EXE programs. They work fine for most use-cases, but problems appear when you have to specify file and directory locations, like “/cygdrive/c/TEMP”, “/c/TEMP” and “C:\TEMP”. These may be intended to specify the same location, but the interpretation depends on which executable tries to use it. This may bite you when you are going to specify Docker volumes to share directories and files between containers and the host machine.

Docker worked with Microsoft to improve their Docker Desktop product, moving it to the Windows-native Hyper-V hypervisor for the docker-machine, which improved its architecture on Windows hosts a lot. However, with the introduction of the Windows Subsystem for Linux (WSL) the situation was further complicated, because the isolation of the WSL VM meant it had no direct access to the docker-machine other than through a (virtual) network connection. WSL 2 brought a lot of improvements in both integration and performance, and Docker started migrating its docker-machine to WSL 2. Due to all the moving parts (Windows 10 build 2004 includes support for WSL 2, but it needs to be installed, and Docker Desktop can be configured to use either Hyper-V or WSL 2), we need to be clear about what the examples are based on if you want to achieve the same results.

All examples in this article assume a fully functional Unix-like environment, which on Windows means Windows 10 with WSL 2, and Docker Desktop using the WSL 2 backend. Experiments with named Docker volumes for WSL 2 filesystems that failed half a year ago work perfectly since the release of build 2004. If you are not able to use WSL 2, I strongly suggest you translate paths to Windows drives+paths notation, and work from either CMD or PowerShell for the Docker commands. Please note that you must then explicitly designate which drives or paths are eligible for volumes in the Docker configuration panel.
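If you do need that translation, a small shell function can take care of it. The sketch below is only an illustration: the name “to_windows_path” is made up, and it handles just the two Unix-style forms mentioned earlier, “/cygdrive/c/TEMP” and “/c/TEMP”. The drive letter is kept in the case it was given, which Windows does not mind.

```shell
# Hypothetical helper: translate a Cygwin- or MinGW-style path to Windows
# drive+path notation. Any other path is printed unchanged, with status 1.
to_windows_path() {
  case $1 in
    /cygdrive/[a-zA-Z]/*) twp_rest=${1#/cygdrive/} ;;
    /[a-zA-Z]/*)          twp_rest=${1#/} ;;
    *) printf '%s\n' "$1"; return 1 ;;
  esac
  twp_drive=${twp_rest%%/*}   # the drive letter, e.g. "c"
  twp_path=${twp_rest#*/}     # everything after the drive letter
  # Print drive, colon, backslash, then the path with "/" turned into "\".
  printf '%s:\\%s\n' "$twp_drive" "$(printf '%s' "$twp_path" | tr '/' '\\')"
}

to_windows_path /cygdrive/c/TEMP   # c:\TEMP
to_windows_path /c/TEMP            # c:\TEMP
```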

Using Docker Volumes

The public AxonServer image is built using the Maven Jib plugin and runs AxonServer in the root directory as the “root” user. It uses a provided properties file with the following settings:

axoniq.axonserver.event.storage=/eventdata
axoniq.axonserver.snapshot.storage=/eventdata
axoniq.axonserver.controldb-path=/data
axoniq.axonserver.pid-file-location=/data
logging.file=/data/axonserver.log
logging.file.max-history=10
logging.file.max-size=10MB

As discussed before, Spring Boot also checks for a directory named “config” in the current working directory for additional settings, so we can adjust the AxonServer node name by adding a volume mapping for “/config” and putting our settings in an “axonserver.properties” file. Given that we are in the directory we want to use for the volumes, we can do the following:

$ mkdir data events config
$ (
> echo axoniq.axonserver.name=axonserver
> echo axoniq.axonserver.hostname=localhost
> ) > config/axonserver.properties
$ docker run -d --rm --name axonserver -p 8024:8024 -p 8124:8124 \
> -v `pwd`/data:/data \
> -v `pwd`/events:/eventdata \
> -v `pwd`/config:/config \
> axoniq/axonserver
4397334283d6185506ad27a024fbae91c5d2918e1314d19fcaf2dc20b4e400cb
$ docker logs axonserver | tail -2
2020-06-19 13:06:42.072  INFO 1 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat started on port(s): 8024 (http) with context path ''
2020-06-19 13:06:42.075  INFO 1 --- [           main] io.axoniq.axonserver.AxonServer          : Started AxonServer in 7.623 seconds (JVM running for 8.21)
$ ls -lF *
config:
total 4
-rw-r--r-- 1 user user    71 Jun 19 15:04 axonserver.properties

data:
total 52
-rw-r--r-- 1 root root    14 Jun 19 15:06 AxonIQ.pid
-rw-r--r-- 1 root root 45056 Jun 19 15:06 axonserver-controldb.mv.db
-rw-r--r-- 1 root root  1150 Jun 19 15:06 axonserver.log

events:
total 4
drwxr-xr-x 2 root root 4096 Jun 19 15:06 default/
$ curl -s http://localhost:8024/v1/public/me | jq
{
  "authentication": false,
  "clustered": false,
  "ssl": false,
  "adminNode": true,
  "developmentMode": false,
  "storageContextNames": [
    "default"
  ],
  "contextNames": [
    "default"
  ],
  "name": "axonserver",
  "hostName": "localhost",
  "httpPort": 8024,
  "grpcPort": 8124,
  "internalHostName": null,
  "grpcInternalPort": 0
}
$ docker inspect axonserver | jq '.[].Config.Hostname'
"88cb0a15ef10"
$ 

Note that we again need to wait a few seconds after starting AxonServer. What you see next is that the “data” and “events” directories now contain files and directories created by AxonServer, and they’re owned by “root”. When done on Windows with an NTFS filesystem, the file sizes in the data directory may be shown as zero, due to a difference in the moment of synchronizing directory information as compared to Unix-like Operating Systems. When we query the “/v1/public/me” REST endpoint, it shows the properties have been picked up, as the hostname is set to “localhost”, even though the actual container’s hostname is “88cb0a15ef10”.

Deploying sets of Applications to Docker

If we want to deploy a group of applications together, we can extend the basic Docker setup a bit with e.g. a virtual Docker network to connect them all together, but we can also get some help to do that for us, based on a single descriptive YAML file with docker-compose. This does not use the “link” method shown above, but instead starts the containers with a hostname and places them in the same Docker network, so the containers can correctly use DNS lookups. The scenario where this shines is when you deploy several applications at once, such as a cluster of Axon Server EE nodes, but even for a singleton SE node, this can work well because docker-compose quite naturally pushes you towards named volumes and networks.

As an example, the following docker-compose file describes a setup comparable to the one from the previous section, with a read-only flag added for the configuration:

version: '3.3'
services:
  axonserver:
    image: axoniq/axonserver
    hostname: axonserver
    volumes:
      - type: bind
        source: ./data
        target: /data
      - type: bind
        source: ./events
        target: /eventdata
      - type: bind
        source: ./config
        target: /config
        read_only: true
    ports:
      - '8024:8024'
      - '8124:8124'
      - '8224:8224'
    networks:
      - axon-demo
networks:
  axon-demo:

The “better” approach is to use named volumes, where you replace the “volumes” section with:

  - axonserver-data:/data
  - axonserver-events:/eventdata
  - axonserver-config:/config:ro

The details are now in a separate, top-level “volumes” section, where we can use a more detailed specification of the volume settings, even though for a local bind this isn’t that useful:

volumes:
  axonserver-data:
    driver: local
    driver_opts:
      type: none
      device: ${PWD}/data
      o: bind
  axonserver-events:
    driver: local
    driver_opts:
      type: none
      device: ${PWD}/events
      o: bind
  axonserver-config:
    driver: local
    driver_opts:
      type: none
      device: ${PWD}/config
      o: bind

Axon Server Enterprise Edition in Docker

When we move to Docker, we need to do some preparatory work on Axon Server EE first, as there is no public image for the 4.3.x versions. An image comparable to that of SE is easy to make, but let’s add some more “enterprise” features, and create one that runs it under a non-root user. The Google “distroless” images used as a base for Jib-generated images are nice in the sense that they contain as little as possible, but unfortunately, that also means they have no shell to run commands, nor utilities such as “adduser”. Luckily there is this cool feature called a “multi-stage build”, which allows us to use a container to run some commands, and then copy individual files from it, while starting with a fresh base image. In this case, that means we can use e.g. a “busybox” image to create a user with some initialized files and directories in its home directory, and then switch to the distroless base, copying the prepared stuff:

FROM busybox as source
RUN addgroup -S axonserver \
    && adduser -S -h /axonserver -D axonserver \
    && mkdir -p /axonserver/config /axonserver/data \
                /axonserver/eventdata /axonserver/log \
    && chown -R axonserver:axonserver /axonserver

FROM gcr.io/distroless/java:11

COPY --from=source /etc/passwd /etc/group /etc/
COPY --from=source --chown=axonserver /axonserver /axonserver

COPY --chown=axonserver axonserver.jar axonserver.properties \
     /axonserver/

USER axonserver
WORKDIR /axonserver

VOLUME [ "/axonserver/config", "/axonserver/data", \
         "/axonserver/eventdata", "/axonserver/log" ]
EXPOSE 8024/tcp 8124/tcp 8224/tcp

ENTRYPOINT [ "java", "-jar", "axonserver.jar" ]

To build the image with this Dockerfile, you need the AxonServer EE JAR file and a common properties file. The latter has only small changes compared to the SE one:

axoniq.axonserver.event.storage=/axonserver/eventdata
axoniq.axonserver.snapshot.storage=/axonserver/eventdata
axoniq.axonserver.replication.log-storage-folder=/axonserver/log
axoniq.axonserver.controldb-path=/axonserver/data
axoniq.axonserver.pid-file-location=/axonserver/data

axoniq.axonserver.accesscontrol.systemtokenfile=/axonserver/config/axonserver.token

logging.file=/axonserver/data/axonserver.log
logging.file.max-history=10
logging.file.max-size=10MB

Compared to the SE image we now have an additional volume for the replication log, and AxonServer itself is running in a subdirectory named “/axonserver”.

Deploying AxonServer EE using docker-compose

Now that we have an image, we still need to add the license file and system token, and for docker-compose we can do this using so-called “secrets”. Please note that Docker itself also supports the notion of “secrets”, but then in the context of Docker Swarm. When you try to use them for the first time, e.g. with “docker secret ls” to see if there are any defined, Docker will probably complain that the host is not configured as a “swarm manager”. In this article we will not dive into Docker Swarm, but instead move to Kubernetes.

For our docker-compose example, we can add the secrets using a top-level “secrets” section:

secrets:
  axonserver-properties:
    file: ./axonserver.properties
  axoniq-license:
    file: ./axoniq.license
  axonserver-token:
    file: ./axonserver.token

These secrets (each imported from a file) can now be referenced in the service definitions, as shown for the first node below:

axonserver-1:
    image: axonserver-ee:test
    hostname: axonserver-1
    volumes:
      - axonserver-data1:/axonserver/data
      - axonserver-events1:/axonserver/eventdata
      - axonserver-log1:/axonserver/log
    secrets:
      - source: axoniq-license
        target: /axonserver/config/axoniq.license
      - source: axonserver-properties
        target: /axonserver/config/axonserver.properties
      - source: axonserver-token
        target: /axonserver/config/axonserver.token
    environment:
      - AXONIQ_LICENSE=/axonserver/config/axoniq.license
    ports:
      - '8024:8024'
      - '8124:8124'
      - '8224:8224'
    networks:
      - axon-demo

As you can see all three are manifested as a file in the “config” subdirectory. Also added is an environment variable for the license file. For the other nodes you can change the port mapping, so you can leave the configuration properties the same, but still access the nodes from outside Docker just like we did in the previous installment. 
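Writing out the service definitions for every node by hand gets repetitive, so you could generate them. The sketch below prints definitions for a given number of nodes; the function name “gen_services” is made up, and the scheme of shifting each host port by one per node is just an assumption for this example. The event volume is mounted on “/axonserver/eventdata”, the directory the properties file configures for event storage.

```shell
# Hypothetical generator: print docker-compose service definitions for the
# given number of nodes. Assumption: node N maps the host ports
# 8024/8124/8224 shifted by N-1, so all nodes can keep the same properties.
gen_services() {
  gs_nodes=$1
  gs_i=1
  while [ "$gs_i" -le "$gs_nodes" ]; do
    gs_off=$((gs_i - 1))
    cat <<EOF
  axonserver-$gs_i:
    image: axonserver-ee:test
    hostname: axonserver-$gs_i
    volumes:
      - axonserver-data$gs_i:/axonserver/data
      - axonserver-events$gs_i:/axonserver/eventdata
      - axonserver-log$gs_i:/axonserver/log
    secrets:
      - source: axoniq-license
        target: /axonserver/config/axoniq.license
      - source: axonserver-properties
        target: /axonserver/config/axonserver.properties
      - source: axonserver-token
        target: /axonserver/config/axonserver.token
    environment:
      - AXONIQ_LICENSE=/axonserver/config/axoniq.license
    ports:
      - '$((8024 + gs_off)):8024'
      - '$((8124 + gs_off)):8124'
      - '$((8224 + gs_off)):8224'
    networks:
      - axon-demo
EOF
    gs_i=$((gs_i + 1))
  done
}

gen_services 3
```

The output is indented so it can be pasted under the “services” key of the docker-compose file. The named volumes it refers to still need to be declared in the top-level “volumes” section.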

From docker-compose to Kubernetes

Kubernetes builds on container technology such as Docker, providing you with a cluster of nodes to run containers in, and just like docker-compose, it uses YAML files to describe the desired state. Kubernetes itself is again used as a basis for several other projects, such as RedHat’s OpenShift and Pivotal's CloudFoundry, which add cluster management and common functionality. Given that you can use these platforms to deploy containers to a multi-node cluster, the first change they cause is prompted by the realization that you sometimes want to keep containers together. If you have two containers, say an app and the database it is tightly coupled to, the latency caused by a distributed deployment may not be what you want. Kubernetes uses as its basic unit the “Pod”, and all containers in a Pod are guaranteed to be deployed to the same node, and they have full access to each other’s exposed ports.

To take care of some common management tasks, Kubernetes allows you to add a controller, which can automatically restart Pods that have stopped for some reason. Kubernetes in fact encourages you to anticipate this behavior, as it is also used to migrate Pods to another node under certain conditions. Your container may unexpectedly be stopped and restarted on another node, which works fine as long as your app is able to behave as a stateless service. The controller can also have the capability to scale the Pod, resulting in multiple identical stateless services, that can share the load.

For AxonServer this behavior presents a problem, as we have already seen that it is essentially stateful, and we need volumes with potentially large amounts of data (the Event store) also tightly coupled to individual instances of AxonServer. Luckily several typical combinations of deployment units are available to us, from the “Deployment”, which combines a Pod with a ReplicaSet as its controller, to the “StatefulSet”, which makes the Pod names within the set predictable and adds support for persistent volumes. In practice, this means we can get an AxonServer node with a fixed hostname, coupled volumes, and the guarantee they stay together even in the event of a (forced) migration to another Kubernetes node in the cluster.

The Axon Quick Start package contains an example StatefulSet file, which defines two volumes for the “/eventdata” and “/data” directories exposed by the AxonServer SE container. However, rather than linking them to an actual disk or directory on a disk, they are linked to what Kubernetes calls PersistentVolumeClaims (PVC). A PVC claims a certain amount of disk space as a persistent volume, and if we don’t explicitly specify any disks to use, Kubernetes will create those persistent volumes for us. A StatefulSet includes a controller, which could make several copies of the Pod-plus-Volumes, so the PVC is effectively a template of a PVC, and the YAML file uses a section named “volumeClaimTemplates” to describe them:

  volumeClaimTemplates:
    - metadata:
        name: eventstore
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 5Gi
    - metadata:
        name: data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 1Gi

The names of these templates, “eventstore” and “data”, are used by the volume declarations of the Pods. If you make use of Amazon’s EKS, Google Cloud’s GKE, or Azure’s AKS Kubernetes services, you can use additional references in the “resources” sections shown above, to refer to an actual disk, so you can more easily manage it from the Console.
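The claims that Kubernetes creates from these templates get predictable names as well: the template name, followed by the name of the StatefulSet and the ordinal of the Pod. As a quick illustration (the function name “pvc_names” is made up, and the StatefulSet is assumed to be named “axonserver”):

```shell
# Print the PersistentVolumeClaim names Kubernetes derives for a StatefulSet:
# <claim template name>-<StatefulSet name>-<ordinal>.
# Usage: pvc_names STATEFULSET REPLICAS TEMPLATE [TEMPLATE...]
pvc_names() {
  pn_set=$1
  pn_replicas=$2
  shift 2
  pn_i=0
  while [ "$pn_i" -lt "$pn_replicas" ]; do
    for pn_tpl in "$@"; do
      printf '%s-%s-%s\n' "$pn_tpl" "$pn_set" "$pn_i"
    done
    pn_i=$((pn_i + 1))
  done
}

pvc_names axonserver 1 eventstore data
# eventstore-axonserver-0
# data-axonserver-0
```

These are the names you should see in the output of “kubectl get pvc” once the StatefulSet has been deployed.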

Using the Kubernetes Controller

As said, a StatefulSet includes a controller, which means Kubernetes will monitor the Pod to ensure a certain number of copies of it are running. If the application in the Pod were to stop running, the controller would immediately start a new instance. This also gives you an effective way to restart your application: just delete the Pod, and the controller will start a fresh one. If the problem persists, Kubernetes will note that too, and the Pod enters a state named “CrashLoopBackOff”. However, all this health monitoring requires it to know how to determine the “healthy state”, which is why we use a “readinessProbe” and a “livenessProbe”:

readinessProbe:
  httpGet:
    port: http
    path: /actuator/info
  initialDelaySeconds: 30
  periodSeconds: 5
  timeoutSeconds: 1
livenessProbe:
  httpGet:
    port: http
    path: /actuator/info
  initialDelaySeconds: 60
  periodSeconds: 5
  timeoutSeconds: 1

Both these probes use an HTTP GET request to the given port and path, and they have some settings for testing frequency and initial delay. The controller uses the first probe to determine startup has finished, and from then on uses the second probe to monitor health. For AxonServer we do not use the “/actuator/health” path, as that includes checks that may return warnings, while Kubernetes just wants a yes or no. Requesting the (textual) information on AxonServer is then enough because a valid reply implies that AxonServer is working and monitoring incoming requests.

Exposing AxonServer

Because Kubernetes can use multiple nodes to host Pods, and controllers can create multiple copies of the same, exposing the network ports is not simply a matter of mapping container ports to host ports. Instead, we use Services:

apiVersion: v1
kind: Service
metadata:
  name: axonserver-gui
  labels:
    app: axonserver
spec:
  ports:
  - name: gui
    port: 8024
    targetPort: 8024
  selector:
    app: axonserver
  type: LoadBalancer
  sessionAffinity: ClientIP
---
apiVersion: v1
kind: Service
metadata:
  name: axonserver-grpc
  labels:
    app: axonserver
spec:
  ports:
  - name: grpc
    port: 8124
    targetPort: 8124
  clusterIP: None
  selector:
    app: axonserver

The first of these services exposes the HTTP port of AxonServer and tells Kubernetes it should set up a load-balancer for it. This means that if the controller has more than one Pod active, they will share the requests. In contrast, the gRPC port, which is used to let client applications connect, uses the default type “ClusterIP”, which makes it accessible only from within the cluster. Normally such a service gets a single fixed IP address, but here the address is explicitly set to “None”, because Axon client applications connect directly to a specific instance. This type of service is called a “Headless Service” in Kubernetes parlance, and it causes the Pods of the StatefulSet to be exposed in the cluster-internal DNS. The first replica has a Pod name equal to the name of the StatefulSet suffixed with “-0”, and the service name becomes a domain name, as a subdomain of the Kubernetes namespace. The example StatefulSet, when deployed in namespace “default”, will give you an AxonServer instance with as its DNS name “axonserver-0.axonserver” within the namespace, or “axonserver-0.axonserver.default.svc.cluster.local” for the whole cluster, and this name is made available as soon as the Pod has been declared ready by its probe.
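Because these names follow a fixed pattern, they are easy to compose in scripts. The two functions below are a small sketch of that pattern; their names are made up, and the cluster domain is assumed to be the default “cluster.local”:

```shell
# Compose the DNS names of a StatefulSet Pod behind a headless Service.
# pod_dns  STATEFULSET ORDINAL SERVICE             -> name within the namespace
# pod_fqdn STATEFULSET ORDINAL SERVICE NAMESPACE   -> name for the whole cluster
pod_dns() {
  printf '%s-%s.%s\n' "$1" "$2" "$3"
}
pod_fqdn() {
  # Assumption: the cluster domain is "cluster.local" unless a fifth
  # argument says otherwise.
  printf '%s.%s.svc.%s\n' "$(pod_dns "$1" "$2" "$3")" "$4" "${5:-cluster.local}"
}

pod_dns  axonserver 0 axonserver           # axonserver-0.axonserver
pod_fqdn axonserver 0 axonserver default   # axonserver-0.axonserver.default.svc.cluster.local
```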

Configuration Adjustments

Just as with docker-compose, we can use a configuration file and attach it to our StatefulSet, to provide AxonServer with extra settings. The difference is that, where docker-compose allowed us to “mount” multiple files in the same directory, Kubernetes only deals in directories. Configuration files can be collected in ConfigMaps, where the individual entries become files. Secrets are also available, and again their entries become individual files. You can specify a ConfigMap in YAML as follows:

apiVersion: v1
kind: ConfigMap
metadata:
  name: axonserver-properties
data:
  axonserver.properties: |
    axoniq.axonserver.domain=axonserver.default.svc.cluster.local

If you want to create it from an existing file, you can also use:

$ kubectl create configmap axonserver-properties \
>   --from-file=axonserver.properties
configmap/axonserver-properties created
$

You can then mount this on the “/axonserver/config” directory.

Testing our deployment

With Docker, it is pretty easy to run a test container with our quick-tester, but for Kubernetes, this is no longer the case; there are just too many different things you can deploy, and we want to keep the command-line as clean and simple as possible. What I came up with is the following:

$ kubectl run axonserver-quicktest \
>   --image=axoniq/axonserver-quicktest:4.3-SNAPSHOT \
>   --env AXON_AXONSERVER_SERVERS=axonserver-0.axonserver \
>   --env SPRING_PROFILES_ACTIVE=axonserver \
>   --attach stdout --rm --generator=run-pod/v1

As you can see, it refers to AxonServer using the “same-namespace” DNS name and, because no namespace is specified, it is deployed to the default namespace. The “--attach stdout” ensures we get to see the container’s output, and “--rm” cleans up the Pod. The generator bit was a recent requirement at the time of writing, but the one specified, “run-pod/v1”, used to be the default, and deploys the container as a simple Pod. More recent kubectl versions always create a plain Pod with “kubectl run” and have dropped the “--generator” option.

AxonServer EE in Kubernetes

For a simple EE cluster, where the configuration of each node is the same, the initial differences are fairly predictable, as we add a volume for the replication logs, secrets for the license file and system token, and an environment variable to point at the license. Note also that, because we cannot put the secrets in the “config” subdirectory (every secret is a directory!) we need a separate place for the system token. I advise you to look at the examples in the GitHub repository, which use a small script to generate the properties file and then create a ConfigMap and two Secrets. The template used for the properties is:

axoniq.axonserver.autocluster.first=axonserver-0.__SVC_NAME__.__NS_NAME__.svc.cluster.local
axoniq.axonserver.autocluster.contexts=_admin,default

axoniq.axonserver.domain=__SVC_NAME__.__NS_NAME__.svc.cluster.local
axoniq.axonserver.internal-domain=__SVC_NAME__.__NS_NAME__.svc.cluster.local

axoniq.axonserver.accesscontrol.enabled=true
axoniq.axonserver.accesscontrol.internal-token=2843a447-4da5-4b54-af27-7a8e0d857e87
axoniq.axonserver.accesscontrol.systemtokenfile=/axonserver/security/axonserver.token

Because the names of the Service and the namespace are filled in by the script, this example can easily be deployed to a namespace of your choice, such as a test namespace. The script uses “sed” to replace the “__SVC_NAME__” and “__NS_NAME__” markers with the names of the Service and the namespace respectively.
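
The marker replacement can be sketched as follows. This is a minimal illustration and not the script from the repository, which may well differ. The function name “render_properties” is hypothetical, and the sketch relies on Kubernetes names not containing characters that are special to “sed”, such as “/” or “&”.

```shell
#!/bin/sh
# Hypothetical helper: fill in the Service and namespace markers of a
# properties template read from standard input.
render_properties() {
    # $1 = Service name, $2 = namespace
    sed -e "s/__SVC_NAME__/$1/g" -e "s/__NS_NAME__/$2/g"
}

# Render a single template line for Service "axonserver" in namespace "test-ee".
printf '%s\n' \
    'axoniq.axonserver.domain=__SVC_NAME__.__NS_NAME__.svc.cluster.local' \
    | render_properties axonserver test-ee
```

In practice you would feed it the complete template and redirect the result to “axonserver.properties” before creating the ConfigMap from that file.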

Security Considerations: Who owns what?

As you saw when we made the image for AxonServer EE, it is running as user “axonserver”. This is definitely what we want, but it has a few unexpected consequences due to differences in the default behavior of plain Docker vs Kubernetes with respect to the ownership of mounted volumes. With plain Docker, a volume is automatically assigned to the “current” user of the container, while Kubernetes leaves ownership to root unless you explicitly choose a different security context. To further complicate things, Kubernetes does not want to make assumptions about the names of users and groups but rather forces you to use their respective ids.

On the StatefulSet side, we solve this by picking a number. A common practice on Linux is that “normal” users have ids starting at 1000, so we pick 1001 and adjust the specification as follows:

spec:
  securityContext:
    runAsUser: 1001
    fsGroup: 1001

Now we must ensure that these are the actual numbers used for the “axonserver” user and group, and we can do that by adding a parameter to the “adduser” and “addgroup” commands:

FROM busybox as source
RUN addgroup -S -g 1001 axonserver \
    && adduser -S -u 1001 -h /axonserver -D axonserver \
    && mkdir -p /axonserver/config /axonserver/data \
                /axonserver/events /axonserver/log \
    && chown -R axonserver:axonserver /axonserver

With this change to the Dockerfile, we are now ready to deploy.

Make it so!

Now we have all pieces in place, so we can do the actual deployment. Assuming you are running the Kubernetes cluster that comes with Docker Desktop:

$ kubectl create ns test-ee
namespace/test-ee created
$ ./create-secrets.sh axonserver test-ee
secret/axonserver-license created
secret/axonserver-token created
configmap/axonserver-properties created
$ kubectl apply -f axonserver-sts.yml -n test-ee
statefulset.apps/axonserver created
$ kubectl apply -f axonserver-svc.yml -n test-ee
service/axonserver-gui created
service/axonserver created
$ kubectl get all -n test-ee
NAME               READY   STATUS    RESTARTS   AGE
pod/axonserver-0   1/1     Running   0          94s

NAME                     TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
service/axonserver       ClusterIP      None             <none>        8124/TCP         27s
service/axonserver-gui   LoadBalancer   10.110.211.198   localhost     8024:31925/TCP   27s

NAME                          READY   AGE
statefulset.apps/axonserver   1/1     94s
$ curl -s http://localhost:8024/actuator/health | jq '.details.cluster.status'
"UP"
$ 
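
The Pod needs some time to start, so the health check at the end of this transcript may not report “UP” right away. A small wait loop can help here. The sketch below is an assumption of mine and not part of the repository; the function names are hypothetical, and “cluster_status” needs “curl” and “jq” to be installed.

```shell
#!/bin/sh
# Hypothetical helper: run a status command repeatedly until it prints
# the expected value, or give up after a number of attempts.
wait_for_status() {
    # $1 = expected output, $2 = maximum number of attempts,
    # $3 = seconds to pause between attempts,
    # remaining arguments = the command to run
    expected=$1; attempts=$2; delay=$3
    shift 3
    i=1
    while [ "$i" -le "$attempts" ]; do
        if [ "$("$@" 2>/dev/null)" = "$expected" ]; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# The health check from the transcript, wrapped in a function so it can be
# passed to wait_for_status. "jq -r" prints the status without quotes.
cluster_status() {
    curl -s http://localhost:8024/actuator/health \
        | jq -r '.details.cluster.status'
}
```

With these definitions, “wait_for_status UP 30 2 cluster_status” polls every two seconds, at most thirty times, and returns a non-zero exit code if the cluster never reports “UP”.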

And that is all there is to it! We now have a single-node Axon Server EE cluster running, with the UI (and REST API) accessible from “localhost”. The fun of using the “autocluster” option is that it doesn’t matter how many nodes there are. Only a single one was started, and it initialized itself because its hostname and domain matched the “autocluster.first” value. We only specified volumeClaimTemplates, and they resulted in (so far) three actual claims and volumes:

$ kubectl get pv \
> -o custom-columns=NAME:.metadata.name,CLAIM:.spec.claimRef.name
NAME                                       CLAIM
pvc-9a5cf556-a645-4df1-86f7-243f72b61742   data-axonserver-0
pvc-c24112d6-2f3c-47c4-ab34-e1968d41b643   log-axonserver-0
pvc-e778090c-0ea9-407e-b849-81ead5aae668   events-axonserver-0
$

You can see the claim’s name uses the node’s hostname (“axonserver-0”), so even if we now delete the StatefulSet and recreate it, the claims and volumes will still be there and reused, and AxonServer will not need to initialize again.
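
The naming rule behind these claims is “<template>-<statefulset>-<ordinal>”, which makes it easy to predict which claims a StatefulSet will use. The helper below is a hypothetical sketch that only prints names and does not talk to the cluster.

```shell
#!/bin/sh
# Hypothetical helper: list the claim names a StatefulSet would use.
# Kubernetes names each claim <template>-<statefulset>-<ordinal>.
expected_claims() {
    # $1 = StatefulSet name, $2 = number of replicas,
    # remaining arguments = names of the volumeClaimTemplates
    sts=$1; replicas=$2
    shift 2
    ordinal=0
    while [ "$ordinal" -lt "$replicas" ]; do
        for template in "$@"; do
            printf '%s-%s-%s\n' "$template" "$sts" "$ordinal"
        done
        ordinal=$((ordinal + 1))
    done
}

# The single-node cluster deployed above, with its three templates.
expected_claims axonserver 1 data log events
```

This prints the three claim names from the listing above. Passing a higher replica count shows the claims that additional nodes would create.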

Scaling to Three Nodes, and on the Need for an Ingress

Scaling this cluster is fun to watch in the UI, so open a browser to “http://localhost:8024”. You’ll get the login screen we already encountered in the previous installment. Create an admin user:

$ ../../axonserver-cli.jar register-user \
>   -t $(cat ../../axonserver.token) -u admin -p test -r ADMIN@_admin
$ 

Now log in and select the “Overview” tab. You should see the following:

Now keep this on-screen, without clicking anywhere, while you scale it to three nodes:

$ kubectl scale sts axonserver --replicas=3 -n test-ee
statefulset.apps/axonserver scaled
$ 

If your browser behaves well, you will eventually see something like this:

Note that the purple color and the underlining may vary: the purple node name marks the leader of the “_admin” context, the purple disk image marks the leader of the “default” context, and the underlined name is the node serving the UI. If you don’t see three nodes, or if you for example refresh the browser, you will get the login screen again, because we created a LoadBalancer without session affinity. AxonServer’s UI does not support the concept of a clustered session, so you have logged on to a single node, and when the LoadBalancer picks a different node for the UI’s REST calls, or the UI itself, you no longer have a valid session. This matters here because Docker Desktop does not support session affinity for LoadBalancer type services.

The way out of this situation is to use an Ingress, which requires a bit more work. For simple development scenarios, the “always LoadBalance to localhost for a single replica” approach works fine, but for this cluster, it does not. Luckily we can easily deploy a standard NGINX-based Ingress-controller, as described in the Kubernetes GitHub repository. You can use this same approach for both Docker Desktop on Windows 10 and macOS. After you follow the instructions on that page, you can switch from LoadBalancer to Ingress. You must first delete the gui service, as it has the wrong type. Then you can recreate the service and with it the new Ingress:

$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-0.32.0/deploy/static/provider/cloud/deploy.yaml
… lots of stuff created … 
$ kubectl delete svc axonserver-gui -n test-ee
service "axonserver-gui" deleted
$ kubectl apply -f axonserver-ing.yml -n test-ee
service/axonserver-gui created
service/axonserver unchanged
ingress.networking.k8s.io/axonserver created
$ 

The Ingress provided in the repository exposes the “axonserver-gui” service using the hostname “axonserver” and the default HTTP port 80, so you should add “axonserver” to your “hosts” file (“/etc/hosts” or “C:\Windows\System32\drivers\etc\hosts”) as an extra hostname for “127.0.0.1”. The Ingress also requests session affinity using a cookie, and with this in place, you can now use the UI without further problems.
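
On Linux, macOS, or WSL 2, the “hosts” entry can be added with a small helper that is safe to run more than once. This is a minimal sketch with a hypothetical function name. It takes the path of the file as a parameter, so you can try it on a copy first.

```shell
#!/bin/sh
# Hypothetical helper: add a hostname for 127.0.0.1 to a hosts file,
# unless an entry for that hostname is already present.
add_local_host() {
    # $1 = hostname, $2 = path of the hosts file
    if awk -v host="$1" '
        # Look for the hostname among the names listed for 127.0.0.1.
        $1 == "127.0.0.1" { for (i = 2; i <= NF; i++) if ($i == host) found = 1 }
        END { exit (found ? 0 : 1) }' "$2"
    then
        return 0
    fi
    printf '127.0.0.1 %s\n' "$1" >> "$2"
}
```

Changing the real file requires administrative rights, so a call such as “add_local_host axonserver /etc/hosts” has to run as root. On Windows it is probably easier to edit the file by hand in an editor started as Administrator.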

As a final test, make sure you look at the overview page while you delete the second node:

$ kubectl delete pod axonserver-1 -n test-ee
pod "axonserver-1" deleted
$ 

On-screen you will see that “axonserver-1” is shown with a dotted outline:

If it was the leader of the “default” context, the purple color will immediately switch to a different node, showing that the other nodes held an election and (in the above case) leadership switched to “axonserver-2”. Also, if “axonserver-1” was serving the UI, you will see the Ingress switch as well, signified by the underlined node name. As soon as Kubernetes has started a new Pod for “axonserver-1”, and it has finished starting up, it will read its configuration, check the stored events and replication log, and reconnect to the other nodes. For each context that the node is a member of, the leader will send it any changes it missed, before it is again shown as an active member of the cluster. Any client application that was actively using the cluster may have noticed something happened, but with the correct use of retry policies, it should not have suffered from this mishap.

Wrap-up

In this installment, we looked at Docker and Kubernetes as a platform for AxonServer. Deployment turned out to be pretty simple, but, as always, the devil is in the details. If you are using a Kubernetes cluster from one of the major cloud providers there shouldn’t be many surprises, although the UI exposure for AxonServer Enterprise Edition will be easier with a working implementation of the “sessionAffinity” attribute. I have not delved into the TLS configuration, which is not essentially different from the local scenario, and Kubernetes has a specific form of the “kubectl” command for creating secrets from TLS certificates. Remember however that you’ll have the “svc.cluster.local” based domain for the cluster-internal communication while clients connecting from outside of the Kubernetes cluster will not be able to use that name. Given the focus on small and scalable components for Kubernetes as a platform however, it would be a bit odd to have AxonServer inside and clients outside.

The question now is: do we stop our review of AxonServer deployment here, since we seem to have found a preferential way to do this? The thing is that Kubernetes clusters also have their limits, and they come into play when we look at multi-regional deployments of AxonServer EE clusters. Currently, none of the managed Kubernetes services of the “big three” cloud providers (EKS for AWS, AKS for Azure, and GKE for Google Cloud) support multi-region Kubernetes clusters, although they all do support multi-zone deployments within a region. You could use multiple clusters and tie these together, but that requires quite a bit of networking magic to get it right.

Another point is that scaling AxonServer EE as we did here will always result in nodes with the exact same configuration, unless we use some trickery in a startup script using hostnames. Also, you cannot stop individual replicas; scaling up or down always leaves the lowest-numbered replicas running. Luckily we can solve that by creating several StatefulSets that each have only a single member, at the cost of a somewhat more irregular naming scheme.

The final alternative is to deploy using “full” VMs, which gives us both the most flexibility in deployment and the most work in preparation. This is what we will look at in our next and final installment.

About the Author

Bert Laverman is a Senior Software Architect and Developer at AxonIQ, with over 25 years of experience, the last years mainly with Java. He was co-founder of the Internet Access Foundation, a nonprofit organization which was instrumental in unlocking the north and east of the Netherlands to the Internet. Starting as a developer in the nineties, he moved to Software Architecture, with a short stint as strategic consultant and Enterprise Architect at an insurer. Now at AxonIQ, the startup behind the Axon Framework and Axon Server, he works on the development of its products, with a focus on Software Architecture and DevOps, as well as helping customers.
