Martin Leuthold, 8/23/2018

Unified deployment process with Kubernetes for a Kafka connector

THE CHALLENGE

Developing and deploying Kafka connectors involves different kinds of technologies and processes. Before our unified deployment process we had these four building blocks:


Locally, we used either the Confluent package to run a single-node Kafka Connect cluster for testing our code, or a local Docker Swarm setup that included a dockerized ZooKeeper, broker and Kafka Connect cluster. This was the first step in our process.

In our second step, when the code was in a mature state, we needed to deploy it to the cloud. Our cloud infrastructure was either AWS ECS or a self-hosted Kubernetes cluster running on AWS resources. However, this step took us a lot of time to get right, because we had to test against "real" cloud resources. Some of the drawbacks were:

  • Cloud resources cost money.
  • Cloud resources cost time to create and delete.
  • To test on cloud resources in parallel, you need some kind of prefix naming so you don't interfere with other developers.

UNIFIED DEPLOYMENT

With company-wide support for Kubernetes and the decision within the team to use Kubernetes over AWS ECS, we strove for a full Kubernetes development environment that could be used locally and in the cloud. We therefore ditched Docker Swarm completely, because we didn't have enough resources to maintain it alongside Kubernetes. In addition, going for Kubernetes allowed us to solve some minor issues we had with Kafka connector deployment in the past, e.g. where to put cron jobs? Our Kafka connector deployment consists of six components, which are shown in the picture below:

On the right-hand side we provide the Kafka broker and ZooKeeper as PODs in our Kubernetes environment. These PODs exist only when developing and deploying locally. When the Kafka connector is deployed in the cloud, the configuration points to the existing Kafka cluster and the broker and ZooKeeper PODs are omitted.
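To make this concrete, here is a minimal sketch of how the Connect worker could be pointed at either target; the deployment and service names, ports and broker addresses are placeholders, not our actual manifests (the Confluent Connect image reads its bootstrap servers from the CONNECT_BOOTSTRAP_SERVERS environment variable):

    # Local: Connect talks to the in-cluster broker service (placeholder names).
    kubectl set env deployment/kafka-connect \
      CONNECT_BOOTSTRAP_SERVERS=kafka-broker:9092

    # Cloud: the same deployment points at the existing Kafka cluster instead,
    # so no broker or ZooKeeper PODs are deployed at all.
    kubectl set env deployment/kafka-connect \
      CONNECT_BOOTSTRAP_SERVERS=broker-1.example.internal:9092,broker-2.example.internal:9092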

On the left-hand side (from left to right) we provide all the PODs necessary to operate a Kafka connector:

  • The Kafka connect node, which holds the customized Kafka connector JAR and configuration files.
  • A task checker and restarter, which checks the status of the Kafka connector task and restarts it if necessary (sketched below).
  • An agent to collect metrics (e.g. JMX) and logs of the Kafka connect node for monitoring and debugging purposes.
  • A one-time POD, which deploys the Kafka connect configuration to the Kafka connect node and "starts" the Kafka connector.

A detailed description of each POD is not part of this blog post, but could be the topic of another one (leave a comment if you want to hear more). We just wanted to illustrate how many components it takes to fully deploy a Kafka connector.
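To give a flavour of one of these components anyway, here is a minimal sketch of the task checker and restarter, built on the Kafka Connect REST API; the connector name, Connect URL and check interval are assumptions for illustration, not our actual values:

    #!/bin/sh
    # Sketch of a task checker/restarter side-car (placeholder names and interval).
    CONNECT_URL=${CONNECT_URL:-http://kafka-connect:8083}
    CONNECTOR=${CONNECTOR:-my-connector}

    while true; do
      # The Connect REST API reports the state of the connector and each of its tasks.
      if curl -s "$CONNECT_URL/connectors/$CONNECTOR/status" | grep -q '"state":"FAILED"'; then
        # Restart task 0; a real checker would parse the JSON and restart every failed task id.
        curl -s -X POST "$CONNECT_URL/connectors/$CONNECTOR/tasks/0/restart"
      fi
      sleep 60
    done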

TECHNICAL IMPLEMENTATION

To get to the unified deployment process, we used the Minikube project, the Confluent Docker images for the broker, ZooKeeper and the Kafka connector itself, and a Makefile.

  • Minikube is a tool that allows us to run a Kubernetes cluster locally in a virtual machine. This is great for creating and testing deployment code locally.
  • The Confluent Docker images have been integrated into our Kubernetes deployment scripts. They allow us to spin up a Kafka cluster with ZooKeeper, broker, Schema Registry, Connect, etc. on our local Kubernetes cluster. We are able to create and destroy our Kafka cluster within seconds for development and testing purposes.
  • A Makefile holds all the commands to build the necessary images, deploy them to our Kubernetes cluster and, if needed, undeploy them again. Because the Makefile is standardized to provide the targets "build", "deploy" and "undeploy", we can run these targets locally and in the cloud; the only difference is the context in which they run (see the sketch below).

In the diagram, on the left-hand side, is the Makefile, which holds the targets "build", "deploy" and "undeploy". In addition, for local deployment the target "startMinikube" is provided to launch a local Kubernetes cluster.
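A minimal sketch of such a Makefile could look like this; the image name, registry, manifest directory and resource sizes are placeholders, not our actual setup:

    IMAGE     ?= my-registry.example.com/kafka-connector:latest
    MANIFESTS ?= k8s/

    startMinikube:
    	minikube start --memory 8192 --cpus 4

    build:
    	docker build -t $(IMAGE) .
    	docker push $(IMAGE)

    deploy:
    	kubectl apply -f $(MANIFESTS)

    undeploy:
    	kubectl delete -f $(MANIFESTS)

Switching between local and cloud is then just a matter of the active kubectl context, e.g. "kubectl config use-context minikube" locally versus the cluster's context in the cloud.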

To test the deployment code, these steps are required:

  • step #0: (only local) start the Kubernetes cluster
  • step #1: build the Docker images and push them to the Docker registry
  • step #2: deploy the Kafka deployments, which rely on the images provided by the Docker registry
  • step #3: (optional) undeploy the Kafka deployments, which results in a clean Kubernetes cluster with no containers
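With a Makefile like the one sketched above, a local test run boils down to a handful of commands:

    make startMinikube   # step #0, local only
    make build           # step #1
    make deploy          # step #2
    make undeploy        # step #3, optional clean-up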

SHORT DEVELOPMENT CYCLES

Having a local Kubernetes cluster with our own Kafka cluster allowed us to debug real-world problems like network bandwidth limitations faster.

In one case we had finished our development code and our deployment code and assumed our solution would work in the cloud the same way it worked locally. Unfortunately, the connection between the Kafka connector and the Kafka brokers is much slower in the cloud than locally, because locally we could avoid any real network traffic. In the cloud we stumbled upon timeout issues (classic!) when loading data from the Kafka connector to the Kafka broker: the connection between the two was about 10x slower, causing the flush of a single file to exceed a 5-minute time limit and killing the Kafka connector task.
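A 5-minute limit like this is typically the consumer's max.poll.interval.ms, which defaults to 300000 ms: if a sink task spends longer than that inside a single flush, its consumer is considered dead and the task fails. One way to stay inside that window is to write smaller files per flush; the following sketch pushes such a change through the Connect REST API, with a placeholder connector name and properties that assume a file-writing sink such as the Confluent S3 sink connector (which may differ from our actual connector):

    # Sketch: shrink the data written per flush so a flush finishes well within
    # the poll interval (abbreviated config; a real PUT must contain the full config).
    curl -s -X PUT http://kafka-connect:8083/connectors/my-connector/config \
      -H "Content-Type: application/json" \
      -d '{
            "connector.class": "io.confluent.connect.s3.S3SinkConnector",
            "topics": "my-topic",
            "flush.size": "10000",
            "rotate.interval.ms": "60000"
          }'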

With our local Kubernetes cluster we were able to simulate the real-world network limitation and test our code locally, resulting in faster optimization cycles for our Kafka connector configuration and more resilient code for handling timeouts.
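One way to simulate such a limitation locally is to throttle the broker POD's network interface; a sketch, assuming the broker image ships the tc tool and the POD runs with the NET_ADMIN capability (POD name and rate are placeholders):

    # Limit the broker's interface to roughly the bandwidth observed in the cloud.
    kubectl exec kafka-broker-0 -- \
      tc qdisc add dev eth0 root tbf rate 10mbit burst 32kbit latency 400ms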

RESULT

We were able to unify all the different approaches on Kubernetes and to develop and test the deployment code locally.

Developing the deployment code locally saved us a lot of time, because the feedback cycles were much shorter:

  • Setting up a clean Kafka cluster was a matter of seconds by calling "make undeploy" and "make deploy".
  • Network limitations for pushing hundreds of MBs from the Kafka connector to the local Kafka cluster were completely avoided, because all resources were local (and partly in memory as well).
  • Sharing deployment code between team members and trying it locally was just a matter of getting Minikube to run on each local machine.

A side effect was that our deployment became cloud-agnostic, i.e. we could easily switch to Google Cloud or Microsoft Azure as long as a Kubernetes cluster is available there.

In the future we want to use Kubernetes extensively not only for all our Kafka connectors, but also for Kafka Streams applications. We would also like to migrate all our existing Kafka connectors from AWS ECS to Kubernetes.

If you are interested in this topic or want to participate in any way, feel free to take a look at our job offerings and apply for team Lambda. We are hiring!