Article Series: Developing Apache Kafka applications on Kubernetes

The way data is processed and consumed today differs from how it was done in the past. Previously, data was stored in a database and batch processed to produce analytics. That approach is still valid, but modern platforms let you process data in real time, as it arrives in the system.

Apache Kafka (or Kafka) is a distributed event store and stream-processing platform for storing, consuming, and processing data streams.

One of the key aspects of Apache Kafka is that it was created with scalability and fault tolerance in mind, making it appropriate for high-performance applications. Kafka can be considered a replacement for some conventional messaging systems, such as those based on Java Message Service (JMS) and Advanced Message Queuing Protocol (AMQP).

Apache Kafka has client libraries for most popular programming languages, but in this article series, we’ll cover its integration with Java.

The Kafka Streams project helps you consume real-time streams of events as they are produced, apply any transformation, join streams, etc., and optionally write new data representations back to a topic.

Kafka Streams is suited to both stateless and stateful streaming applications, implements time-based operations (for example, grouping events around a given time period), and is designed with the scalability, reliability, and maintainability that characterize the Kafka ecosystem.
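As a rough illustration of what grouping events around a time period means, the following plain-Java sketch counts events per fixed one-minute window. It does not use the Kafka Streams API; the class name `TumblingWindowSketch`, the method `countPerWindow`, and the sample timestamps are assumptions made up for this example.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of time-window grouping; this is NOT the Kafka Streams API.
final class TumblingWindowSketch {

    // Counts events per fixed, non-overlapping window.
    // The result maps each window's start time to the number of events in it.
    static Map<Instant, Long> countPerWindow(List<Instant> eventTimes, Duration windowSize) {
        long sizeMs = windowSize.toMillis();
        Map<Instant, Long> counts = new TreeMap<>();
        for (Instant eventTime : eventTimes) {
            // Align the timestamp down to the start of the window it falls into.
            long windowStart = Math.floorDiv(eventTime.toEpochMilli(), sizeMs) * sizeMs;
            counts.merge(Instant.ofEpochMilli(windowStart), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: three events, two in the first minute, one in the second.
        Instant t0 = Instant.parse("2022-01-01T10:00:00Z");
        List<Instant> eventTimes = List.of(
                t0.plusSeconds(5), t0.plusSeconds(50), t0.plusSeconds(65));
        System.out.println(countPerWindow(eventTimes, Duration.ofMinutes(1)));
    }
}
```

In Kafka Streams itself, windowing is declared on a grouped stream and the library manages the underlying state; the sketch only shows the bucketing idea behind it.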

But Apache Kafka is much more than an event store and a stream-processing platform. It's a vast ecosystem of projects and tools that help solve some of the problems we might find when developing microservices. One of these is the dual writes problem, which arises when data needs to be stored transactionally in two systems. Kafka Connect and Debezium are open-source projects for change data capture that use the log scanner approach to avoid dual writes and communicate persisted data correctly between services.
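To make the dual writes problem more concrete, here is a small plain-Java simulation. It involves no real database, Kafka broker, or Debezium connector; the class `DualWriteSketch`, its in-memory lists, and the `brokerAvailable` flag are hypothetical stand-ins chosen only to show why two independent writes can diverge, and why scanning a change log avoids that.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of the dual writes problem; no real database or broker is involved.
final class DualWriteSketch {

    // Hypothetical stand-ins for a database table, its change log, and a topic.
    final List<String> table = new ArrayList<>();
    final List<String> changeLog = new ArrayList<>();
    final List<String> topic = new ArrayList<>();
    private int nextLogPosition = 0;

    // Dual write: two independent writes. If the second fails, the systems diverge.
    void saveWithDualWrite(String row, boolean brokerAvailable) {
        table.add(row); // write 1: already committed to the "database"
        if (!brokerAvailable) {
            throw new IllegalStateException("broker unavailable, event lost");
        }
        topic.add(row); // write 2: never reached when the broker is down
    }

    // Single write: the row and its change-log entry are recorded together,
    // in the way a database records a committed transaction in its log.
    void saveWithSingleWrite(String row) {
        table.add(row);
        changeLog.add(row);
    }

    // Log scanner: publishes change-log entries that were not sent yet.
    // If the broker is down nothing is lost; the scan can simply run again later.
    void scanLog(boolean brokerAvailable) {
        while (brokerAvailable && nextLogPosition < changeLog.size()) {
            topic.add(changeLog.get(nextLogPosition));
            nextLogPosition++;
        }
    }

    public static void main(String[] args) {
        DualWriteSketch cdc = new DualWriteSketch();
        cdc.saveWithSingleWrite("order-1");
        cdc.scanLog(false); // broker down: nothing published, nothing lost
        cdc.scanLog(true);  // broker back: the pending change is published
        System.out.println("table=" + cdc.table + " topic=" + cdc.topic);
    }
}
```

The point of the design is that the application performs only one write, and a separate process derives the events from the log, so a broker outage delays publication instead of losing data.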

In the last part of this series of articles, we'll see how to provision, configure, and secure an Apache Kafka cluster on a Kubernetes cluster.

 

Series Contents

1

Getting Started to Quarkus Reactive Messaging with Apache Kafka

Apache Kafka is a stream-processing platform for storing, consuming, and processing data streams in real-time. In this post, we’ll learn how to produce and consume data using Kafka and Quarkus.

2

Kafka Streams and Quarkus: Real-Time Processing Events

The Kafka Streams project consumes real-time streams of events as they are produced, applies transformations, joins streams, etc. In this article, we’ll learn how to use Kafka Streams and Quarkus.

3

Debezium and Quarkus: Change Data Capture Patterns to Avoid Dual-Writes Problems

Debezium is an open-source project for change data capture using the log scanner approach to avoid dual writes and communicate persisted data correctly between services.

4

Moving Kafka and Debezium to Kubernetes Using Strimzi - the GitOps Way

Deploying an Apache Kafka cluster to Kubernetes is not an easy task. There are a lot of pieces to configure. Strimzi is a Kubernetes controller that makes deploying Kafka child’s play.

5

Securing a Kafka Cluster in Kubernetes Using Strimzi

Deploying an Apache Kafka cluster to Kubernetes is easy if you use Strimzi, but that’s only the first step; you need to secure the communication between Kafka and all its components.

 
