Queue Support for Apache Kafka: KIP-932 and KMQ from SoftwareMill

The Apache Kafka community is actively working on enabling queue-like use cases for the popular messaging platform as part of the ongoing KIP-932 (Kafka Improvement Proposal). The proposal introduces a share group abstraction for cooperative message consumption. Meanwhile, SoftwareMill has created an alternative solution, KMQ, that works with the existing consumer group abstraction.

Apache Kafka has been a de facto standard for messaging solutions for many years, primarily because of its high-performance, durable message delivery. Kafka's design is geared towards supporting high throughput by limiting communication overheads and data transformations. High performance comes at the cost of flexibility, so some processing use cases, such as individual message delivery acknowledgments, are not supported.

Apache Kafka implements message consumption using consumer groups, relying on the exclusive assignment of topic partitions to the consumers within a group and on per-partition offset tracking. This design introduces a head-of-line blocking problem, where slow or blocked processing of a single message can impact or even stall the consumer application. Additionally, the parallelism of message processing is limited by the number of partitions, so this number has to be carefully determined upfront with the desired throughput in mind.
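The coupling between exclusive partition assignment and per-partition offset commits is visible in how a typical consumer application is written today. The following minimal Java sketch uses the standard KafkaConsumer API (topic and group names are illustrative): because progress is recorded as a single committed offset per partition, one slow record holds back every record behind it on the same partition.

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class ClassicConsumerGroupExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "orders-processing"); // illustrative group name
            props.put("enable.auto.commit", "false");
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("orders")); // illustrative topic name
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        // A single slow record blocks every record behind it on the
                        // same partition: the committed offset can only advance in order.
                        process(record);
                    }
                    // Progress is tracked as one committed offset per partition,
                    // not as per-message acknowledgements.
                    consumer.commitSync();
                }
            }
        }

        private static void process(ConsumerRecord<String, String> record) {
            System.out.printf("processing %s (partition %d, offset %d)%n",
                    record.value(), record.partition(), record.offset());
        }
    }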

Share Groups in Apache Kafka (Source: SoftwareMill Blog)

The KIP-932: Queues for Kafka proposal outlines a new approach to cooperative message consumption by introducing a new abstraction, the share group. Share groups use a different partition assignment model in which assignments are not exclusive, removing the cap that the number of partitions places on the number of consumers. The new model also simplifies and shortens the rebalancing process and avoids "stop the world" fencing events.

Additionally, with share groups, consumers can individually process and acknowledge messages, and Kafka can internally track message consumption at a much more granular level. When a consumer asks for available messages, the Kafka share-partition returns a batch of messages marked as acquired; they remain in that state until the consumer acknowledges them or the processing time limit is reached, at which point they are marked as available again.

Kafka also tracks the number of delivery attempts and marks a message as rejected once that number exceeds the configured threshold. For now, Dead Letter Queue (DLQ) functionality to capture undelivered messages is not available, but it may be added in the future. Kafka brokers maintain and persist all the internal state required to track delivery at the individual message level in a dedicated internal topic. Share group functionality is planned for inclusion in the Kafka 4.0 release.
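KIP-932 also proposes a new client, KafkaShareConsumer, for this acquire-and-acknowledge flow. The sketch below follows the API described in the proposal and is illustrative only: class, method, and acknowledgement-mode details may change before the feature ships, and the topic and group names are made up for the example.

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.AcknowledgeType;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaShareConsumer;

    public class ShareGroupConsumerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "orders-share-group"); // used as the share group name (illustrative)
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            // KafkaShareConsumer is the client proposed in KIP-932; the API may
            // still change before release.
            try (KafkaShareConsumer<String, String> consumer = new KafkaShareConsumer<>(props)) {
                consumer.subscribe(List.of("orders")); // illustrative topic name
                while (true) {
                    // Returned records are marked "acquired" for this consumer until
                    // they are acknowledged or the acquisition time limit expires.
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        try {
                            process(record);
                            consumer.acknowledge(record, AcknowledgeType.ACCEPT);
                        } catch (Exception e) {
                            // Release the record so it becomes available for redelivery;
                            // REJECT would mark it as permanently undeliverable.
                            consumer.acknowledge(record, AcknowledgeType.RELEASE);
                        }
                    }
                    consumer.commitSync(); // sends the acknowledgements to the broker
                }
            }
        }

        private static void process(ConsumerRecord<String, String> record) {
            // business logic goes here
        }
    }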

For the impatient who do not want to wait for KIP-932 to be shipped in Kafka 4.0, there is an alternative option that works with current and past Kafka versions. Adam Warski, chief R&D officer at SoftwareMill, introduces the solution:

KMQ is a pattern for implementing individual message acknowledgements, using the functionality provided by Kafka's consumer groups. Hence, it might be run using any Kafka version. You might notice some similarities in KMQ's and share groups' designs. However, KMQ is significantly simpler to implement and deploy, which also means it addresses only some of the shortcomings of consumer groups. KMQ is entirely a consumer-side mechanism, as opposed to significant logic being run as part of share group brokers.

KMQ uses an additional component, the Redelivery Tracker, which is responsible for marking messages as delivered and acknowledged, similar to share groups but on the application side. The tracker relies on a dedicated "markers" topic and a separate consumer group, and republishes messages back to the topic it's tracking in case of processing timeouts. If the redelivery counter, which is part of the internal state maintained for each message, exceeds the configured threshold, the message is published to the configured Dead Letter Queue (DLQ) topic.
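The following Java sketch illustrates the consumer side of the markers pattern in general terms rather than KMQ's actual API: before processing, the consumer publishes a "start" marker keyed by topic, partition, and offset, and after successful processing it publishes an "end" marker that acts as the individual acknowledgement. Topic names and marker encoding are made up for illustration; the Redelivery Tracker (not shown) consumes the markers topic and republishes any message whose start marker is not matched by an end marker within the timeout.

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class MarkersPatternConsumerSketch {
        private static final String DATA_TOPIC = "orders";            // illustrative
        private static final String MARKERS_TOPIC = "orders-markers"; // illustrative

        public static void main(String[] args) {
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps());
                 KafkaProducer<String, String> markers = new KafkaProducer<>(producerProps())) {
                consumer.subscribe(List.of(DATA_TOPIC));
                while (true) {
                    for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                        String markerKey = record.topic() + "/" + record.partition() + "/" + record.offset();

                        // 1. Publish a "start" marker before processing; the redelivery
                        //    tracker starts a timeout for this message when it sees it.
                        markers.send(new ProducerRecord<>(MARKERS_TOPIC, markerKey, "start"));

                        process(record);

                        // 2. Publish an "end" marker as the individual acknowledgement;
                        //    without it, the tracker republishes the message after the timeout.
                        markers.send(new ProducerRecord<>(MARKERS_TOPIC, markerKey, "end"));
                    }
                    // Offsets on the data topic are committed as usual (auto-commit here);
                    // redelivery is driven entirely by the markers topic.
                }
            }
        }

        private static void process(ConsumerRecord<String, String> record) {
            // business logic goes here
        }

        private static Properties consumerProps() {
            Properties p = new Properties();
            p.put("bootstrap.servers", "localhost:9092");
            p.put("group.id", "orders-kmq"); // illustrative group name
            p.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            p.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            return p;
        }

        private static Properties producerProps() {
            Properties p = new Properties();
            p.put("bootstrap.servers", "localhost:9092");
            p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            return p;
        }
    }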

KMQ Pattern Design (Source: SoftwareMill Blog)

Since KMQ needs to publish markers to a dedicated topic, it introduces some additional latency, but in performance benchmarks conducted by the project team it achieved throughput similar to that of regular consumer groups. The pattern's main benefit is that it supports individual message acknowledgment with currently available Kafka client versions while maintaining high performance. However, it doesn't address the head-of-line blocking issue, and parallelism is still limited by the number of partitions.
