Stream All the Things: Patterns of Effective Data Stream Processing Explored by Adi Polak at QCon SF

Adi Polak, Director of Advocacy and Developer Experience Engineering at Confluent, presented "Stream All the Things—Patterns of Effective Data Stream Processing" at the latest QCon San Francisco. Her talk highlighted the persistent challenges of data streaming and laid out pragmatic patterns that help organizations build and operate scalable, efficient streaming pipelines.

Despite a decade of technological advancements, data streaming has long posed significant challenges for organizations. Teams often spend up to 80% of their efforts troubleshooting issues like downstream output errors or suboptimal pipeline performance. Polak outlined the core expectations for an ideal data streaming solution: reliability, compatibility with diverse systems, low latency, scalability, and high-quality data.

However, meeting these demands requires tackling key challenges, including throughput, real-time processing, data integrity, and error handling. The presentation focused on advanced aspects like exactly-once semantics, join operations, and ensuring data integrity while adapting infrastructures for AI-driven applications.

Polak introduced several design patterns that address the complexities of data streaming pipelines. These include Dead Letter Queues (DLQ) for error management and patterns for ensuring exactly-once processing across systems.

  1. Exactly-Once Semantics

Achieving exactly-once semantics remains a cornerstone of reliable data processing. Polak contrasted legacy Lambda architectures with modern Kappa architectures, which handle real-time events, state, and time more deterministically. She explained how exactly-once guarantees can be implemented with two-phase commit protocols using tools like Apache Kafka and Apache Flink: operators perform pre-commits, followed by a system-wide commit, ensuring consistency even if individual components fail. Window-based time calculations (e.g., tumbling, sliding, and session windows) further support deterministic processing.
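
To make the mechanics concrete, the sketch below shows one common way to wire up exactly-once processing with Apache Flink and Apache Kafka: checkpointing drives the pre-commit/commit cycle, the Kafka sink is configured with an exactly-once delivery guarantee, and a tumbling event-time window produces deterministic per-key counts. The topic names, bootstrap address, and counting logic are illustrative assumptions rather than details from the talk.

```java
import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class ExactlyOnceWindowCounts {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpointing (exactly-once mode by default) drives Flink's pre-commit phase;
        // the sink's Kafka transactions are committed once the checkpoint completes.
        env.enableCheckpointing(60_000);

        // Hypothetical input topic of order events (names are illustrative).
        KafkaSource<String> source = KafkaSource.<String>builder()
            .setBootstrapServers("localhost:9092")
            .setTopics("orders")
            .setGroupId("order-counts-job")
            .setStartingOffsets(OffsetsInitializer.earliest())
            .setValueOnlyDeserializer(new SimpleStringSchema())
            .build();

        // Transactional sink: EXACTLY_ONCE makes the write part of the two-phase commit.
        KafkaSink<String> sink = KafkaSink.<String>builder()
            .setBootstrapServers("localhost:9092")
            .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                .setTopic("order-counts")
                .setValueSerializationSchema(new SimpleStringSchema())
                .build())
            .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
            .setTransactionalIdPrefix("order-counts-tx")
            .build();

        DataStream<String> orders = env.fromSource(
            source,
            // Kafka record timestamps become event time; tolerate 5s of out-of-orderness.
            WatermarkStrategy.<String>forBoundedOutOfOrderness(Duration.ofSeconds(5)),
            "orders");

        orders
            .map(value -> Tuple2.of(value, 1L))
            .returns(Types.TUPLE(Types.STRING, Types.LONG))
            .keyBy(t -> t.f0)
            // Deterministic per-minute tumbling windows, one of the window types Polak mentioned.
            .window(TumblingEventTimeWindows.of(Time.minutes(1)))
            .sum(1)
            .map(t -> t.f0 + "," + t.f1)
            .sinkTo(sink);

        env.execute("exactly-once-window-counts");
    }
}
```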

  2. Join Operations

Joining data streams, whether combining a stream with a batch dataset or two real-time streams, is inherently complex. Polak emphasized the need for precise planning to ensure seamless integration and to preserve exactly-once semantics during joins.
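
As an illustration of the stream-stream case, the sketch below uses Flink's interval join to attribute a click to an impression for the same ad when the click arrives within ten minutes of the impression. The event types, field names, and bounded sample data are hypothetical and serve only to keep the example self-contained; a production job would read both sides from Kafka topics and still needs careful watermarking to keep the join deterministic.

```java
import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.ProcessJoinFunction;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class IntervalJoinSketch {

    // Simple POJO event types for illustration only.
    public static class Impression {
        public String adId; public long ts;
        public Impression() {}
        public Impression(String adId, long ts) { this.adId = adId; this.ts = ts; }
    }

    public static class Click {
        public String adId; public long ts;
        public Click() {}
        public Click(String adId, long ts) { this.adId = adId; this.ts = ts; }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Bounded sample data keeps the sketch self-contained; real jobs read from Kafka.
        DataStream<Impression> impressions = env
            .fromElements(new Impression("ad-1", 1_000L), new Impression("ad-2", 2_000L))
            .assignTimestampsAndWatermarks(
                WatermarkStrategy.<Impression>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                    .withTimestampAssigner((e, prev) -> e.ts));

        DataStream<Click> clicks = env
            .fromElements(new Click("ad-1", 4_000L))
            .assignTimestampsAndWatermarks(
                WatermarkStrategy.<Click>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                    .withTimestampAssigner((e, prev) -> e.ts));

        // Interval join: match a click to an impression for the same ad
        // if the click occurs within 10 minutes after the impression.
        impressions.keyBy(i -> i.adId)
            .intervalJoin(clicks.keyBy(c -> c.adId))
            .between(Time.seconds(0), Time.minutes(10))
            .process(new ProcessJoinFunction<Impression, Click, String>() {
                @Override
                public void processElement(Impression imp, Click clk, Context ctx, Collector<String> out) {
                    out.collect(imp.adId + " attributed: impression@" + imp.ts + " click@" + clk.ts);
                }
            })
            .print();

        env.execute("interval-join-sketch");
    }
}
```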

  3. Error Handling and Data Integrity

Data integrity was highlighted as critical for trustworthy pipelines. Polak introduced the concept of "guarding the gates," which includes schema validation, versioning, and serialization using a schema registry. Such measures ensure physical, logical, and referential integrity, preventing "bad things from happening to good data." Pluggable failure enrichers, like automated error-processing tools integrated with Jira, were showcased as solutions for labeling and systematically resolving errors.
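
One common way to realize the DLQ pattern in Flink is to route records that fail validation to a side output, label them with a failure reason, and ship that stream to its own topic for inspection or replay. The sketch below is a minimal, self-contained version of that idea; the crude shape check stands in for real schema-registry validation, and the sample payloads and reason label are illustrative.

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class DeadLetterQueueSketch {

    // Side-output tag for records that fail validation (the dead-letter stream).
    private static final OutputTag<String> DEAD_LETTERS = new OutputTag<String>("dead-letters") {};

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Stand-in for a Kafka topic of raw payloads; the last element is malformed on purpose.
        DataStream<String> raw = env.fromElements(
            "{\"orderId\":\"o-1\",\"amount\":42}",
            "{\"orderId\":\"o-2\",\"amount\":7}",
            "not-json-at-all");

        SingleOutputStreamOperator<String> valid = raw.process(new ProcessFunction<String, String>() {
            @Override
            public void processElement(String value, Context ctx, Collector<String> out) {
                // Hypothetical "guard the gates" check; a real pipeline would validate
                // against a schema from a schema registry instead of this crude shape test.
                if (value.startsWith("{") && value.contains("orderId")) {
                    out.collect(value);
                } else {
                    // Label the bad record and route it to the DLQ side output so it can be
                    // inspected or replayed without blocking the main pipeline.
                    ctx.output(DEAD_LETTERS, "reason=malformed_payload;record=" + value);
                }
            }
        });

        valid.print("valid");
        // In production the dead-letter stream would typically be written to its own Kafka topic.
        valid.getSideOutput(DEAD_LETTERS).print("dlq");

        env.execute("dlq-side-output-sketch");
    }
}
```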

Polak concluded by exploring the growing intersection of data streaming with AI-driven use cases. Whether powering fraud detection, dynamic personalization, or real-time optimization, the success of AI systems hinges on robust, real-time data infrastructure. She underscored the importance of designing pipelines that support the high-throughput and low-latency demands of AI applications.

Lastly, Polak left the audience with essential insights for effective data streaming:

  • Prioritize data quality and implement DLQ for error management.
  • Ensure exactly-once guarantees across the system using robust architectures.
  • Plan rigorously for join operations, which are inherently challenging.
  • Healthy error handling begins with clear labeling and systematic resolution.
