Streaming Content on InfoQ
-
A Critique of Resizable Hash Tables: Riak Core & Random Slicing
This fall, Wallaroo Labs will be releasing a large new feature set to our distributed data stream processing framework, Wallaroo. One of the new features requires a size-adjustable, distributed data structure to support growing & shrinking of compute clusters. It might be a good idea to use a distributed hash table to support the new feature, but what distributed hash algorithm should we choose?
-
How to Choose a Stream Processor for Your App
Choosing a stream processor for your app can be challenging given the many options available. The best choice depends on individual use cases. In this article, the authors discuss a stream processor reference architecture, key features required by most streaming applications, and optional features that can be selected based on specific use cases.
-
Democratizing Stream Processing with Apache Kafka and KSQL - Part 1
In this article, author Michael Noll discusses stream processing with KSQL, the streaming SQL engine for Apache Kafka. Topics covered include the challenges of stateful stream processing and how KSQL addresses them, and how KSQL helps to bridge the world of streams and databases through streams and tables.
-
Migrating Batch ETL to Stream Processing: A Netflix Case Study with Kafka and Flink
At QCon New York, Shriya Arora presented “Personalising Netflix with Streaming Datasets” and discussed the trials and tribulations of a recent migration of a Netflix data processing job from the traditional approach of batch-style ETL to stream processing using Apache Flink.
-
Exploring the Fundamentals of Stream Processing with the Dataflow Model and Apache Beam
At QCon San Francisco 2016, Frances Perry and Tyler Akidau presented “Fundamentals of Stream Processing with Apache Beam”, and discussed Google's Dataflow model and its associated implementation, Apache Beam.
-
Is Batch ETL Dead, and is Apache Kafka the Future of Data Processing?
At QCon San Francisco 2016, Neha Narkhede presented “ETL is Dead; Long Live Streams”, and discussed the changing landscape of enterprise data processing. A core premise of the talk was that the open source Apache Kafka streaming platform can provide a flexible and uniform framework that supports modern requirements for data transformation and processing.
-
Processing Streaming Human Trajectories with WSO2 CEP
Extracting useful information from an inaccurate data stream is a significant issue in data stream processing for IoT applications. This article describes the use of Kalman filters to smooth human trajectory information gathered from an iBeacon sensor network and demonstrates its effectiveness. The solution has been built with WSO2 CEP, a complex event processing middleware.
-
Key Takeaway Points and Lessons Learned from QCon San Francisco 2016
The 10th annual QCon San Francisco was the biggest yet, bringing together over 1500 team leads, architects, project managers, and engineering directors. Over 125 practitioner-speakers presented 92 full-length technical sessions and 32 in-depth tutorials, providing deep insights into real-world architectures and state-of-the-art software development practices from a practitioner’s perspective.
-
Traffic Data Monitoring Using IoT, Kafka and Spark Streaming
The Internet of Things (IoT) is an emerging, disruptive technology and a topic of increasing interest. One area of IoT application is connected vehicles. In this article we'll use Apache Spark and Kafka to analyse and process data from IoT-connected vehicles and send the processed data to a real-time traffic monitoring dashboard.
-
Chris Fregly on the PANCAKE STACK Workshop and Data Pipelines
InfoQ interviews Chris Fregly, organizer of the 4000+ member Advanced Spark and TensorFlow Meetup, about the PANCAKE STACK workshop, Spark, and building data pipelines for machine learning.
-
Big Data Processing with Apache Spark - Part 3: Spark Streaming
In this article, the third installment of the Apache Spark series, author Srini Penchikala discusses the Apache Spark Streaming framework for processing real-time streaming data, using a log analytics sample application.
-
Storm Applied Review and Q&A with the Authors
Storm is a distributed, fault-tolerant, real-time computation system that was originally developed at BackType and later open sourced by Twitter. Storm Applied is a new book from Manning that aims to provide a practical guide on using Storm, both in a development and in a production setting. InfoQ has spoken with two of the book’s authors, Sean T. Allen and Matthew Jankowski.