Hazelcast Jet, Hazelcast's streaming engine component based on Hazelcast IMDG 4.1.1, recently celebrated its fourth anniversary with the release of version 4.4. Besides the usual bug fixes and performance enhancements, this version ships with new features such as a unified file connector and the first beta version of the SQL interface. Other noteworthy additions are strict event-order enforcement and improved delivery packaging.
InfoQ caught up with Scott McMahon, technical director of field engineering at Hazelcast, for a closer look at what this new release has to offer.
InfoQ: Thank you for taking the time to answer some questions for our readers. Can you introduce yourself and describe your day-to-day role and involvement at Hazelcast?
Scott McMahon: My name is Scott McMahon; I am a technical director of field engineering at Hazelcast and the technology lead for our partner ecosystem. My day-to-day role is to lead our Solution Architects team in working with our customers to ensure successful deployment of our platform, and to engage with various partners, such as IBM and Intel, to enable their teams with the implementation of our platform in various customer solution deployments.
InfoQ: This February, Hazelcast Jet celebrates four years since it was publicly released. How has the product evolved since then? What are the most important milestones in the development of the product?
McMahon: Hazelcast Jet, the streaming engine component of the Hazelcast Platform, has been amazing to watch as it transformed the capabilities and usage of our technology. Mobile devices, IoT sensors, even the cars we drive, all became constantly connected and started sending infinite, individual streams of event data.
We started building Jet as a response to the performance requirements of processing these data streams in as near real time as possible. Streaming data quickly loses value over time. What seemed like a fairly specific use case in the beginning has evolved over time to include many processing capabilities that we didn’t initially foresee. The incorporation of machine learning (ML) models into live stream processing has been a game changer for many of our customers. Easily operationalizing ML on production data has always been challenging; we’ve now built compatibility to load models in Python, Java, and C++ and continue to consider other languages as their usage increases.
Of course, a lot of the work over time has been adding enterprise-grade features as well; new out-of-the-box processing capabilities, processing guarantees, and additional data source connectors are constantly being added. We think a new feature we’ve just added in the latest release, Streaming SQL, adds some capabilities that our users will find very appealing. Basically, you define an SQL query, attach it to an event stream, and receive constant results as events flow through. Instead of sending the query to static data, you're sending dynamic data to the query. The stream processing world is really just getting started and we think our in-memory platform is uniquely capable of doing things that no others can do. It’s exciting for us.
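To make the idea concrete, here is a minimal sketch of what such a streaming query might look like through the SQL beta's Java API, assuming a Kafka-backed mapping; the topic name, columns, and option values are hypothetical, and the exact mapping syntax may differ between releases:

```java
import com.hazelcast.jet.Jet;
import com.hazelcast.jet.JetInstance;
import com.hazelcast.sql.SqlResult;
import com.hazelcast.sql.SqlRow;

public class StreamingSqlSketch {
    public static void main(String[] args) {
        JetInstance jet = Jet.bootstrappedInstance();

        // Map a Kafka topic so SQL can read it as an unbounded stream
        // (topic name, columns, and options here are hypothetical).
        jet.getSql().execute(
                "CREATE MAPPING trades (ticker VARCHAR, price DECIMAL, amount BIGINT) "
                + "TYPE Kafka "
                + "OPTIONS ('keyFormat'='varchar', 'valueFormat'='json', "
                + "         'bootstrap.servers'='localhost:9092')");

        // Unlike a query over static data, this one never completes:
        // new rows keep arriving as events flow through the topic.
        try (SqlResult result = jet.getSql().execute(
                "SELECT ticker, price FROM trades WHERE price > 100")) {
            for (SqlRow row : result) {
                System.out.println(row.getObject(0) + " " + row.getObject(1));
            }
        }
    }
}
```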
InfoQ: What are the scenarios where Hazelcast Jet fits the best? When would you recommend using it? And when would you advise against using it?
McMahon: As our industry watched the transformation of how data is generated, it was obvious that the methods for processing this data would need to evolve as well. We knew our in-memory data grid, what we now think of as "data at rest", would play a major role, but we would need an "engine" on top of that to manage the complex and CPU-intensive processing that was required for "data in motion"; hence Jet was born. The scenario where Jet fits best is just what it was built for: infinite streams of data events, especially massively parallel streams, that need to be processed in as near real time as possible. In most streaming use cases, the event data’s value is directly related to time, meaning it loses value as it gets older. The ability to evaluate and act immediately can be the difference between a positive and a negative outcome. So the use cases that fit best at the moment are probably mobile applications and user interaction data; we are seeing a lot of adoption in the edge/IoT space, and of course the financial industry.
As far as advising against using it, I would probably look at big batch-processing jobs that don’t have a low-latency requirement. Big number-crunching analytics, especially traditional database jobs that join lots of tables and do column aggregations, that sort of thing. Though we have seen our platform used for that when higher performance is desired, like when you have a user interacting with an end system. It's really a case-by-case decision and the platform is very flexible.
InfoQ: What else can you tell us about this release?
McMahon: We continue to enhance performance and add to the overall enterprise feature set. We’ve added a new file connector that simplifies and unifies reading file data from local sources as well as from HDFS and cloud sources on AWS, GCP, and Azure. Out of the box it supports a variety of encoding formats such as Avro, JSON, etc. We continue to add core features, for example strict event-ordering enforcement in this version. Our user community guides our development priorities, so we are constantly evaluating those requests.
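For readers curious what the unified file connector looks like in code, below is a minimal sketch assuming the FileSources/FileFormat builder API; the path, glob, and record class are hypothetical, and remote sources (hdfs://, s3a://, and so on) additionally require the matching connector module on the classpath:

```java
import com.hazelcast.jet.Jet;
import com.hazelcast.jet.pipeline.BatchSource;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.file.FileFormat;
import com.hazelcast.jet.pipeline.file.FileSources;

public class UnifiedFileSketch {
    // Hypothetical record type matching the JSON files being read.
    public static class Trade {
        public String ticker;
        public double price;
    }

    public static void main(String[] args) {
        // The same builder reads local paths, HDFS (hdfs://...), S3 (s3a://...), etc.
        BatchSource<Trade> source = FileSources.files("/data/trades")
                .glob("*.json")
                .format(FileFormat.json(Trade.class))
                .build();

        Pipeline p = Pipeline.create();
        p.readFrom(source)
         .writeTo(Sinks.logger());

        Jet.bootstrappedInstance().newJob(p).join();
    }
}
```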
InfoQ: One feature that might be of interest to the Java community is the SQL interface. How does that work as Hazelcast Jet integrates multiple data stream sources with various formats?
McMahon: Jet SQL is a very exciting feature for us and turns the SQL paradigm on its head a little bit. In essence, one can define and attach an SQL query at any stage of a stream process and have that query constantly applied to the data flowing through the process at that point. So if streams merge, or there is enrichment or transformation, upstream of where the query is attached, the query will be applied to the resulting event at that point.
The way I think of it, instead of applying a query to a static data set, you can apply dynamic data to a query. To add some context, at the risk of getting a little down in the weeds, our stream processes are defined in the form of a directed acyclic graph, or DAG. This is a well-known concept in streaming and simplifies the design process. Part of this design is that the DAG can be thought of as executing in stages as it progresses; this allows us to attach the query at the end of any stage in that DAG. We think this will open the platform to a wider audience as well as add more "self-serve" capabilities for analysts and operations staff.
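As a small illustration of what those stages look like in Jet's Pipeline API (the source and transform here are test placeholders, not a real workload):

```java
import com.hazelcast.jet.Jet;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.test.TestSources;

public class PipelineStagesSketch {
    public static void main(String[] args) {
        Pipeline p = Pipeline.create();
        p.readFrom(TestSources.itemStream(10))            // stage 1: ingest a test stream
         .withIngestionTimestamps()                        // assign event time
         .filter(event -> event.sequence() % 2 == 0)       // stage 2: transform/enrich
         .writeTo(Sinks.logger());                         // stage 3: sink; conceptually, a query
                                                           // could be attached after any stage

        Jet.bootstrappedInstance().newJob(p).join();
    }
}
```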
InfoQ: In the blog post, Hazelcast states that the current release "simplifies the migration to hybrid cloud environments by adding an Amazon Kinesis connector for out-of-the-box integrations". Please elaborate on this.
McMahon: The Hazelcast mission is to support our customers and enable the highest performance with class-leading resilience and reliability. It’s no secret that enterprises want to leverage multiple technologies and vendors to stay agile regarding their deployment options. Hazelcast is agnostic; we will run anywhere and provide a unified processing and storage layer across any infrastructure. AWS Kinesis, being Amazon's choice for message-based integration among its applications, will be an integral part of an enterprise's hybrid or multi-cloud architecture, and we felt that it was important to support it for our customers. Easy, native Kinesis support is just another addition in our strategy to support the cloud initiatives of our user base.
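For context, consuming a Kinesis stream with the new connector might look like the following minimal sketch, assuming the KinesisSources builder; the stream name is hypothetical, and AWS credentials and region are picked up from the usual AWS configuration:

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

import com.hazelcast.jet.Jet;
import com.hazelcast.jet.kinesis.KinesisSources;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.StreamSource;

public class KinesisSketch {
    public static void main(String[] args) {
        // Each record arrives as a (partition key, payload bytes) entry.
        StreamSource<Map.Entry<String, byte[]>> source =
                KinesisSources.kinesis("my-stream").build();

        Pipeline p = Pipeline.create();
        p.readFrom(source)
         .withIngestionTimestamps()
         .map(entry -> new String(entry.getValue(), StandardCharsets.UTF_8)) // decode payload as text
         .writeTo(Sinks.logger());

        Jet.bootstrappedInstance().newJob(p).join();
    }
}
```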
InfoQ: What does the future hold for Hazelcast Jet?
McMahon: Event-driven processing, decoupled application integration, business process management, etc. are all ideas that have been around for quite some time. The limiting factor was always the technology to support them under production workloads. Today’s technology, however, from the incredible power packed into tiny CPUs, to the speed and throughput of ubiquitous networking technologies like 5G, to Intel’s Optane memory, enables not only those older ideas but many new ones as well. We see stream-based event processing continuing its growth and encountering new uses we haven’t contemplated yet. There are many features we’re looking at adding, and we feel that we’ve seen only the tip of the iceberg at this point.
Four years and four major versions later, Hazelcast Jet is now a more robust option in the stream-processing space. The newly added features promise to make integration even easier and Jet's processing capabilities even more powerful.