
Amazon Kinesis Analytics is Like SaaS for Big Data Analysis


Amazon has released a new AWS service that brings Big Data streaming into a SaaS-like world: you connect inputs and outputs with a SQL query, and spend no time on code or infrastructure.

The cloud is the natural home for Big Data for enterprises that want to benefit from practically-instant, practically-infinite scale for both storage and compute. Big Data PaaS offerings based on Hadoop have been available for years, with HDInsight on Azure and Amazon Elastic MapReduce on AWS, but the more significant advances are now being made in real-time stream processing. Event stream processing in the cloud is converging on a very simple, SaaS-like approach, which was already available in Azure and is now available in AWS.

Amazon Kinesis Analytics is available now and is the direct AWS rival for Azure Stream Analytics, which Microsoft released in 2015. Both services approach streaming analytics in the same way: hooking into data sources, specifying a target, and running a continuous query to populate the output. The query is where the analysis goes, and both products use SQL (or something very similar to SQL), making detailed analysis easy to achieve.

This is an interesting trend in the cloud - platform providers are leveraging their own experience with analytics at cloud scale, and working to abstract away the complexity for end-users. The existing AWS Kinesis product is used by Amazon to provide fine-grained metrics for AWS customers. Ryan Waite, who was General Manager for Data Services when it launched, said: "this enables us to scale the metering service to new limits and give alerts in real time." The trend shifts the focus from 'we can host this for you' to 'we can do this for you'. In his blog post announcing Amazon Kinesis Analytics, Jeff Barr, Chief Evangelist at AWS, calls out the ease-of-use factor:

You can focus on processing the data and extracting business value from it instead of wasting your time on infrastructure. You can build a powerful, end-to-end stream processing pipeline in 5 minutes without having to write anything more complex than a SQL query.

Kinesis Analytics uses a pipeline model, where an analytics application connects to a source, continuously runs a query and writes output to a destination. Sources can be a Kinesis Stream or a Kinesis Firehose, so you can funnel data from huge numbers of event producers into a single query. The SQL query can be as simple as a SELECT DISTINCT to tell you how many producers are sending events, or a more complex analysis with sliding windows. The destination can also be a Kinesis Stream or Firehose, so you could store aggregated data in a relational database, or raw data in Hadoop.
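The sliding-window idea can be illustrated outside of any cloud service. The following is a minimal Python sketch, not Kinesis Analytics code or its SQL dialect: it assumes a hypothetical event shape of (timestamp in seconds, producer id), assumes events arrive in timestamp order, and reports how many distinct producers were seen in the trailing window each time a new event arrives. The function name and parameters are illustrative only.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Iterator, Tuple

# Hypothetical event shape: (timestamp in seconds, producer id).
Event = Tuple[float, str]


def distinct_producers(
    events: Iterable[Event], window_seconds: float
) -> Iterator[Tuple[float, int]]:
    """For each incoming event, yield (timestamp, distinct producers in the
    trailing window). Assumes events arrive in timestamp order."""
    window: Deque[Event] = deque()   # events currently inside the window
    counts: Dict[str, int] = {}      # events per producer inside the window

    for ts, producer in events:
        window.append((ts, producer))
        counts[producer] = counts.get(producer, 0) + 1

        # Evict events that have slid out of the window (ts - window, ts].
        while window and window[0][0] <= ts - window_seconds:
            _, old_producer = window.popleft()
            counts[old_producer] -= 1
            if counts[old_producer] == 0:
                del counts[old_producer]

        yield ts, len(counts)


if __name__ == "__main__":
    sample = [(0, "a"), (1, "b"), (2, "a"), (12, "c")]
    for ts, n in distinct_producers(sample, window_seconds=10):
        print(ts, n)
```

In a managed service the equivalent logic would be expressed declaratively in the SQL query, and the service would handle ingestion, state and scaling; the sketch only shows what a continuous, windowed query computes.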

To achieve the same with IaaS or even PaaS, you would need a scalable ingestion queue like Kafka, a stream processing engine like Apache Storm or Spark Streaming, and a scalable destination like Elasticsearch. Those are all distributed, clustered systems which come with a lot of administration overhead, and the analysis part is a custom solution you need to code, test and deploy yourself. Kinesis Analytics is powered by SQLStream, so AWS is providing a managed solution for streaming analytics using ANSI-standard SQL.

The story is similar with Azure, where the equivalent Stream Analytics service recently released an output connector for Power BI, Microsoft's data visualization tool. Ryan CrawCour, Program Manager of Azure Stream Analytics, compared the managed end-to-end analytics solution with the custom alternative:

Traditionally, if you wanted to build a system that was able to get you this sort of insight from your data and display this on a dashboard you would have to first ingest the data, process the data, store the data in a database somewhere and then write a custom application to continually poll this database and populate a customer dashboard you had to build yourself.

Real-time analytics and event streaming are currently the most active areas of development for Big Data - in the data center as well as the cloud. This year's release of Spark 2.0 added DataFrame support for streaming sources; Apache NiFi, a Big Data processing and routing tool which supports streaming, reached the 1.0 release milestone; and Hortonworks released DataFlow version 1.2, which is based on NiFi and focuses on streaming.

The Lambda Architecture in Big Data has long been the standard approach, storing all data permanently for batch processing, and pulling key data out for real-time visualization. The real-time aspect - the speed layer - has been lacking in managed options, compared to the batch layer. With Kinesis Analytics there's a new option for those looking for a cloud-based solution.
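As a rough illustration of how the two layers fit together, here is a toy in-memory Python sketch. It is an assumption-laden simplification, not a production design: the class name LambdaCounter and its methods are hypothetical, the "batch layer" is a plain list rather than a distributed store, and the only supported analysis is a running sum per key.

```python
from collections import Counter
from typing import List, Tuple


class LambdaCounter:
    """Toy Lambda Architecture: a batch view rebuilt from the full log,
    plus a speed view covering events since the last batch run."""

    def __init__(self) -> None:
        # Batch layer input: every event is kept permanently, append-only.
        self.master_log: List[Tuple[str, int]] = []
        self.batch_view: Counter = Counter()   # result of the last batch run
        self.speed_view: Counter = Counter()   # increments since that run

    def ingest(self, key: str, value: int) -> None:
        """Store the raw event and update the real-time view immediately."""
        self.master_log.append((key, value))
        self.speed_view[key] += value

    def run_batch(self) -> None:
        """Recompute the batch view from all stored data, then reset the
        speed view, since the batch view now covers those events."""
        self.batch_view = Counter()
        for key, value in self.master_log:
            self.batch_view[key] += value
        self.speed_view.clear()

    def query(self, key: str) -> int:
        """Serve a result by merging the batch and speed views."""
        return self.batch_view[key] + self.speed_view[key]


if __name__ == "__main__":
    lc = LambdaCounter()
    lc.ingest("clicks", 2)
    lc.ingest("clicks", 3)
    print(lc.query("clicks"))  # served from the speed view only
    lc.run_batch()
    lc.ingest("clicks", 1)
    print(lc.query("clicks"))  # batch view plus newer speed-view data
```

A managed streaming service would take the place of the speed view here, while the master log and batch recomputation correspond to the batch layer.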
