In a recent blog post, Amazon announced the general availability (GA) of Amazon Forecast, a fully managed time-series forecasting service. Amazon Forecast applies deep learning across multiple datasets and algorithms to make predictions in areas such as product demand, travel demand, financial planning, SAP and Oracle supply chain planning, and cloud computing usage. While Amazon Forecast leverages machine learning within the service, its users do not require machine learning expertise.
Amazon Forecast was originally announced at re:Invent 2018 and is now available for production use via the AWS Console, AWS Command Line Interface (CLI) and AWS SDKs. The service was conceived as a result of customer demand, since Amazon has extensive experience in forecasting for its own lines of business. Andy Jassy, CEO of Amazon Web Services, explains:
Our customers said you do this at scale, and you’ve been doing it for a long time, you’ve built a lot of these models, can you find a way to make the models available to us?
The first step in creating an Amazon Forecast job is to create a dataset group. When creating the dataset group, users can choose one of the forecasting domains that Amazon provides, or create their own custom domain if none of the built-in domains fits.
Amazon Forecast requires historical data to be loaded into the service before it can build predictions. This data is loaded from an Amazon S3 bucket and must be in CSV format. Other details are collected at this stage, including the timestamp format and an IAM role that grants Amazon Forecast read access to the S3 bucket.
The next step in the process is to train a predictor, which requires the user to select a machine learning algorithm. Amazon provides five options: Autoregressive Integrated Moving Average (ARIMA), DeepAR+, Exponential Smoothing (ETS), Non-Parametric Time Series (NPTS) and Prophet. Alternatively, users can choose AutoML and Amazon Forecast will select the best-performing algorithm for the dataset.
As with most machine learning solutions, the data needs to be split into two datasets: one for training and one for evaluation. Users should not split their dataset randomly, however. Danilo Poccia, principal evangelist at Amazon Web Services, explains:
With time series, you can’t just create these two subsets of your data randomly, like you would normally do, because the order of your data points is important. The approach we use for Amazon Forecast is to split the time series in one or more parts, each one called a backtest window, preserving the order of the data. When evaluating your model against a backtest window, you should always use an evaluation dataset of the same length, otherwise it would be very difficult to compare different results.
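The order-preserving split Poccia describes can be illustrated with a few lines of plain Python (the demand numbers are made up): the final `window` points form the backtest window and everything before them is used for training.

```python
def backtest_split(series, window):
    """Split a time series while preserving order: train on everything
    before the backtest window, evaluate on the final `window` points."""
    if window <= 0 or window >= len(series):
        raise ValueError("window must be between 1 and len(series) - 1")
    return series[:-window], series[-window:]


# Hypothetical daily demand history; the last 3 days become the backtest window.
daily_demand = [120, 135, 128, 140, 152, 149, 160, 158, 171, 165]
train, backtest = backtest_split(daily_demand, window=3)
print(train)     # first 7 points, in chronological order
print(backtest)  # last 3 points, in chronological order
```

Evaluating against backtest windows of the same length, as the quote notes, keeps results from different models comparable.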
Once the predictor has been configured and is active, Amazon Forecast provides metrics that evaluate how effective the predictor is, including quantile loss, which measures how far the forecast at a given quantile is from the actual demand.
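The intuition behind quantile loss can be shown with the standard pinball-loss formula, sketched here in plain Python with made-up numbers (this illustrates the general metric, not Amazon's exact weighted variant):

```python
def quantile_loss(actual, forecast, q):
    """Pinball loss for one point at quantile q (0 < q < 1): errors are
    weighted by q when the forecast is too low and by 1 - q when it is
    too high."""
    diff = actual - forecast
    return q * diff if diff >= 0 else (q - 1) * diff


# At the 0.9 quantile, under-forecasting actual demand of 100 by 10 units
# is penalized nine times as heavily as over-forecasting it by 10 units.
under = quantile_loss(actual=100, forecast=90, q=0.9)
over = quantile_loss(actual=100, forecast=110, q=0.9)
print(round(under, 2), round(over, 2))  # → 9.0 1.0
```

High quantiles such as P90 therefore favor forecasts that rarely fall short of actual demand, which is useful when stock-outs are costlier than excess inventory.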
Image source: https://aws.amazon.com/forecast/
When the forecast is run, it compiles results into multiple outputs, including visualizations within the AWS Console, CSV exports and responses from the Amazon Forecast API. The outputs include probabilistic forecasts at the P10, P50 and P90 quantiles.
Image source: https://aws.amazon.com/blogs/aws/amazon-forecast-now-generally-available/
Based upon private beta customer benchmarks, Amazon claims Forecast provides forecasts up to 50% more accurate than what customers were previously achieving on their own, at one-tenth the cost of traditional supply chain software.
Amazon Forecast is now available in the following AWS regions: US East (N. Virginia), US West (Oregon), US East (Ohio), Europe (Ireland), Asia Pacific (Singapore), and Asia Pacific (Tokyo). Sample datasets and walk-throughs are available on GitHub.