At the recent re:Invent conference, Amazon Web Services (AWS) announced Amazon SageMaker Studio, an integrated development environment (IDE) for machine learning (ML) that brings code editing, training job tracking and tuning, and debugging all into a single web-based interface.
In a blog post, AI and ML evangelist Julien Simon gave an overview of the new service. Amazon SageMaker Studio integrates with several new AWS ML offerings, also announced at re:Invent, including Amazon SageMaker Notebooks, Amazon SageMaker Experiments, Amazon SageMaker Autopilot, Amazon SageMaker Debugger, and Amazon SageMaker Model Monitor. According to Simon:
[SageMaker Studio] gives developers the ability to make changes quickly, observe outcomes, and iterate faster, reducing the time to market for high quality ML solutions.
SageMaker Studio is based on JupyterLab, the next-generation interface from Project Jupyter. Project Jupyter's notebooks are one of the most common environments used by data scientists for exploring data and ML algorithms. SageMaker has long supported notebook instances, which require a user to log on to AWS and provision a virtual machine. The new offering promises to launch notebooks "in seconds" and supports sharing with multiple users by integrating with AWS Single Sign-On (SSO), allowing users to access notebooks hosted in AWS without requiring AWS-specific credentials.
SageMaker Studio includes an integration with the new SageMaker Experiments service, which is designed to help ML practitioners manage large numbers of related training jobs; this is a problem that arises when searching for the hyperparameters that lead to the best-performing model. SageMaker introduced hyperparameter-tuning jobs in 2018; SageMaker Experiments adds an abstraction layer on top of them with two core concepts: a trial, which is a training job with a particular configuration and set of hyperparameters, and an experiment, which is a group of related trials. The SageMaker Studio integration allows for easy creation of new experiments, as well as the ability to visualize the results of trials by graphing metrics such as model accuracy. SageMaker Studio also integrates with another new service, SageMaker Autopilot, which can automatically generate and run experiments given only a file containing a dataset. Autopilot runs data pre-processing and feature-engineering jobs to infer the best model architecture, then runs hyperparameter-tuning jobs to find the best-performing version of that model.
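The experiment and trial concepts are exposed through an API as well as through Studio's UI. As a rough illustration, the following Python sketch, which assumes the sagemaker-experiments helper package and a recent version of the SageMaker Python SDK (parameter names vary between SDK versions), creates an experiment and a trial and associates a training job with them; the bucket, role, and container image are placeholders.

```python
# Illustrative only: group training jobs with SageMaker Experiments.
# Assumes the sagemaker-experiments helper package; bucket, role, and
# image URIs are placeholders, and parameter names may differ by SDK version.
import sagemaker
from sagemaker.estimator import Estimator
from smexperiments.experiment import Experiment
from smexperiments.trial import Trial

session = sagemaker.Session()

# An experiment groups related trials; a trial corresponds to one training
# job with a specific configuration and set of hyperparameters.
experiment = Experiment.create(
    experiment_name="churn-prediction",                # placeholder name
    description="Hyperparameter search for a churn model",
)
trial = Trial.create(
    trial_name="churn-xgboost-eta-0-1",                # placeholder name
    experiment_name=experiment.experiment_name,
)

estimator = Estimator(
    image_uri="<training-image-uri>",                  # placeholder container image
    role="<execution-role-arn>",                       # placeholder IAM role
    instance_count=1,
    instance_type="ml.m5.xlarge",
    sagemaker_session=session,
)
estimator.set_hyperparameters(eta=0.1, max_depth=5)

# Associating the job with the trial makes it show up under the experiment
# in SageMaker Studio, where its metrics can be charted and compared.
estimator.fit(
    inputs="s3://<bucket>/train/",                     # placeholder dataset location
    experiment_config={
        "ExperimentName": experiment.experiment_name,
        "TrialName": trial.trial_name,
        "TrialComponentDisplayName": "Training",
    },
)
```

Autopilot jobs can likewise be launched programmatically; a minimal sketch using the SDK's AutoML class, again with placeholder names, might look like this:

```python
# Illustrative only: launch an Autopilot job from a CSV dataset in S3.
# The target column, bucket, and role are placeholders.
from sagemaker.automl.automl import AutoML

automl = AutoML(
    role="<execution-role-arn>",             # placeholder IAM role
    target_attribute_name="churn",           # placeholder: the column to predict
    max_candidates=10,                       # cap the number of candidate models
)

# Autopilot profiles the data, generates candidate pipelines, and runs
# hyperparameter tuning to select the best-performing model.
automl.fit("s3://<bucket>/train/train.csv", job_name="churn-autopilot")
```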
SageMaker Debugger is a new service that provides visibility into model training by recording the tensor data that represents the state of the model throughout the training lifecycle. Designed to help detect and troubleshoot problems, such as vanishing gradients, that can cause a training job to fail, Debugger supports the popular deep-learning frameworks TensorFlow, PyTorch, and MXNet, and also supports TensorFlow's TensorBoard format. SageMaker Studio provides visualization of the recorded data, such as loss curves, as well as inspection of debug logs.
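Debugger is configured when the training job is created, by attaching built-in rules and a hook that saves tensors to Amazon S3. The sketch below, which uses the SageMaker Python SDK's TensorFlow estimator with placeholder script, role, and bucket names (and is illustrative rather than definitive), enables the built-in vanishing-gradient rule:

```python
# Illustrative only: enable SageMaker Debugger's built-in vanishing-gradient
# rule on a TensorFlow training job. Script, role, and bucket are placeholders,
# and exact parameter names may differ by SDK version.
from sagemaker.debugger import DebuggerHookConfig, Rule, rule_configs
from sagemaker.tensorflow import TensorFlow

estimator = TensorFlow(
    entry_point="train.py",                            # placeholder training script
    role="<execution-role-arn>",                       # placeholder IAM role
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    framework_version="2.1",
    py_version="py3",
    # Save tensors to S3 during training so Studio can chart them (e.g. loss curves).
    debugger_hook_config=DebuggerHookConfig(
        s3_output_path="s3://<bucket>/debug-output/",
    ),
    # Built-in rule that flags vanishing gradients while the job runs.
    rules=[Rule.sagemaker(rule_configs.vanishing_gradient())],
)

estimator.fit("s3://<bucket>/train/")
```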
The final Studio integration is with SageMaker Model Monitor, a new service that monitors the quality of ML models in production. SageMaker has always supported rapid web deployment of models via an inference endpoint, which creates a web service that takes new data observations as input and returns model predictions. The new service can detect "data drift" by analyzing incoming data points to make sure they follow historical trends. SageMaker Studio integrates with Model Monitor to provide visualization of data metrics and rule violations.
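Model Monitor builds on that endpoint mechanism: requests and responses are captured to S3, a baseline is computed from the training data, and a schedule compares live traffic against the baseline. A rough sketch with the SageMaker Python SDK, continuing from the training sketch above and using placeholder names, might look like the following:

```python
# Illustrative only: deploy a model with data capture enabled and attach a
# Model Monitor schedule. `estimator` is assumed to be a trained estimator
# (e.g. from the earlier sketch); bucket and role names are placeholders.
from sagemaker.model_monitor import (
    CronExpressionGenerator,
    DataCaptureConfig,
    DefaultModelMonitor,
)
from sagemaker.model_monitor.dataset_format import DatasetFormat

# Deploy a real-time inference endpoint, capturing a sample of incoming
# requests and responses to S3 for later analysis.
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    data_capture_config=DataCaptureConfig(
        enable_capture=True,
        sampling_percentage=100,
        destination_s3_uri="s3://<bucket>/data-capture/",
    ),
)

# Baseline the training data, then check hourly whether live traffic drifts
# from the baseline statistics and constraints.
monitor = DefaultModelMonitor(
    role="<execution-role-arn>",                       # placeholder IAM role
    instance_count=1,
    instance_type="ml.m5.xlarge",
)
monitor.suggest_baseline(
    baseline_dataset="s3://<bucket>/train/train.csv",  # placeholder training data
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri="s3://<bucket>/baseline/",
)
monitor.create_monitoring_schedule(
    endpoint_input=predictor.endpoint_name,
    output_s3_uri="s3://<bucket>/monitor-reports/",
    statistics=monitor.baseline_statistics(),
    constraints=monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
)
```

Violations detected by the scheduled analysis are what surface in Studio's Model Monitor charts.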
Amazon SageMaker was first announced at re:Invent in 2017 and has gained several new features since, including SageMaker Ground Truth for automated data labeling and SageMaker Neo for model deployment on edge devices. This year's re:Invent announcements prompted one Reddit user to comment:
Autopilot is...low-hanging fruit for a big company that specializes in self-service so that add makes sense. [SageMaker] Studio is a step toward a more traditional data science platform experience, one like Domino or Cloudera Data Science Workbench would give you. It'll be interesting to watch how that improves in the future. Model Monitor is a nice touch.
Amazon SageMaker Studio is available as a preview in the AWS US-East-2 (Ohio) region.