AWS has announced the general availability of MLflow in Amazon SageMaker. MLflow is an open-source tool commonly used for managing machine-learning experiments. Users can now compare model performance, parameters, and metrics across experiments in the MLflow UI, track their best models in the MLflow Model Registry, automatically register them as SageMaker models, and deploy registered models to SageMaker endpoints.
Amazon SageMaker Studio provides a fully integrated development environment (IDE) for machine learning, allowing users to create and manage tracking servers, run notebooks to create experiments, and access the MLflow UI to view and compare experiment runs.
MLflow Tracking Server has three main components: compute, backend metadata storage, and artifact storage. The compute that hosts the tracking server and the backend metadata storage are securely hosted in the SageMaker service account, while the artifact storage lives in an Amazon S3 bucket in the user's own AWS account. Each tracking server has an ARN (Amazon Resource Name), which users pass to the MLflow SDK to connect to the tracking server and start logging training runs to MLflow.
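As a sketch of that connection step, the snippet below validates a tracking-server ARN and returns it for use as the MLflow tracking URI. The ARN value, the helper name, and the ARN pattern are illustrative assumptions; the actual MLflow calls are shown only as comments since they require a live server.

```python
import re

def mlflow_tracking_uri(tracking_server_arn: str) -> str:
    """With SageMaker-managed MLflow, the tracking server's ARN serves as
    the tracking URI for the MLflow SDK (pattern is an assumption here)."""
    pattern = r"^arn:aws:sagemaker:[a-z0-9-]+:\d{12}:mlflow-tracking-server/.+$"
    if not re.match(pattern, tracking_server_arn):
        raise ValueError(f"not a tracking-server ARN: {tracking_server_arn}")
    return tracking_server_arn

# Against a real server, the SDK would then be pointed at the ARN:
#   mlflow.set_tracking_uri(mlflow_tracking_uri(arn))
#   with mlflow.start_run():
#       mlflow.log_param("max_depth", 6)
#       mlflow.log_metric("rmse", 0.42)
arn = "arn:aws:sagemaker:us-east-1:123456789012:mlflow-tracking-server/my-server"
print(mlflow_tracking_uri(arn))
```

Validating the ARN up front gives a clearer failure mode than letting a malformed URI surface later inside the MLflow client.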
When creating an MLflow Tracking Server, a backend store is automatically configured within the SageMaker service account and fully managed for the user. This backend store persists various metadata for each run, such as the run ID, start and end times, parameters, and metrics. To give MLflow persistent storage for run artifacts, such as models and other files, users must create an artifact store using Amazon S3. The artifact store must be set up within the user's AWS account, and they must explicitly grant MLflow access to the S3 bucket backing it.
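This setup can be sketched as building the request for the SageMaker `create_mlflow_tracking_server` API: a server name, an artifact-store URI in the user's S3 bucket, and an IAM role that grants MLflow access to that bucket. The parameter names and the S3 prefix below are assumptions based on the described setup, not a verified call signature, and the boto3 call itself is left as a comment.

```python
def tracking_server_request(name: str, artifact_bucket: str,
                            role_arn: str, size: str = "Small") -> dict:
    """Assemble hypothetical parameters for creating a tracking server:
    the artifact store lives in the user's S3 bucket, and role_arn is
    the IAM role that lets MLflow read/write those artifacts."""
    return {
        "TrackingServerName": name,
        "ArtifactStoreUri": f"s3://{artifact_bucket}/mlflow-artifacts",
        "RoleArn": role_arn,
        "TrackingServerSize": size,
    }

# With credentials configured, the request would be sent via boto3:
#   sm = boto3.client("sagemaker")
#   sm.create_mlflow_tracking_server(**tracking_server_request(
#       "my-server", "my-experiments-bucket",
#       "arn:aws:iam::123456789012:role/mlflow-access"))
```

Separating request construction from the API call keeps the configuration testable without AWS credentials.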
Key benefits of using Amazon SageMaker with MLflow include comprehensive experiment tracking, full MLflow capabilities, unified model governance, efficient server management, enhanced security, and effective monitoring and governance.
As AWS chief evangelist Danilo Poccia wrote on X, this capability can be used:
To help you track multiple model training runs, compare these runs with visualizations, evaluate models, and register the best models to a registry.
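The workflow Poccia describes, logging several runs, comparing them, and registering the best, can be sketched with a small helper that selects the best run by a logged metric. The run records and metric name are illustrative; the MLflow query and registration calls are shown only as comments.

```python
def best_run(runs: list, metric: str = "rmse",
             lower_is_better: bool = True) -> dict:
    """Pick the run whose logged metric is best. With MLflow you would
    query runs (e.g. via mlflow.search_runs) and register the winner
    (e.g. via mlflow.register_model) -- sketched here over plain dicts."""
    key = lambda r: r["metrics"][metric]
    return min(runs, key=key) if lower_is_better else max(runs, key=key)

runs = [
    {"run_id": "a1", "metrics": {"rmse": 0.42}},
    {"run_id": "b2", "metrics": {"rmse": 0.37}},
]
winner = best_run(runs)
print(winner["run_id"])  # → b2 (lowest rmse)
# mlflow.register_model(f"runs:/{winner['run_id']}/model", "my-best-model")
```

The `lower_is_better` flag covers both error-style metrics (RMSE, loss) and score-style metrics (accuracy, F1).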
Eduardo Robledo, a team member at AWS Fundamentals, posted on X about the tracking server:
While it is a nice addition to the portfolio, I'm disappointed the tracking server is not serverless based; astonishing 460 USD per month for a small instance for the tracking server alone, is just crazy! companies like @InfinStor offers mlflow tracking server on-demand in aws lambda; i'm confident the team at aws could achieve the same.
Several tools, such as TensorBoard, Weights & Biases, and Neptune.ai, offer capabilities for managing machine learning experiments, models, and deployments. TensorBoard is strong in visualizations for TensorFlow models, while Weights & Biases and Neptune.ai support various frameworks with extensive tracking and collaboration features. The integration of MLflow with Amazon SageMaker offers a managed and secure AWS environment, easy deployment to SageMaker endpoints, and strong model governance, providing an efficient solution for AWS users.
Amazon SageMaker with MLflow is available in all Amazon Web Services regions where Amazon SageMaker is currently available.