AI, ML & Data Engineering

Amazon SageMaker Now Offers Managed MLflow Capability for Enhanced Experiment Tracking

Jul 12, 2024 2 min read

AWS has announced the general availability of MLflow capability in Amazon SageMaker. MLflow is an open-source tool commonly used for managing ML experiments. Users can now compare model performance, parameters, and metrics across experiments in the MLflow UI, keep track of their best models in the MLflow Model Registry, automatically register them as a SageMaker model, and deploy registered models to SageMaker endpoints.
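The compare-and-register workflow described above can be sketched with the open-source MLflow Python API. This is a minimal illustration, not AWS's documented procedure: the experiment name, metric name, registered model name, and the "model" artifact path are all hypothetical, and the sketch assumes the `mlflow` package is installed and a tracking URI has already been set.

```python
def build_order_by(metric: str, higher_is_better: bool = True) -> list:
    """Build the order_by clause MLflow uses to rank runs by a logged metric."""
    direction = "DESC" if higher_is_better else "ASC"
    return [f"metrics.{metric} {direction}"]


def register_best_run(experiment_name: str, metric: str, model_name: str) -> str:
    """Find the top run of an experiment and register its model.

    Assumes each run logged its model under the artifact path "model"
    (a hypothetical convention for this sketch).
    """
    import mlflow  # imported lazily so the helper above works without mlflow

    runs = mlflow.search_runs(
        experiment_names=[experiment_name],
        order_by=build_order_by(metric),
        max_results=1,
    )
    if runs.empty:
        raise LookupError(f"no runs found in experiment {experiment_name!r}")

    best_run_id = runs.iloc[0]["run_id"]
    mlflow.register_model(f"runs:/{best_run_id}/model", model_name)
    return best_run_id
```

Ranking on the server side with `order_by` avoids downloading every run just to find the best one.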

Amazon SageMaker Studio provides a fully integrated development environment (IDE) for machine learning, which allows users to create and manage tracking servers, run notebooks to create experiments and access the MLflow UI to view and compare experiment runs.

The MLflow Tracking Server has three main components: compute, backend metadata storage, and artifact storage. The compute that hosts the tracking server and the backend metadata storage are securely hosted in the SageMaker service account. The artifact storage lives in an Amazon S3 bucket in the user's own AWS account. Each tracking server has an ARN (Amazon Resource Name), which users pass to the MLflow SDK to connect to the tracking server and start logging training runs to MLflow.
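As a rough sketch of that connection step, the snippet below checks an ARN and then points the MLflow SDK at it. The ARN layout in the pattern is an assumption and should be checked against the ARN shown for your own tracking server. The experiment name, parameter, and metric values are hypothetical. The sketch also assumes the `mlflow` package and the `sagemaker-mlflow` plugin are installed in the environment.

```python
import re

# Assumed ARN layout for a SageMaker MLflow tracking server.
_ARN_PATTERN = re.compile(
    r"^arn:(?P<partition>aws[a-z-]*):sagemaker:(?P<region>[a-z0-9-]+):"
    r"(?P<account>\d{12}):mlflow-tracking-server/(?P<name>[A-Za-z0-9-]+)$"
)


def parse_tracking_server_arn(arn: str) -> dict:
    """Split a tracking server ARN into its parts, or raise ValueError."""
    match = _ARN_PATTERN.match(arn)
    if match is None:
        raise ValueError(f"not a tracking server ARN: {arn!r}")
    return match.groupdict()


def log_example_run(tracking_server_arn: str) -> None:
    """Connect the MLflow SDK to the tracking server and log one run."""
    parse_tracking_server_arn(tracking_server_arn)  # fail early on typos

    import mlflow  # imported lazily; needs mlflow plus sagemaker-mlflow

    mlflow.set_tracking_uri(tracking_server_arn)
    mlflow.set_experiment("demo-experiment")  # hypothetical name
    with mlflow.start_run():
        mlflow.log_param("learning_rate", 0.01)  # illustrative values
        mlflow.log_metric("accuracy", 0.93)
```

Validating the ARN before calling the SDK turns a mistyped identifier into an immediate local error rather than a failed network call.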

When a user creates an MLflow Tracking Server, a backend store is automatically configured within the SageMaker service account and fully managed for the user. This backend store persists various metadata for each run, such as run ID, start and end times, parameters, and metrics. To give MLflow persistent storage for the artifacts each run produces, such as model files, users must create an artifact store using Amazon S3. The artifact store must be set up within the user's own AWS account, and the user must explicitly grant MLflow access to Amazon S3 so that it can reach the artifact store.
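The setup described above could be scripted with boto3 roughly as follows. This is a sketch under stated assumptions: the server name, bucket, prefix, and role ARN are placeholders, the IAM role is assumed to already grant access to the S3 bucket, and the allowed size values should be confirmed against the current SageMaker API reference.

```python
ALLOWED_SIZES = {"Small", "Medium", "Large"}  # assumed set of server sizes


def build_tracking_server_request(
    name: str,
    artifact_bucket: str,
    role_arn: str,
    prefix: str = "mlflow",
    size: str = "Small",
) -> dict:
    """Assemble the arguments for creating a tracking server.

    The artifact store is an S3 location in the caller's own account;
    role_arn must allow the tracking server to read and write it.
    """
    if size not in ALLOWED_SIZES:
        raise ValueError(f"unsupported tracking server size: {size!r}")
    return {
        "TrackingServerName": name,
        "ArtifactStoreUri": f"s3://{artifact_bucket}/{prefix}",
        "RoleArn": role_arn,
        "TrackingServerSize": size,
    }


def create_tracking_server(request: dict) -> str:
    """Create the tracking server and return its ARN."""
    import boto3  # imported lazily; needs AWS credentials at call time

    client = boto3.client("sagemaker")
    response = client.create_mlflow_tracking_server(**request)
    return response["TrackingServerArn"]
```

Only the artifact location and the role appear in the request. The backend metadata store is provisioned on the service side, so there is nothing to configure for it.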

Key benefits of using Amazon SageMaker with MLflow include comprehensive experiment tracking, full MLflow capabilities, unified model governance, efficient server management, enhanced security, and effective monitoring and governance.

As AWS chief evangelist Danilo Poccia wrote on his X account, this product can be used:

To help you track multiple model training runs, compare these runs with visualizations, evaluate models, and register the best models to a registry.

Eduardo Robledo, team member of AWS fundamentals, posted on X about the tracking server:

While it is a nice addition to the portfolio, I'm disappointed the tracking server is not serverless based; astonishing 460 USD per month for a small instance for the tracking server alone, is just crazy! companies like @InfinStor offers mlflow tracking server on-demand in aws lambda; i'm confident the team at aws could achieve the same.

Several tools like TensorBoard, Weights & Biases, and Neptune.ai offer capabilities for managing machine learning experiments, models, and deployments. TensorBoard is strong in visualizations for TensorFlow models, while Weights & Biases and Neptune.ai support various frameworks with extensive tracking and collaboration features. The integration of MLflow with Amazon SageMaker offers a managed and secure AWS environment, easy deployment to SageMaker endpoints, and strong model governance, providing an efficient solution for AWS users.

Amazon SageMaker with MLflow is available in all Amazon Web Services regions where Amazon SageMaker is currently available.

About the Author

Daniel Dominguez

Daniel is the Managing Partner at SamXLabs, an AWS Partner Network company. He has over 13 years of experience in software product development for startups and Fortune 500 companies. Daniel holds a Machine Learning specialization from the University of Washington. He is passionate about leveraging AI and cloud computing to create innovative solutions. As an AWS Community Builder in the Machine Learning tier, Daniel is committed to sharing knowledge and driving innovation in software products.
