Amazon is announcing several new capabilities for SageMaker. These include expanded options for preparing and analyzing data for machine learning (ML), faster onboarding in SageMaker Canvas through automatic data import from local disk, and local testing of ML workflows in SageMaker Pipelines.
SageMaker Canvas is a visual, point-and-click interface that lets business analysts generate ML predictions on their own, without prior ML experience or writing code. With a few clicks, Canvas lets them access and combine data from various sources, clean data automatically, and build ML models that produce accurate predictions.
SageMaker Canvas users can import data from several sources, including local disk, Amazon S3, Amazon Redshift, and Snowflake. Users can now upload datasets from their local disk directly to SageMaker Canvas without first asking an administrator to grant the necessary permissions, because those permissions are enabled by default. When an administrator creates a domain with the "Enable Canvas permissions" setting turned on, SageMaker attaches a cross-origin resource sharing (CORS) policy to the domain's default Amazon S3 bucket, which allows local file uploads. Administrators who don't want domain users to upload local files can turn this setting off. Automatic local file upload is available in all AWS Regions where SageMaker Canvas is supported.
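When "Enable Canvas permissions" is on, SageMaker attaches the CORS policy for you. For administrators who want to see what such a policy contains, or to apply one to a bucket by hand, the following Python sketch builds a CORS configuration and applies it through a boto3 S3 client's `put_bucket_cors` call. The rule values (POST requests from any origin with any headers) are an assumption modeled on the policy AWS documents for Canvas local uploads; check the current Canvas documentation before relying on them. The function names are hypothetical.

```python
from typing import Any, Dict


def canvas_upload_cors_config() -> Dict[str, Any]:
    """Return a CORS configuration that allows browser POST uploads.

    Assumption: the rule values are modeled on the policy AWS documents
    for Canvas local file uploads; verify them against current docs.
    """
    return {
        "CORSRules": [
            {
                "AllowedHeaders": ["*"],
                "AllowedMethods": ["POST"],
                "AllowedOrigins": ["*"],
                "ExposeHeaders": [],
            }
        ]
    }


def apply_cors(s3_client: Any, bucket: str) -> None:
    """Attach the CORS configuration to `bucket` using a boto3 S3 client."""
    s3_client.put_bucket_cors(
        Bucket=bucket,
        CORSConfiguration=canvas_upload_cors_config(),
    )
```

For example, `apply_cors(boto3.client("s3"), "amzn-s3-demo-bucket")` attaches the rules to a placeholder bucket, and calling `delete_bucket_cors(Bucket=...)` on the same client removes the bucket's CORS configuration. Passing the client in, rather than creating it inside the function, keeps the function easy to test with a stand-in object.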
SageMaker Canvas also gains new data preparation and analysis features, including the ability to choose different sample sizes for datasets and to replace missing values and outliers.
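Canvas performs these transformations through its visual interface, and its exact algorithms are not described here. To make the ideas concrete, the sketch below uses only the Python standard library to show two common strategies: filling missing values with the median of the observed values, and replacing outliers by clipping them to the mean plus or minus k standard deviations. It also draws a reproducible random sample of a chosen size. The function names and the choice of median imputation and k-sigma clipping are illustrative assumptions, not Canvas's implementation.

```python
import random
import statistics
from typing import List, Optional, Sequence


def replace_missing(values: Sequence[Optional[float]]) -> List[float]:
    """Fill None entries with the median of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def clip_outliers(values: Sequence[float], k: float = 2.0) -> List[float]:
    """Clamp values lying more than k population standard deviations from the mean."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    low, high = mean - k * std, mean + k * std
    return [min(max(v, low), high) for v in values]


def sample_rows(rows: Sequence, size: int, seed: int = 0) -> list:
    """Draw a reproducible random sample of `size` rows (all rows if fewer exist)."""
    rng = random.Random(seed)
    return rng.sample(list(rows), min(size, len(rows)))
```

For example, `replace_missing([10.0, None, 12.0, 11.0, 300.0])` fills the gap with 11.5, the median of the four observed values. With very few rows, a single extreme value inflates the standard deviation, so a smaller k or a quantile-based rule may be needed to catch it.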
Finally, SageMaker Pipelines lets you create ML pipelines that integrate directly with SageMaker, and it now also lets you build and test those pipelines on your local machine. With this release, you can check locally that your SageMaker Pipelines scripts and parameters work before running them on SageMaker in the cloud.
SageMaker Pipelines local mode supports the following step types: processing, training, transform, model, condition, and fail. Together, these step types let you define the different stages of your ML workflow. Local mode helps you quickly find and fix errors in your scripts and pipeline definitions. When the pipeline works locally, you can move it to SageMaker's managed environment by switching the session object it uses.
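Moving a workflow from local mode to the managed environment comes down to which pipeline session it is built with. The following Python sketch uses the SageMaker Python SDK's `LocalPipelineSession` and `PipelineSession` classes to build a one-step processing pipeline either way. The role ARN, script path, S3 URI, pipeline and step names, scikit-learn container version, and instance type are all placeholder assumptions, and the SDK imports sit inside the function so the module loads even where the SDK is not installed.

```python
from typing import Any


def build_pipeline(
    role_arn: str,
    script_path: str,
    input_s3_uri: str,
    run_locally: bool = True,
) -> Any:
    """Build a one-step processing pipeline that runs locally or on SageMaker.

    All names, paths, and versions here are illustrative placeholders.
    """
    # Deferred imports: the SageMaker SDK is only needed when building.
    from sagemaker.processing import ProcessingInput
    from sagemaker.sklearn.processing import SKLearnProcessor
    from sagemaker.workflow.pipeline import Pipeline
    from sagemaker.workflow.pipeline_context import (
        LocalPipelineSession,
        PipelineSession,
    )
    from sagemaker.workflow.steps import ProcessingStep

    # Swapping this one object moves the pipeline between local containers
    # and SageMaker's managed environment.
    session = LocalPipelineSession() if run_locally else PipelineSession()

    processor = SKLearnProcessor(
        framework_version="1.0-1",     # assumed scikit-learn container version
        role=role_arn,
        instance_type="ml.m5.xlarge",  # assumed instance type
        instance_count=1,
        sagemaker_session=session,
    )

    # Under a pipeline session, run() does not start a job; it returns
    # step arguments that ProcessingStep records in the pipeline definition.
    step_args = processor.run(
        code=script_path,
        inputs=[
            ProcessingInput(
                source=input_s3_uri,
                destination="/opt/ml/processing/input",
            )
        ],
    )
    step = ProcessingStep(name="Preprocess", step_args=step_args)  # hypothetical step name

    return Pipeline(
        name="LocalModeDemo",  # hypothetical pipeline name
        steps=[step],
        sagemaker_session=session,
    )
```

For example, after `pipeline = build_pipeline(role_arn, "preprocess.py", "s3://amzn-s3-demo-bucket/raw/")`, calling `pipeline.create(role_arn=role_arn)` and then `execution = pipeline.start()` runs the step in local Docker containers (so Docker must be available), and `execution.list_steps()` reports step status. Passing `run_locally=False` and calling `pipeline.upsert(role_arn=role_arn)` targets the managed service with the same pipeline definition.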
These additions extend the data preparation features and advanced data transformations already available in Amazon SageMaker.