AWS introduced preview releases of Amazon HealthLake and of Redshift ML, a new feature for Amazon Redshift, at re:Invent 2020 in December.
Amazon HealthLake is a data lake service that helps healthcare, health insurance, and pharmaceutical companies derive value from their data with the help of natural language processing (NLP). Redshift ML gives Redshift users a gateway into SageMaker. With Redshift ML, you can create and deploy ML models using only SQL statements.
Healthcare data is mostly unstructured: clinical notes, X-rays, CT scans, handwritten prescriptions, insurance claims, and so on. HealthLake processes the unstructured text data using NLP and images using ML models like binary classification, multiclass classification, or regression.
HealthLake indexes all the information so that it can be searched later and stores it in an open standard format, FHIR (Fast Healthcare Interoperability Resources). You then have a complete view of an individual patient’s history, at a level of granularity that lets you apply advanced analytics or build new machine learning models that make predictions from all of that data, not just a subset of it.
In addition to using the CLI or the AWS console, you can also use AWS's SDK for Python or Java to create and use HealthLake resources.
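As a rough sketch of the Python route, the snippet below assembles the parameters for creating a FHIR data store and hands them to a HealthLake client. It assumes the boto3 `healthlake` client and its `create_fhir_datastore` operation. The helper names and the data store name are illustrative and do not come from the announcement. The client is passed in as an argument so the helper can be tried without AWS credentials.

```python
def build_datastore_request(name, fhir_version="R4"):
    """Return keyword arguments for a create_fhir_datastore call."""
    if not name:
        raise ValueError("data store name must not be empty")
    return {"DatastoreName": name, "DatastoreTypeVersion": fhir_version}


def create_datastore(client, name):
    """Create a data store via any client exposing create_fhir_datastore."""
    return client.create_fhir_datastore(**build_datastore_request(name))


# With credentials configured, a real call might look like this (assumption):
#   import boto3
#   client = boto3.client("healthlake", region_name="us-east-1")
#   response = create_datastore(client, "demo-datastore")
```

Keeping the request construction separate from the call makes it easy to inspect what would be sent before any resource is created.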
Once the unstructured, scattered data has been converted into structured, centralized information, you can use tools like QuickSight and SageMaker to create visualizations and make predictions. HealthLake also provides bulk import from S3 and export to S3 in FHIR format through the AWS console and the HealthLake API. It also fully supports FHIR search functionality.
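To illustrate what a FHIR search looks like, here is a minimal sketch that builds a search URL in the standard FHIR REST form, `[base]/[resource type]?parameter=value`. The base URL is a made-up placeholder, and `fhir_search_url` is a hypothetical helper. A request to a real AWS endpoint would also need to be authenticated and signed, which is left out here.

```python
from urllib.parse import urlencode


def fhir_search_url(base_url, resource_type, params):
    """Build a FHIR search URL of the form [base]/[type]?name=value&..."""
    query = urlencode(params)
    url = f"{base_url.rstrip('/')}/{resource_type}"
    return f"{url}?{query}" if query else url


# Hypothetical endpoint; a real data store endpoint would come from HealthLake.
BASE = "https://example.invalid/datastore/abc123/r4"

# Search for patients with family name Smith born on or after 1970-01-01.
print(fhir_search_url(BASE, "Patient", {"family": "Smith", "birthdate": "ge1970-01-01"}))
```

The `ge` prefix on the date is the FHIR search convention for "greater than or equal to".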
Deriving value from unstructured medical data is more laborious than it seems. Shraddha Goled, in her post, states:
What makes things more difficult is that it is often siloed, incompatible, stored across different on-premise locations… The cost and the complexity of the whole process of analyzing and discovering trends to make precise predictions prohibits organizations to fully utilize the potential of data.
Another potential problem with healthcare data is the possibility that companies misuse it. HealthLake is HIPAA-eligible, though it is worth noting that HealthLake’s HIPAA eligibility is not the same as HIPAA compliance.
This new service is available in preview only in the US East (N. Virginia) region. Pricing information can be found on the official Amazon HealthLake website. Google Cloud's Healthcare API and Microsoft's Azure API for FHIR offer some of the same functionality as Amazon HealthLake.
Redshift ML integrates SageMaker Autopilot with Redshift, giving developers the option to create ML models using SQL statements. This helps ML engineers and makes creating ML models much easier for people without an ML background.
Using Redshift, you can export your data to S3 to be used by SageMaker Autopilot for training. SageMaker Autopilot then finds the right model algorithm and tunes the hyperparameters to get the best results. You can refer to this example of predicting customer churn using Redshift ML.
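As a sketch of what such a SQL statement might look like, the snippet below renders a `CREATE MODEL` statement for a churn model as a string. The clause layout (`FROM`, `TARGET`, `FUNCTION`, `IAM_ROLE`, `SETTINGS`) follows the Redshift ML `CREATE MODEL` syntax as documented by AWS. The table, columns, role ARN, and bucket name are invented placeholders, and `create_model_sql` is a hypothetical helper that does no input sanitization.

```python
def create_model_sql(model, query, target, function, iam_role, s3_bucket):
    """Render a Redshift ML CREATE MODEL statement as a SQL string."""
    return (
        f"CREATE MODEL {model}\n"
        f"FROM ({query})\n"
        f"TARGET {target}\n"
        f"FUNCTION {function}\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"SETTINGS (S3_BUCKET '{s3_bucket}');"
    )


# All values below are placeholders for illustration only.
sql = create_model_sql(
    model="customer_churn",
    query="SELECT age, plan, monthly_minutes, churn FROM customer_activity",
    target="churn",
    function="predict_customer_churn",
    iam_role="arn:aws:iam::123456789012:role/RedshiftML",
    s3_bucket="my-redshift-ml-bucket",
)
print(sql)
```

Once training finishes, the function named in the `FUNCTION` clause can be called like any other SQL function, for example `SELECT predict_customer_churn(age, plan, monthly_minutes) FROM customer_activity;`.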
While the service is in preview, you need to make sure that you create your Redshift cluster in the SQL_PREVIEW maintenance track and that all the related resources Redshift ML will use are created in the same AWS region.
This service is similar to Aurora ML, which was released to integrate AWS's Aurora database service with SageMaker.
On other cloud platforms, Microsoft's Azure Synapse and Google Cloud's BigQuery ML offer similar functionality.