The Google Cloud team recently announced the general availability (GA) of the AutoML Natural Language framework. AutoML Natural Language provides features for processing text data and discovering insights in it, and supports common machine learning tasks such as classification, sentiment analysis, and entity extraction. It is useful in the following types of applications:
- Categorizing digital content (such as news, blogs, or tweets) in real time, so that users can see patterns and insights.
- Identifying sentiment in customer feedback comments.
- Turning dark, unstructured scanned data into classified and searchable content. Gartner defines dark data as the information assets organizations collect, process and store during business activities, but fail to use for other purposes like analytics.
AutoML Natural Language works with a wide range of content, such as collections of articles, scanned PDFs, or previously archived records. The tool works in three steps:
- Upload the documents: In the first step, you upload the documents through the AutoML Natural Language UI and label the text data based on your domain-specific keywords and phrases.
- Train the custom model: The tool then trains a custom model for the selected task, such as classification, entity extraction, or sentiment analysis.
- Evaluate the model: Finally, you evaluate the trained model and use it to get insights relevant to your specific needs.
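As a rough illustration of the labeling step, the sketch below serializes labeled examples into a CSV file. It assumes a layout modeled on the CSV import format AutoML Natural Language has used for classification datasets (a split column of TRAIN, VALIDATION, or TEST, then the text, then one or more labels). The function name `build_import_csv` and the record layout are hypothetical, so check the current documentation for the exact schema before uploading anything.

```python
import csv
import io
from typing import Iterable, Sequence, Tuple

# Hypothetical record layout: (split, text, labels).
# The split/text/labels column order is an assumption modeled on the
# AutoML Natural Language CSV import format; verify against current docs.
Example = Tuple[str, str, Sequence[str]]

VALID_SPLITS = {"TRAIN", "VALIDATION", "TEST"}


def build_import_csv(examples: Iterable[Example]) -> str:
    """Serialize labeled text examples as CSV rows: split, text, label1[, label2...]."""
    buffer = io.StringIO()
    # csv.writer quotes a field only when needed (e.g. text containing commas).
    writer = csv.writer(buffer, lineterminator="\n")
    for split, text, labels in examples:
        if split not in VALID_SPLITS:
            raise ValueError(f"unknown split {split!r}")
        if not labels:
            raise ValueError(f"example has no labels: {text!r}")
        writer.writerow([split, text, *labels])
    return buffer.getvalue()


if __name__ == "__main__":
    rows = [
        ("TRAIN", "Great service, fast delivery", ["positive"]),
        ("TEST", "Package arrived damaged", ["negative"]),
    ]
    print(build_import_csv(rows), end="")
    # TRAIN,"Great service, fast delivery",positive
    # TEST,Package arrived damaged,negative
```

Letting the `csv` module handle quoting avoids hand-escaping feedback text that contains commas or quotation marks.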
AutoML Natural Language also supports analyzing PDF documents, including both native PDFs and PDFs of scanned images. To help with challenging use cases such as understanding legal documents or classifying content into complex taxonomies, it supports up to 5,000 classification labels, training on up to one million documents, and documents of up to 10 MB each.
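Given the limits above, it can help to check a corpus locally before uploading it. The following is a minimal sketch, assuming the limits quoted in the announcement and treating "10 MB" as 10 MiB. `check_dataset_limits` is a hypothetical helper, not part of any Google Cloud client library.

```python
from typing import Dict, Iterable, List

# Limits quoted in the GA announcement; treat them as assumptions and
# confirm the current AutoML Natural Language quotas before relying on them.
MAX_LABELS = 5_000
MAX_DOCUMENTS = 1_000_000
MAX_DOCUMENT_BYTES = 10 * 1024 * 1024  # assumes "10 MB" means 10 MiB


def check_dataset_limits(doc_sizes: Dict[str, int], labels: Iterable[str]) -> List[str]:
    """Return human-readable problems; an empty list means the dataset fits.

    doc_sizes maps a (hypothetical) document name to its size in bytes;
    labels is every label used in the dataset, duplicates allowed.
    """
    problems: List[str] = []
    distinct_labels = set(labels)
    if len(distinct_labels) > MAX_LABELS:
        problems.append(f"{len(distinct_labels)} labels exceeds limit of {MAX_LABELS}")
    if len(doc_sizes) > MAX_DOCUMENTS:
        problems.append(f"{len(doc_sizes)} documents exceeds limit of {MAX_DOCUMENTS}")
    for name, size in sorted(doc_sizes.items()):
        if size > MAX_DOCUMENT_BYTES:
            problems.append(f"{name} is {size} bytes, over the {MAX_DOCUMENT_BYTES}-byte limit")
    return problems


if __name__ == "__main__":
    sizes = {"contract.pdf": 2 * 1024 * 1024, "scan.pdf": 12 * 1024 * 1024}
    for problem in check_dataset_limits(sizes, ["invoice", "contract", "invoice"]):
        print(problem)
```

Returning a list of problems instead of raising on the first one lets a single pass report every oversized document at once.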
Chicory, a digital shopping and marketing solution provider for the grocery industry, uses the PDF scanning functionality. According to Asaf Klibansky, director of engineering at Chicory:
We are using AutoML to classify and translate recipe ingredient data across a network of 1,300 recipe websites into actual grocery products that consumers can purchase seamlessly through our partnerships with dozens of leading grocery retailers like Kroger, Amazon, and Instacart.
AutoML Natural Language also has advanced features to help understand documents better. AutoML Text & Document Entity Extraction incorporates the spatial structure and layout of a document into model training and prediction. This gives the model a better understanding of the whole document, which is especially valuable when both the text and its location on the page matter, as in invoices, receipts, resumes, and contracts.
The product is FedRAMP-authorized at the Moderate level, making it easier for federal agencies to benefit from Google AI technology.
To learn more about AutoML Natural Language and the Natural Language API, see the product website, the Get Started documentation, and the Quick Start tutorial.
Other products in Google Cloud AutoML include AutoML Vision, AutoML Video Intelligence (beta), AutoML Translation, and AutoML Tables.