Microsoft announced the release of ML.NET 2.0, the open-source machine learning framework for .NET. The release contains several updated natural language processing (NLP) APIs, including Tokenizers, Text Classification, and Sentence Similarity, as well as improved automated ML (AutoML) features.
Program manager Luis Quintanilla announced the release at the recent .NET Conf 2022. The updated NLP APIs are powered by TorchSharp, a .NET wrapper for the popular PyTorch deep learning framework. The release includes the EnglishRoberta tokenization model and a TorchSharp implementation of NAS-BERT, which is used by the Text Classification and Sentence Similarity APIs. Updates to AutoML include an API for automated data pre-processing and a set of APIs for running experiments to find the best models and hyperparameters. Quintanilla also announced a new release of the Model Builder tool for Visual Studio, which includes the new text classification scenario and advanced training options.
The Text Classification API, which was previewed earlier this year, is based on the NAS-BERT model published by Microsoft Research in 2021. This model was developed using neural architecture search (NAS), resulting in smaller models than the standard BERT model, while maintaining accuracy. Users can fine-tune the pre-trained NAS-BERT model with their own data, to fit their custom use cases. The Sentence Similarity API uses the same pre-trained model, but instead of being fine-tuned to classify an input string, the model takes two strings as input and outputs a score indicating the similarity of the meaning of the two inputs.
The AutoML APIs are based on Microsoft's Fast Library for Automated Machine Learning & Tuning (FLAML). While the Featurizer API is designed for pre-processing, the rest of the APIs work together to search for the best set of hyperparameters. The Experiment API coordinates the optimization of a Sweepable pipeline over a Search Space using a Tuner. Devs can use the Sweepable API to define the training pipeline for hyperparameter optimization of their models; the Search Space API for configuring the range of the hyperparameter search space for that pipeline; and the Tuner API to choose a search algorithm for that space. The release includes several tuner algorithms, including basic grid and random searches as well as Bayesian and Frugal optimizers.
Quintanilla also gave viewers a preview of the ML.NET roadmap. Future plans for deep learning features include new scenarios and APIs for question answering, named-entity recognition, and object detection. There are also plans for TorchSharp integrations for custom scenarios and improvements to the ONNX integration. Other plans include upgrades to the LightGBM implementation and to the implementation of the IDataView interface, as well as improvements to the AutoML API.
At the end of his presentation, Quintanilla answered questions from the audience. One viewer asked about support for different vendors' GPUs and accelerator libraries, and Quintanilla noted that currently only NVIDIA's CUDA accelerator is supported. When another viewer asked whether ML.NET's object detection algorithms would run fast enough to support a live video stream, Quintanilla replied:
We want to focus on performance. We're introducing new deep learning scenarios and we realized that performance is key there, so performance is a focus for us going forward.
The ML.NET source code is available on GitHub.