ML.NET version 3.0 has been officially released, introducing new features and enhancements. Notably, deep learning capabilities have been significantly expanded with advancements in Object Detection, Named Entity Recognition (NER), and Question Answering (QA).
As reported, this expansion is made possible through integrations and interoperability with TorchSharp and ONNX models. Additionally, the integration with LightGBM has been updated to the latest version.
For readers who are not familiar with it, ML.NET is a cross-platform, open-source machine learning framework that makes machine learning accessible to .NET developers.
For data processing scenarios, version 3.0 brings an extensive list of enhancements and bug fixes to DataFrame. In addition, new IDataView interoperability features are reported to improve the critical steps of loading, inspecting, transforming, and visualizing data, providing a more robust experience.
Object Detection, a computer vision problem, is one of the key focuses of this release. It performs image classification at a granular scale, both locating and categorizing entities within images. Usage of object detection extends to areas such as workplace safety, object counting, activity recognition, robotics, and self-driving cars. Object Detection is included in the Microsoft.ML.TorchSharp 3.0.0 package within the Microsoft.ML.TorchSharp and Microsoft.ML.TorchSharp.AutoFormerV2 namespaces.
Regarding Natural Language Processing (NLP), ML.NET 3.0 introduces advancements in Named Entity Recognition (NER) and Question Answering (QA). These advancements build upon the existing TorchSharp RoBERTa text classification features previously introduced in ML.NET 2.0. Both NER and QA trainers are included in the Microsoft.ML.TorchSharp 3.0.0 package and the Microsoft.ML.TorchSharp namespace.
Intel oneDAL Training Acceleration, introduced shortly after ML.NET 2.0, accelerates training on supported hardware through Intel oneDAL (Intel oneAPI Data Analytics Library). The library speeds up data analysis by providing optimized algorithmic building blocks for data analytics and machine learning, leveraging the SIMD extensions of 64-bit Intel and AMD CPUs. This feature is now available in the stable 3.0 release.
Automated Machine Learning (AutoML) experiences receive significant enhancements in ML.NET 3.0. The AutoML Sweeper now supports Sentence Similarity, QA, and Object Detection. Noteworthy community contributions include the implementation of a sampling key column name and expanded capabilities of the AutoZero tuner.
DataFrame, an essential component, receives substantial updates in ML.NET 3.0, including improved IDataView <-> DataFrame conversions. Support for both String and VBuffer column types has been added, enabling greater flexibility. Data loading scenarios are expanded as well: data can now be imported from and exported to SQL databases using ADO.NET.
The release also relaxes constraints on column ordering and handles comma-separated data with duplicate column names.
Furthermore, Tensor Primitives Integration, a technical implementation detail, brings notable performance improvements through support for tensor operations. While this integration does not impact the public surface area of ML.NET, it contributes to performance gains and serves as a testing ground for System.Numerics.Tensors APIs.
Looking ahead, the team is reportedly already planning for .NET 9 and ML.NET 4.0. In the near term, Model Builder and the ML.NET CLI are planned to be updated to consume the ML.NET 3.0 release. The commitment to expanding deep learning scenarios, DataFrame enhancements, and System.Numerics.Tensors API integrations remains steadfast.
For readers interested in learning more, the full list of updates is available in the official release notes.