The latest release of Intel OpenVINO offers a cleaner API, expands support for natural language processing, and improves performance and portability through its new AUTO plugin. InfoQ has spoken with Matthew Formica, senior director of AI for Intel OpenVINO, to learn more.
In conversation with InfoQ, Formica explained that the new OpenVINO API aims to make it simpler for developers to adopt and maintain their code by aligning more closely with TensorFlow conventions. Several supported frameworks have also been updated.
Updates to the OpenVINO training extensions and the Neural Network Compression Framework (NNCF) add optional model training templates that provide additional performance gains while preserving accuracy for tasks such as action recognition, image classification, speech recognition, question answering, and translation.
As mentioned, OpenVINO also extends its NLP capabilities through a deep learning feature called "dynamic shapes", which allows models to process variable-length input data. According to Formica, dynamic shapes are especially important for NLP models working with text and audio data.
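As a rough illustration, the sketch below uses the 2022.1 Python API to mark a model's sequence-length dimension as dynamic; it assumes a single-input NLP model already converted to OpenVINO IR, and the file name is a placeholder.

```python
from openvino.runtime import Core, PartialShape

core = Core()
model = core.read_model("text_classifier.xml")  # hypothetical single-input NLP model in IR format

# Set the input shape to [batch=1, sequence_length=dynamic]; -1 marks the
# dimension as dynamic, so token sequences of any length can be passed at runtime.
model.reshape(PartialShape([1, -1]))

compiled_model = core.compile_model(model, "CPU")
```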
To make it easier for developers to get the kind of performance they require, OpenVINO introduces a new "hints" mechanism: with a single line of code, they can request either the fastest immediate inference response (latency) or the highest total batch processing (throughput), and OpenVINO automatically optimizes the model at runtime based on the provided hint.
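A minimal sketch of what such a hint could look like with the 2022.1 Python API is shown below; the model path and target device are placeholders, and "LATENCY" can be passed instead of "THROUGHPUT" to prioritize single-inference response time.

```python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")  # placeholder path to a model in OpenVINO IR format

# A single configuration entry tells the runtime what to optimize for:
# "THROUGHPUT" maximizes total batch processing, while "LATENCY" minimizes
# the response time of individual inference requests.
compiled_model = core.compile_model(model, "CPU", {"PERFORMANCE_HINT": "THROUGHPUT"})
```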
Another new feature that will be particularly exciting for developers, says Formica, is the new AUTO plugin, which can dynamically and transparently allocate inferencing across all auto-discovered compute devices.
For instance, it will automatically look across your hardware system at all of the accelerators, the compute that is available, and how much memory is in the system. It will automatically load balance and distribute the inferencing AI job across all that compute for you, dynamically, so it’s working at the best performance. The big win for developers is they can ship one copy of their product and know that OpenVINO will optimize for whatever hardware is used.
This could greatly simplify developers' lives by making it unnecessary to manage distinct "code branches in their applications to handle each hardware vendor's unique HW config/box setup for best inferencing results." One notable example is the GPU "load/compile time" delay, the time it takes for a GPU to begin inferencing, which is closely tied to its architecture. Thanks to AUTO, OpenVINO can start inferencing on the CPU while the GPU is getting ready, then transparently "hot swap" to the GPU for greater performance.
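From the developer's side, using the plugin amounts to selecting "AUTO" as the target device at compile time; the snippet below is a minimal sketch with the 2022.1 Python API, with the model file again a placeholder.

```python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")  # placeholder model in OpenVINO IR format

# "AUTO" lets OpenVINO discover the available devices (CPU, GPU, ...) and
# decide where to run inference; it can begin on the CPU and transparently
# move to the GPU once that device has finished loading/compiling the model.
compiled_model = core.compile_model(model, "AUTO")
```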
The Intel OpenVINO 2022.1 Gold release will become available later in Q1 on www.openvino.ai, but interested developers can already download the latest preview build to try it out.