Google recently presented MediaPipe graphs for browsers, enabled by WebAssembly and accelerated by the XNNPack ML Inference Library. As previously demonstrated on mobile platforms (Android, iOS), MediaPipe graphs allow developers to build and run machine-learning (ML) pipelines that achieve complex tasks.
MediaPipe graphs are visualized in a browser environment with MediaPipe Visualizer, a dedicated web application that lets developers build graphs whose nodes represent machine-learning or other processing tasks. The following figure shows the MediaPipe face detection example running in the visualizer.
Source: https://developers.googleblog.com/2020/01/mediapipe-on-web.html
As is apparent from the graph, the face detection application transforms input frames (input_frames_gpu) into output frames (output_frames_gpu) through a series of transformations: incoming frames are converted into image tensors (TfLiteTensor), processed by a TFLite face detection model, and the resulting annotations are overlaid on the output video.
The visualized graph matches the accompanying text, which describes each node and the processing it is expected to perform. The MediaPipe Visualizer reacts in real time to changes made within the editor, maintaining the correspondence between text and graph. The configuration of the tensor-conversion node feeding the TFLite model is, for instance, as follows:
# Converts the transformed input image on CPU into an image tensor stored as a
# TfLiteTensor.
node {
  calculator: "TfLiteConverterCalculator"
  input_stream: "IMAGE:transformed_input_video_cpu"
  output_stream: "TENSORS:image_tensor"
}
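A node's calculator field names a C++ calculator class registered with the framework, which assembles such node definitions into a runnable graph. As a rough sketch of that C++ API, the following toy example, modeled on MediaPipe's hello-world sample, builds a one-node graph from a text-format config and streams string packets through it; the PassThroughCalculator simply forwards packets unchanged and stands in for the ML calculators described above.

#include "mediapipe/framework/calculator_graph.h"
#include "mediapipe/framework/port/logging.h"
#include "mediapipe/framework/port/parse_text_proto.h"
#include "mediapipe/framework/port/status.h"

mediapipe::Status RunToyGraph() {
  // A one-node graph: PassThroughCalculator forwards packets unchanged.
  mediapipe::CalculatorGraphConfig config =
      mediapipe::ParseTextProtoOrDie<mediapipe::CalculatorGraphConfig>(R"(
        input_stream: "in"
        output_stream: "out"
        node {
          calculator: "PassThroughCalculator"
          input_stream: "in"
          output_stream: "out"
        }
      )");

  mediapipe::CalculatorGraph graph;
  MP_RETURN_IF_ERROR(graph.Initialize(config));

  // Log every packet that reaches the graph's output stream.
  MP_RETURN_IF_ERROR(graph.ObserveOutputStream(
      "out", [](const mediapipe::Packet& packet) {
        LOG(INFO) << packet.Get<std::string>();
        return mediapipe::OkStatus();
      }));

  MP_RETURN_IF_ERROR(graph.StartRun({}));
  for (int i = 0; i < 10; ++i) {
    MP_RETURN_IF_ERROR(graph.AddPacketToInputStream(
        "in", mediapipe::MakePacket<std::string>("hello")
                  .At(mediapipe::Timestamp(i))));
  }
  MP_RETURN_IF_ERROR(graph.CloseInputStream("in"));
  return graph.WaitUntilDone();
}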
Google created four demos that run in the browser: Edge Detection, Face Detection, Hair Segmentation, and Hand Tracking.
The browser-enabled version of MediaPipe graphs is implemented by compiling the C++ source code to WebAssembly using Emscripten, and by creating an API for the necessary communication back and forth between JavaScript and C++. The required demo assets (ML models and auxiliary text/data files) are packaged as individual binary data packages that are loaded at runtime.
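The announcement does not detail the exposed API surface, but Emscripten's embind mechanism is the usual way to make C++ functions callable from JavaScript. The following minimal sketch illustrates the idea; ProcessFrame is a purely hypothetical entry point, not part of the MediaPipe demos.

#include <emscripten/bind.h>
#include <string>

// Hypothetical entry point: in a real demo this would feed the frame into
// the MediaPipe graph and return the processed result.
std::string ProcessFrame(const std::string& frame_data) {
  return frame_data;  // placeholder: pass the frame through unchanged
}

// Exposes ProcessFrame to JavaScript as Module.processFrame(...).
EMSCRIPTEN_BINDINGS(mediapipe_demo) {
  emscripten::function("processFrame", &ProcessFrame);
}

Once the compiled WebAssembly module has loaded (the code must be built with Emscripten's --bind flag), JavaScript can invoke Module.processFrame(frameData) like an ordinary function.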
To optimize for performance, MediaPipe's browser version leverages the GPU for image operations whenever possible, and resorts to the lightest (yet accurate) available ML models. The XNNPack ML Inference Library is additionally used in connection with the TensorFlow Lite inference calculator (TfLiteInferenceCalculator), resulting in an estimated 2-3x speed gain in most applications.
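The announcement does not show how the calculator enables XNNPack, but in standalone TensorFlow Lite the library is typically activated through TFLite's delegate mechanism, roughly as sketched below; the model path and setup are illustrative.

#include <memory>

#include "tensorflow/lite/delegates/xnnpack/xnnpack_delegate.h"
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

// Builds a TFLite interpreter whose supported ops run through XNNPACK.
std::unique_ptr<tflite::Interpreter> BuildInterpreter(const char* model_path) {
  auto model = tflite::FlatBufferModel::BuildFromFile(model_path);
  tflite::ops::builtin::BuiltinOpResolver resolver;
  std::unique_ptr<tflite::Interpreter> interpreter;
  tflite::InterpreterBuilder(*model, resolver)(&interpreter);

  // Route supported floating-point kernels through the XNNPACK delegate.
  // In production code the delegate must outlive the interpreter and be
  // released with TfLiteXNNPackDelegateDelete() after the interpreter is gone.
  TfLiteXNNPackDelegateOptions options = TfLiteXNNPackDelegateOptionsDefault();
  TfLiteDelegate* delegate = TfLiteXNNPackDelegateCreate(&options);
  interpreter->ModifyGraphWithDelegate(delegate);

  interpreter->AllocateTensors();
  return interpreter;
}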
Google plans to improve MediaPipe's browser version and give developers more control over template graphs and assets used in the MediaPipe model files. Developers are invited to follow the Google Developers Twitter account.
MediaPipe is a cross-platform framework for mobile devices, workstations, and servers, with support for GPU acceleration. MediaPipe is available under the Apache 2.0 open-source license. Contributions and feedback are welcome and may be provided via the GitHub project.