Computer Vision Content on InfoQ
-
Apple Extends Core ML, Create ML, and Vision Frameworks for iOS 17
At its recent WWDC 2023 developer conference, Apple presented a number of extensions and updates to its machine learning and vision ecosystem, including updates to its Core ML framework, new features for the Create ML modeling tool, and new Vision APIs for image segmentation, animal body pose detection, and 3D human body pose estimation.
-
Voxel51 Open-Sources Computer Vision Dataset Assistant VoxelGPT - Q&A with Jason Corso
Voxel51 recently open-sourced VoxelGPT, an AI assistant that interfaces with GPT-3.5 to produce Python code for querying computer vision datasets. InfoQ spoke with Jason Corso, co-founder and CSO of Voxel51, who shared the lessons and insights the team gained while developing VoxelGPT.
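VoxelGPT is a natural-language layer over Voxel51's FiftyOne library, so the code it produces looks like ordinary FiftyOne dataset queries. Below is a minimal sketch of that kind of query, assuming the FiftyOne "quickstart" zoo dataset and a "predictions" detections field; the prompt and field names are illustrative, not taken from VoxelGPT itself.

```python
import fiftyone.zoo as foz
from fiftyone import ViewField as F

# Load a small sample dataset from the FiftyOne dataset zoo
dataset = foz.load_zoo_dataset("quickstart")

# The kind of query VoxelGPT might generate for a prompt such as
# "show me high-confidence dog detections": filter the (assumed)
# "predictions" field down to confident "dog" boxes
view = dataset.filter_labels(
    "predictions", (F("label") == "dog") & (F("confidence") > 0.9)
)
print(view.count())
```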
-
Meta Open-Sources Computer Vision Foundation Model DINOv2
Meta AI Research open-sourced DINOv2, a foundation model for computer vision (CV) tasks. DINOv2 is pretrained on a curated dataset of 142M images and can be used as a backbone for several tasks, including image classification, video action recognition, semantic segmentation, and depth estimation.
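The pre-trained backbones can be pulled directly from the facebookresearch/dinov2 GitHub repository via torch.hub. A minimal sketch, assuming the ViT-S/14 variant and a dummy input; the entry-point name follows the repository's published model names.

```python
import torch

# Load the smallest DINOv2 backbone (ViT-S/14) from the GitHub repo via torch.hub
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

# Extract a global image embedding from a dummy 224x224 RGB image; these
# features can back downstream heads (classification, segmentation, depth)
image = torch.rand(1, 3, 224, 224)  # in practice: a resized, normalized photo
with torch.no_grad():
    embedding = model(image)
print(embedding.shape)  # the ViT-S/14 variant produces a 384-dimensional embedding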
-
HuggingGPT: Leveraging LLMs to Solve Complex AI Tasks with Hugging Face Models
A recent paper by researchers at Zhejiang University and Microsoft Research Asia explores using a large language model (LLM) as a controller that manages existing AI models available in communities such as Hugging Face.
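The core idea is an LLM acting as a planner that decomposes a user request and dispatches subtasks to expert models. The toy sketch below is not the authors' implementation; it only illustrates the controller-plus-expert pattern, with a keyword lookup standing in for the LLM planner and default Hugging Face pipelines standing in for the expert models.

```python
from transformers import pipeline

def plan(request: str) -> str:
    """Stand-in for the LLM planning step: map a request to a Hugging Face task.
    In HuggingGPT this mapping is produced by the LLM itself."""
    return "image-classification" if "image" in request.lower() else "summarization"

def execute(task: str, payload):
    """Run the selected expert model (the default checkpoint for that task)."""
    return pipeline(task)(payload)

# The controller routes an image question to a vision model
task = plan("What objects are in this image?")
result = execute(task, "http://images.cocodataset.org/val2017/000000039769.jpg")  # sample image
print(task, result[:3])
```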
-
Carnegie Mellon Researchers Develop AI Model for Human Detection via WiFi
Researchers from the Human Sensing Laboratory at Carnegie Mellon University (CMU) have published a paper on DensePose From WiFi, an AI model that can detect the pose of multiple humans in a room using only the signals from WiFi transmitters. In experiments on real-world data, the algorithm achieves an average precision (AP) of 87.2 at a 50% IoU threshold.
-
Microsoft Brings Its Cloud Services and AI to the Edge
Microsoft recently announced the open-source release of Azure DeepStream Accelerator (ADA), developed in collaboration with Neal Analytics and NVIDIA, which lets developers quickly build edge AI solutions with native Azure services integration.
-
Salesforce Open-Sources Language-Vision AI Toolkit LAVIS
Salesforce Research recently open-sourced LAnguage-VISion (LAVIS), a unified library for deep-learning language-vision research. LAVIS supports more than 10 language-vision tasks on 20 public datasets and includes pre-trained model weights for over 30 fine-tuned models.
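A minimal captioning sketch using LAVIS's load_model_and_preprocess helper; the model name, checkpoint type, and image path below follow the library's documented usage but should be treated as assumptions.

```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load one of the bundled pre-trained checkpoints (BLIP captioning, COCO base)
model, vis_processors, _ = load_model_and_preprocess(
    name="blip_caption", model_type="base_coco", is_eval=True, device=device
)

# Preprocess a local image (placeholder path) and generate a caption
raw_image = Image.open("demo.jpg").convert("RGB")
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
print(model.generate({"image": image}))
```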
-
Microsoft Introduces New UI Experience for Trying out Computer Vision with Vision Studio
Microsoft recently introduced Vision Studio, a new user interface (UI) that lets developers try out its Computer Vision API.
-
Microsoft Previews Computer Vision Image Analysis API 4.0
Microsoft recently announced the public preview of a new version of the Computer Vision Image Analysis API, making all visual image-analysis features, ranging from optical character recognition (OCR) to object detection, available through a single endpoint.
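Because everything sits behind one endpoint, a single REST call can request several features at once. Below is a hedged sketch using the requests library; the api-version string, feature names, resource URL, and image URL are placeholders or assumptions and should be checked against the current Azure documentation.

```python
import requests

endpoint = "https://<your-resource>.cognitiveservices.azure.com"  # placeholder
key = "<your-key>"  # placeholder

# One call, several features (e.g. a caption plus OCR via the "read" feature)
response = requests.post(
    f"{endpoint}/computervision/imageanalysis:analyze",
    params={"api-version": "2023-02-01-preview", "features": "caption,read"},
    headers={"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"},
    json={"url": "https://example.com/sample-image.jpg"},  # placeholder image URL
)
print(response.json())
```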
-
Meta Announces Video Generation AI Model Make-A-Video
Meta AI recently announced Make-A-Video, a text-to-video generation AI model. Make-A-Video is trained using publicly available image-text pairs and video-only data and achieves state-of-the-art performance on the UCF-101 video-generation benchmark.
-
Microsoft Trains Two Billion Parameter Vision-Language AI Model BEiT-3
Researchers from Microsoft's Natural Language Computing (NLC) group announced the latest version of Bidirectional Encoder representation from Image Transformers: BEiT-3, a 1.9B parameter vision-language AI model. BEiT-3 models images as another language and achieves state-of-the-art performance on a wide range of downstream tasks.
-
Stability AI Open-Sources Image Generation Model Stable Diffusion
Stability AI released the pre-trained model weights for Stable Diffusion, a text-to-image AI model, to the general public. Given a text prompt, Stable Diffusion can generate photorealistic 512x512 pixel images depicting the scene described in the prompt.
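The released weights are commonly run through Hugging Face's diffusers library. A minimal sketch, assuming the CompVis/stable-diffusion-v1-4 checkpoint and a CUDA GPU; half precision is used only to fit consumer GPU memory.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the publicly released v1.4 weights in half precision
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# Generate one 512x512 image from a text prompt and save it
image = pipe("a photograph of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```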
-
Google Releases CameraX 1.2 Beta with MLKit Integration
Now available in beta, CameraX 1.2 brings out-of-the-box integration with some of MLKit's vision APIs and a new feature aimed at reducing shutter-button lag when taking pictures.
-
Google AI Open-Sources a New ML Tool for Conceptual and Subjective Queries over Images
Google AI open-sourced Mood Board Search, a new ML-powered tool for subjective or conceptual queries over images. Mood Board Search helps users define conceptual, subjective queries, such as "peaceful" or "beautiful", over image collections.
-
Google's Image-Text AI LIMoE Outperforms CLIP on ImageNet Benchmark
Researchers at Google Brain recently trained Language-Image Mixture of Experts (LIMoE), a 5.6B parameter image-text AI model. In zero-shot learning experiments on ImageNet, LIMoE outperforms CLIP and performs comparably to state-of-the-art models while using less compute.