Computer Vision Content on InfoQ
-
HuggingGPT: Leveraging LLMs to Solve Complex AI Tasks with Hugging Face Models
A recent paper by researchers at Zhejiang University and Microsoft Research Asia explores the use of large language models (LLMs) as a controller to manage existing AI models available in communities like Hugging Face.
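The paper describes a four-stage pipeline: task planning, model selection, task execution, and response generation. A minimal sketch of that control loop, where `llm` and `run_model` are hypothetical callables standing in for the LLM prompts and Hugging Face model calls (none of these names come from the paper's code):

```python
import json
from typing import Any, Callable

def hugginggpt(user_request: str,
               llm: Callable[[str], str],
               run_model: Callable[[str, dict], Any]) -> str:
    # 1. Task planning: the LLM decomposes the request into structured subtasks.
    tasks = json.loads(llm(
        f"Decompose into a JSON list of AI tasks with their inputs: {user_request}"))

    # 2. Model selection and 3. task execution, one subtask at a time.
    results = []
    for task in tasks:
        model_id = llm(f"Name the best Hugging Face model id for: {task}")
        results.append(run_model(model_id, task))

    # 4. Response generation: the LLM composes the final answer from the results.
    return llm(f"Answer {user_request!r} using these results: {results}")
```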
-
Carnegie Mellon Researchers Develop AI Model for Human Detection via WiFi
Researchers from the Human Sensing Laboratory at Carnegie Mellon University (CMU) have published a paper on DensePose From WiFi, an AI model that can detect the pose of multiple humans in a room using only the signals from WiFi transmitters. In experiments on real-world data, the algorithm achieves an average precision of 87.2 at the 50% intersection-over-union (IOU) threshold.
-
Microsoft Brings Its Cloud Services and AI to the Edge
Microsoft recently announced the open-source release of Azure DeepStream Accelerator (ADA), developed in collaboration with Neal Analytics and NVIDIA, which lets developers quickly build edge AI solutions with native Azure services integration.
-
Salesforce Open-Sources Language-Vision AI Toolkit LAVIS
Salesforce Research recently open-sourced LAnguage-VISion (LAVIS), a unified library for deep-learning language-vision research. LAVIS supports more than 10 language-vision tasks on 20 public datasets and includes pre-trained model weights for over 30 fine-tuned models.
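As an illustration, image captioning with LAVIS follows a load-preprocess-generate pattern; the sketch below uses the library's `load_model_and_preprocess` entry point (the model name and image path are assumptions for the example):

```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load a pre-trained BLIP captioning model plus its matching image preprocessors.
model, vis_processors, _ = load_model_and_preprocess(
    name="blip_caption", model_type="base_coco", is_eval=True, device=device)

# Preprocess a local image (placeholder path) and generate a caption.
raw_image = Image.open("photo.jpg").convert("RGB")
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
print(model.generate({"image": image})[0])
```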
-
Microsoft Introduces New UI Experience for Trying out Computer Vision with Vision Studio
Microsoft recently introduced Vision Studio, a new user interface (UI) that lets developers try out its Computer Vision API.
-
Microsoft Previews Computer Vision Image Analysis API 4.0
Microsoft recently announced the public preview of a new version of the Computer Vision Image Analysis API, which makes all visual image features, ranging from optical character recognition (OCR) to object detection, available through a single endpoint.
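Because everything sits behind one endpoint, a single call can request several features at once. A sketch using Python's requests library (the resource name, key, api-version string, and feature list are assumptions; check the current Azure documentation):

```python
import requests

endpoint = "https://<your-resource>.cognitiveservices.azure.com"

response = requests.post(
    f"{endpoint}/computervision/imageanalysis:analyze",
    # The api-version below is an assumption based on the preview timeframe.
    params={"api-version": "2022-10-12-preview",
            "features": "caption,read,objects"},  # OCR and detection in one call
    headers={"Ocp-Apim-Subscription-Key": "<your-key>",
             "Content-Type": "application/json"},
    json={"url": "https://example.com/sample.jpg"},
)
print(response.json())
```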
-
Meta Announces Video Generation AI Model Make-A-Video
Meta AI recently announced Make-A-Video, a text-to-video generation AI model. Make-A-Video is trained using publicly available image-text pairs and video-only data and achieves state-of-the-art performance on the UCF-101 video-generation benchmark.
-
Microsoft Trains Two Billion Parameter Vision-Language AI Model BEiT-3
Researchers from Microsoft's Natural Language Computing (NLC) group announced the latest version of Bidirectional Encoder representation from Image Transformers: BEiT-3, a 1.9B parameter vision-language AI model. BEiT-3 models images as another language and achieves state-of-the-art performance on a wide range of downstream tasks.
-
Stability AI Open-Sources Image Generation Model Stable Diffusion
Stability AI released the pre-trained model weights for Stable Diffusion, a text-to-image AI model, to the general public. Given a text prompt, Stable Diffusion can generate photorealistic 512x512 pixel images depicting the scene described in the prompt.
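Because the weights are public, generation can run locally with the Hugging Face diffusers library; a minimal sketch (the fp16/CUDA setup is an assumption; drop it to run on CPU):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the released v1.4 weights; half precision on a CUDA GPU is assumed here.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16).to("cuda")

prompt = "a photograph of an astronaut riding a horse"
image = pipe(prompt).images[0]  # a 512x512 PIL image
image.save("astronaut.png")
```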
-
Google Releases CameraX 1.2 Beta with MLKit Integration
Now available in beta, CameraX 1.2 brings out-of-the-box integration with some of the ML Kit vision APIs and a new feature aimed at reducing shutter-button lag when taking pictures.
-
Google AI Open-Sources a New ML Tool for Conceptual and Subjective Queries over Images
Google AI open-sourced Mood Board Search, a new ML-powered tool that lets users run conceptual and subjective queries, such as "peaceful" or "beautiful", over image collections.
-
Google's Image-Text AI LIMoE Outperforms CLIP on ImageNet Benchmark
Researchers at Google Brain recently trained Language-Image Mixture of Experts (LIMoE), a 5.6B parameter image-text AI model. In zero-shot learning experiments on ImageNet, LIMoE outperforms CLIP and performs comparably to state-of-the-art models while using fewer compute resources.
-
Adobe Researchers Open-Source Image Captioning AI CLIP-S
Researchers from Adobe and the University of North Carolina (UNC) have open-sourced CLIP-S, an image-captioning AI model that produces fine-grained descriptions of images. In evaluations with captions generated by other models, human judges preferred those generated by CLIP-S a majority of the time.
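CLIP-S trains its captioner against a CLIP-based image-text similarity reward. The underlying scoring idea can be illustrated with the off-the-shelf CLIP model in Hugging Face transformers (this is not the authors' released code, and the image path and captions are placeholders):

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")
captions = ["a dog leaping to catch a frisbee in a park", "a dog outside"]

# Score each candidate caption against the image; a higher score means a
# better match, the kind of signal CLIP-S-style training uses as a reward.
inputs = processor(text=captions, images=image,
                   return_tensors="pt", padding=True)
scores = model(**inputs).logits_per_image[0]
print(dict(zip(captions, scores.tolist())))
```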
-
DeepMind Trains 80 Billion Parameter AI Vision-Language Model Flamingo
DeepMind recently trained Flamingo, an 80B parameter vision-language model (VLM). Flamingo combines separately pre-trained vision and language models and outperforms all other few-shot learning models on 16 vision-language benchmarks. It can also chat with users, answering questions about input images and videos.
-
LAION Releases Five Billion Image-Text Pair Dataset LAION-5B
The Large-scale Artificial Intelligence Open Network (LAION) released LAION-5B, an AI training dataset containing over five billion image-text pairs. LAION-5B contains images and captions scraped from the internet and is 14x larger than its predecessor LAION-400M, making it the largest freely available image-text dataset.
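The dataset ships as URL-caption metadata rather than raw images. One way to inspect it is to stream a subset with the Hugging Face datasets library; the sketch below assumes the English LAION-2B slice published on the Hub and its uppercase column names:

```python
from datasets import load_dataset

# Stream the metadata so the multi-terabyte dataset is never fully downloaded.
ds = load_dataset("laion/laion2B-en", split="train", streaming=True)

for row in ds.take(3):
    # Each record pairs an image URL with its scraped alt-text caption.
    print(row["URL"], "->", row["TEXT"])
```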