Generative AI Content on InfoQ
-
PayPal Adds GenAI Support with LLMs to Its Cosmos.AI MLOps Platform
PayPal extended its MLOps platform Cosmos.AI to support the development of generative AI applications using large language models (LLMs). The company incorporated support for vendor, open-source, and self-tuned LLMs and provided capabilities around retrieval-augmented generation (RAG), semantic caching, prompt management, orchestration, and AI application hosting.
-
University of Chinese Academy of Sciences Open-Sources Multimodal LLM LLaMA-Omni
Researchers at the University of Chinese Academy of Sciences (UCAS) recently open-sourced LLaMA-Omni, an LLM that can operate on both speech and text data. LLaMA-Omni is based on Meta's Llama-3.1-8B-Instruct LLM and outperforms similar baseline models while requiring less training data and compute.
-
Meta Unveils Movie Gen, a New AI Model for Video Generation
Meta has announced Movie Gen, a new AI model designed to create high-quality 1080p videos with synchronized audio. The system enables instruction-based video editing and allows for personalized content generation using user-supplied images.
-
Google Develops Voice Transfer AI for Restoring Voices
A team at Google Research developed a zero-shot voice transfer (VT) model that can be used to customize a text-to-speech (TTS) system with a specific person's voice. This allows speakers who have lost their voice, for example from Parkinson's disease or ALS, to use a TTS device to replicate their original voice. The model also works across languages.
-
Uber Creates GenAI Gateway Mirroring OpenAI API to Support over 60 LLM Use Cases
Uber created a unified platform for serving large language models (LLMs), both self-hosted and from external vendors, and opted to mirror the OpenAI API to help with internal adoption. GenAI Gateway provides a consistent and efficient interface and serves over 60 distinct LLM use cases across many areas.
-
Study Shows AI Coding Assistant Improves Developer Productivity
Researchers from Microsoft, MIT, Princeton University, and the Wharton School of the University of Pennsylvania recently published a study showing that the use of GitHub Copilot increased developer productivity. The team conducted three separate randomized controlled trials (RCTs) involving over 4,000 developers; those using Copilot achieved a 26% increase in productivity.
-
Stability AI Announces Integration of Top Text-to-Image Models with Amazon Bedrock
Stability AI has introduced three new text-to-image models to Amazon Bedrock: Stable Image Ultra, Stable Diffusion 3 Large, and Stable Image Core. These models focus on improving performance in multi-subject prompts, image quality, and typography. They are designed to generate high-quality visuals for various use cases in marketing, advertising, media, entertainment, retail, and more.
-
Apple Open-Sources Multimodal AI Model 4M-21
Researchers at Apple and the Swiss Federal Institute of Technology Lausanne (EPFL) have open-sourced 4M-21, a single any-to-any AI model that can handle 21 input and output modalities. 4M-21 performs well "out of the box" on several vision benchmarks and is available under the Apache 2.0 license.
-
Google Announces Game Simulation AI GameNGen
A research team from Google recently published a paper on GameNGen, a generative AI model that can simulate the video game Doom. GameNGen can simulate the game at 20 frames per second (FPS) and in human evaluations was preferred only slightly less often than the actual game.
-
HelixML Announces Helix 1.0 Release
HelixML has announced that its Helix platform for generative AI is production-ready at version 1.0. Described as a "Private GenAI Stack," the platform provides an interface layer and applications that can be connected to a variety of LLMs. It can be used to prototype apps, starting with a laptop, with all components version-controlled to ease subsequent deployment and scaling.
-
Alibaba Releases Two Open-Weight Language Models for Math and Voice Chat
Alibaba released two open-weight language model families: Qwen2-Math, a series of LLMs tuned for solving mathematical problems; and Qwen2-Audio, a family of multi-modal LLMs that can accept voice or text input. Both families are based on Alibaba's Qwen2 LLM series, and all but the largest version of Qwen2-Math are available under the Apache 2.0 license.
-
Apple Unveils Apple Foundation Models Powering Apple Intelligence
Apple published the details of its new Apple Foundation Models (AFM), a family of large language models (LLMs) that power several features in its Apple Intelligence suite. AFM comes in two sizes: a 3B-parameter on-device version and a larger cloud-based version.
-
LLMs and Agents as Team Enablers
Eric Naiburg and Birgitta Böckeler published articles on the benefits and challenges of using AI as a multiplier in dev teams. We report on their insights for scenarios such as simplifying the germane cognitive load of a domain, automating code migrations, and coaching scrum masters on team facilitation. We also cover Böckeler's experiments with using LLMs to onboard onto a complex project.
-
MariaDB Introduces Open-Source Vector Preview, Aiming to Become Default MySQL Option
With the release of MariaDB 11.6, the MariaDB Foundation has announced the public preview of vector search for the open-source fork of the MySQL engine. Database experts and open-source advocates see vector support as an opportunity for MariaDB to lead the MySQL ecosystem, especially since Oracle reserves most new features for its enterprise editions.
-
Amazon MemoryDB Provides Fastest Vector Search on AWS
AWS recently announced the general availability of vector search for Amazon MemoryDB, the managed in-memory database with Multi-AZ availability. The new capability provides ultra-low latency and the fastest vector search performance at the highest recall rates among vector databases on AWS.