Generative AI Content on InfoQ
-
Amazon SageMaker JumpStart Expands Portfolio with Bria AI's Text-to-Image Models
Amazon Web Services has integrated Bria AI's latest text-to-image foundation models into Amazon SageMaker JumpStart, marking a significant expansion of its enterprise-grade generative AI capabilities. The addition includes three variants: Bria 2.3, Bria 2.2 HD, and Bria 2.3 Fast, each designed to address specific enterprise needs in visual content generation.
-
Stable Diffusion 3.5 Improves Text Rendering, Image Quality, Consistency, and More
Stability AI has released Stable Diffusion 3.5 Large, its most powerful text-to-image generation model to date, and Stable Diffusion 3.5 Large Turbo, with special emphasis on customizability, efficiency, and flexibility. Both models come with a free licensing model for non-commercial and limited commercial use.
-
AI and ML Tracks at QCon San Francisco 2024 – a Deep Dive into GenAI & Practical Applications
At QCon San Francisco 2024, explore two AI/ML-focused tracks highlighting real-world applications and innovations. Learn from industry experts on deploying LLMs, GenAI, and recommendation systems, gaining practical strategies for integrating AI into software development.
-
University Researchers Publish Analysis of Chain-of-Thought Reasoning in LLMs
Researchers from Princeton University and Yale University published a case study of Chain-of-Thought (CoT) reasoning in LLMs that shows evidence of both memorization and true reasoning. They also found that CoT can work even when examples given in the prompt are incorrect.
-
Google Publishes LLM Self-Correction Algorithm SCoRe
Researchers at Google DeepMind recently published a paper on Self-Correction via Reinforcement Learning (SCoRe), a technique for improving LLMs' ability to self-correct when solving math or coding problems. Models fine-tuned with SCoRe achieve improved performance on several benchmarks compared to baseline models.
-
PayPal Adds GenAI Support with LLMs to Its Cosmos.AI MLOps Platform
PayPal extended its MLOps platform Cosmos.AI to support the development of generative AI applications using large language models (LLMs). The company incorporated support for vendor, open-source, and self-tuned LLMs and provided capabilities around retrieval-augmented generation (RAG), semantic caching, prompt management, orchestration, and AI application hosting.
-
University of Chinese Academy of Sciences Open-Sources Multimodal LLM LLaMA-Omni
Researchers at the University of Chinese Academy of Sciences (UCAS) recently open-sourced LLaMA-Omni, an LLM that can operate on both speech and text data. LLaMA-Omni is based on Meta's Llama-3.1-8B-Instruct LLM and outperforms similar baseline models while requiring less training data and compute.
-
Meta Unveils Movie Gen, a New AI Model for Video Generation
Meta has announced Movie Gen, a new AI model designed to create high-quality 1080p videos with synchronized audio. The system enables instruction-based video editing and allows for personalized content generation using user-supplied images.
-
Google Develops Voice Transfer AI for Restoring Voices
A team at Google Research developed a zero-shot voice transfer (VT) model that can be used to customize a text-to-speech (TTS) system with a specific person's voice. This allows speakers who have lost their voice, for example from Parkinson's disease or ALS, to use a TTS device to replicate their original voice. The model also works across languages.
-
Uber Creates GenAI Gateway Mirroring OpenAI API to Support over 60 LLM Use Cases
Uber created a unified platform for serving large language models (LLMs), both from external vendors and self-hosted, and opted to mirror the OpenAI API to help with internal adoption. GenAI Gateway provides a consistent and efficient interface and serves over 60 distinct LLM use cases across many areas.
-
Study Shows AI Coding Assistant Improves Developer Productivity
Researchers from Microsoft, MIT, Princeton University, and the Wharton School of the University of Pennsylvania recently published a study that showed the use of GitHub Copilot increased developer productivity. The team conducted three separate randomized controlled trials (RCTs) involving over 4,000 developers; those using Copilot achieved a 26% increase in productivity.
-
Stability AI Announces Integration of Top Text-to-Image Models with Amazon Bedrock
Stability AI has introduced three new text-to-image models to Amazon Bedrock: Stable Image Ultra, Stable Diffusion 3 Large, and Stable Image Core. These models focus on improving performance in multi-subject prompts, image quality, and typography. They are designed to generate high-quality visuals for various use cases in marketing, advertising, media, entertainment, retail, and more.
-
Apple Open-Sources Multimodal AI Model 4M-21
Researchers at Apple and the Swiss Federal Institute of Technology Lausanne (EPFL) have open-sourced 4M-21, a single any-to-any AI model that can handle 21 input and output modalities. 4M-21 performs well "out of the box" on several vision benchmarks and is available under the Apache 2.0 license.
-
Google Announces Game Simulation AI GameNGen
A research team from Google recently published a paper on GameNGen, a generative AI model that can simulate the video game Doom. GameNGen can simulate the game at 20 frames per second (FPS) and in human evaluations was preferred only slightly less often than the actual game.
-
HelixML Announces Helix 1.0 Release
HelixML has announced that its Helix platform for generative AI is production-ready at version 1.0. Described as a "Private GenAI Stack," the platform provides an interface layer and applications that can be connected to a variety of LLMs. It can be used to prototype apps, starting with a laptop, with all components version-controlled to ease subsequent deployment and scaling.