Micro Metrics for LLM System Evaluation at QCon SF 2024
Denys Linkov's QCon San Francisco 2024 talk dissected the complexities of evaluating large language models (LLMs). He advocated for nuanced micro-metrics, robust observability, and alignment with business objectives to enhance model performance. Linkov’s insights highlight the need for multidimensional evaluation and actionable metrics that drive meaningful decisions.
-
Ai2 Launches OLMo 2, a Fully Open-Source Foundation Model
The Allen Institute for AI research team has introduced OLMo 2, a new family of open-source language models available in 7 billion (7B) and 13 billion (13B) parameter configurations. Trained on up to 5 trillion tokens, these models improve training stability by adopting staged training processes and incorporating diverse datasets.
-
How Slack Used an AI-Powered Hybrid Approach to Migrate from Enzyme to React Testing Library
Enzyme’s lack of support for React 18 made their existing unit tests unusable and jeopardized the foundational confidence they provided, Sergii Gorbachov said at QCon San Francisco. He showed how Slack migrated all Enzyme tests to React Testing Library (RTL) to ensure the continuity of their test coverage.
-
AISuite is a New Open Source Python Library Providing a Unified Cross-LLM API
Recently announced by Andrew Ng, aisuite aims to provide an OpenAI-like API around the most popular large language models (LLMs), making it easy for developers to try them out, compare results, or switch from one LLM to another without changing their code.
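aisuite addresses models with a provider-prefixed identifier such as `openai:gpt-4o`, and the prefix determines which backend client handles the call. A minimal, stdlib-only sketch of that naming convention (the helper function is our own illustration, not part of the aisuite library):

```python
def split_model_id(model_id: str) -> tuple[str, str]:
    """Split an aisuite-style 'provider:model' identifier into its parts."""
    # Partition on the first colon only, since model names may contain dashes
    # and version suffixes but the provider prefix never contains a colon.
    provider, _, model = model_id.partition(":")
    if not provider or not model:
        raise ValueError(f"expected 'provider:model', got {model_id!r}")
    return provider, model

print(split_model_id("openai:gpt-4o"))  # ('openai', 'gpt-4o')
print(split_model_id("anthropic:claude-3-5-sonnet-20241022"))
```

With the library installed, the same identifier is passed as the `model` argument to a chat-completion call that mirrors the OpenAI SDK, so switching providers is a one-string change.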
-
Physical Intelligence Unveils Robotics Foundation Model Pi-Zero
Physical Intelligence recently announced π0 (pi-zero), a general-purpose AI foundation model for robots. Pi-zero is based on a pre-trained vision-language model (VLM) and outperforms other baseline models in evaluations on five robot tasks.
-
AWS Reveals Multi-Agent Orchestrator Framework for Managing AI Agents
AWS has introduced Multi-Agent Orchestrator, a framework designed to manage multiple AI agents and handle complex conversational scenarios. The system routes queries to the most suitable agent, maintains context across interactions, and integrates seamlessly with a variety of deployment environments, including AWS Lambda, local setups, and other cloud platforms.
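The routing behavior described above can be illustrated with a toy keyword-based router. This is a sketch of the general idea only; all names are illustrative, and the actual framework classifies queries and manages context far more robustly than keyword overlap:

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    keywords: set[str]

@dataclass
class Orchestrator:
    agents: list[Agent]
    # Conversation history of (agent_name, query) pairs, kept across turns.
    history: list[tuple[str, str]] = field(default_factory=list)

    def route(self, query: str) -> Agent:
        words = set(query.lower().split())
        # Pick the agent whose keyword set overlaps the query the most.
        best = max(self.agents, key=lambda a: len(a.keywords & words))
        self.history.append((best.name, query))
        return best

orch = Orchestrator([
    Agent("travel", {"flight", "hotel", "trip"}),
    Agent("billing", {"invoice", "refund", "charge"}),
])
print(orch.route("I need a refund for this charge").name)  # billing
```

A production router would typically replace the keyword match with an LLM or classifier call, but the shape — select an agent, record context, forward the query — is the same.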
-
Microsoft Introduces Magentic-One, a Generalist Multi-Agent System
Microsoft has announced the release of Magentic-One, a new generalist multi-agent system designed to handle open-ended tasks involving web and file-based environments. This system aims to assist with complex, multi-step tasks across various domains, improving efficiency in activities such as software development, data analysis, and web navigation.
-
Epoch AI Unveils FrontierMath: A New Frontier in Testing AI's Mathematical Reasoning Capabilities
Epoch AI, in collaboration with over 60 mathematicians from leading institutions worldwide, has introduced FrontierMath, a new benchmark designed to evaluate AI systems' capabilities in advanced mathematical reasoning.
-
Mistral AI Releases Les Ministraux, Two Small Language Models
Mistral AI recently released Ministral 3B and Ministral 8B, two small language models that are collectively called les Ministraux. The models are designed for local inference applications and outperform other comparably sized models on a range of LLM benchmarks.
-
QCon San Francisco 2024 Day 2: Shift-Left, GenAI, Engineering Productivity, Languages/Paradigms
The 18th annual QCon San Francisco conference was held at the Hyatt Regency San Francisco in San Francisco, California. This five-day event, organized by C4Media, consists of three days of presentations and two days of workshops. Day Two, scheduled on November 19th, 2024, included a keynote address by Lizzie Matusov and presentations from four conference tracks.
-
LLaVA-CoT Shows How to Achieve Structured, Autonomous Reasoning in Vision Language Models
Chinese researchers fine-tuned Llama-3.2-11B to improve its ability to solve multimodal reasoning problems, going beyond direct-response and chain-of-thought (CoT) approaches to reason step by step in a structured way. Named LLaVA-CoT, the new model outperforms its base model and proves better than larger models, including Gemini-1.5-Pro, GPT-4o-mini, and Llama-3.2-90B-Vision-Instruct.
-
QCon SF: Mandy Gu on Using Generative AI for Productivity at Wealthsimple
Mandy Gu spoke at QCon SF 2024 about how Wealthsimple, a Canadian fintech company, uses Generative AI to improve productivity. Her talk focused on the development and evolution of their GenAI tool suite and how Wealthsimple crossed the "Trough of Disillusionment" to achieve productivity.
-
Meta Releases NotebookLlama: Open-Source PDF to Podcast Toolkit
Meta has released NotebookLlama, an open-source toolkit designed to convert PDF documents into podcasts, providing developers with a structured, accessible PDF-to-audio workflow. As an open-source alternative to Google’s NotebookLM, NotebookLlama guides users through a four-step process that converts PDF text into audio content.
-
Google Debuts OpenAI-compatible API for Gemini
To make it easier for developers who adopted OpenAI for their LLM-based solutions to switch to Gemini, Google has launched a new endpoint for its Gemini API that accepts requests in the OpenAI API format. The new endpoint is still in beta and covers only part of OpenAI's capabilities.
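The compatibility layer is reached at an OpenAI-style path under the Gemini API's base URL. A minimal, stdlib-only sketch of building such a request (the key placeholder and helper name are ours; actually sending the request requires a valid Gemini API key):

```python
import json
import urllib.request

GEMINI_API_KEY = "YOUR_API_KEY"  # placeholder -- a real key is required to send
BASE_URL = "https://generativelanguage.googleapis.com/v1beta/openai"

def build_chat_request(model: str, user_message: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request against the Gemini endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {GEMINI_API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("gemini-1.5-flash", "Hello")
# With a valid key: response = urllib.request.urlopen(req)
```

In practice, existing users of the official OpenAI SDK can simply point their client at the same base URL and keep the rest of their code unchanged, which is the switching path the new endpoint is meant to enable.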
-
Anthropic Releases New Claude Models and Computer Use Feature
Anthropic released two new models: Claude 3.5 Haiku and an improved version of Claude 3.5 Sonnet. They also released a new feature for Claude 3.5 Sonnet that allows the model to interact with a computer's GUI the same way a human user does.