Alibaba Releases Two Open-Weight Language Models for Math and Voice Chat

Alibaba released two open-weight language model families: Qwen2-Math, a series of LLMs tuned for solving mathematical problems; and Qwen2-Audio, a family of multi-modal LLMs that can accept voice or text input. Both families are based on Alibaba's Qwen2 LLM series, and all but the largest version of Qwen2-Math are available under the Apache 2.0 license.

Qwen2-Math is available in a base version and an instruction-tuned version, each with a choice of 1.5B, 7B, or 72B parameters. Because most benchmark datasets are available on the internet, Alibaba decontaminated its training data to remove examples from mathematical problem-solving benchmarks. After pre-training, the instruction-tuned models were trained with both supervised fine-tuning and reinforcement learning. On the popular MATH benchmark, the largest model, Qwen2-Math-72B-Instruct, outperformed state-of-the-art commercial models including GPT-4o and Claude-3.5-Sonnet. According to Alibaba,
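The instruction-tuned checkpoints follow the standard Hugging Face chat-model workflow. The snippet below is a minimal sketch, assuming the transformers library, the publicly listed Qwen/Qwen2-Math-7B-Instruct checkpoint (the smaller sibling of the 72B model), and a machine with enough memory; the prompt and generation settings are illustrative, not taken from Alibaba's documentation.

```python
# Minimal sketch: querying Qwen2-Math-7B-Instruct via Hugging Face transformers.
# Assumes transformers and accelerate are installed; settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2-Math-7B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful math assistant."},
    {"role": "user", "content": "Solve for x: 2x + 6 = 20."},
]

# Apply the chat template the instruction-tuned models expect, then generate.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Strip the prompt tokens and decode only the model's answer.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```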

Given the current limitation of English-only support, we plan to release bilingual models that support both English and Chinese shortly, with the development of multilingual models also in the pipeline. Moreover, we will continue to enhance our models’ ability to solve complex and challenging mathematical problems.

Besides MATH, Alibaba evaluated Qwen2-Math on additional benchmarks and mathematics exams, such as GSM8K and AIME 2024. They found that Qwen2-Math-Instruct outperformed baseline models of comparable size, "particularly in the 1.5B and 7B models." The 72B parameter version achieved a score of 86.4 on the Chinese-language math exam benchmark CMATH, which Alibaba claims is a new high score. They also claim that it outperformed Claude, GPT-4, and Gemini on the AIME 2024 exam.

Alibaba published a technical report with more details on Qwen2-Audio. The model accepts both text and audio input, but can only output text. Depending on the type of audio input provided, the model can operate in two modes, Voice Chat or Audio Analysis. In Voice Chat mode, the input is a user's speech audio, and the model acts as a chatbot. In Audio Analysis mode, the model can answer questions about the content of audio input. For example, given a clip of music, the model can identify the tempo and key of the song.
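Recent transformers releases include a Qwen2AudioForConditionalGeneration class and a matching processor for this model family. The snippet below is a minimal sketch of the Audio Analysis mode, assuming that class, the Qwen/Qwen2-Audio-7B-Instruct checkpoint, librosa for audio loading, and a local sample.wav file; the processor argument names follow the model card at release and may differ in newer library versions, so check the Hugging Face model card for the exact signature.

```python
# Minimal sketch of Qwen2-Audio's Audio Analysis mode.
# Assumes transformers with Qwen2-Audio support, librosa, and a local "sample.wav".
import librosa
from transformers import AutoProcessor, Qwen2AudioForConditionalGeneration

model_id = "Qwen/Qwen2-Audio-7B-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2AudioForConditionalGeneration.from_pretrained(model_id, device_map="auto")

# One user turn containing an audio clip plus a text question about it.
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "audio", "audio_url": "sample.wav"},
            {"type": "text", "text": "What are the tempo and key of this piece of music?"},
        ],
    },
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)

# Load the clip at the sampling rate the feature extractor expects (16 kHz).
audio, _ = librosa.load("sample.wav", sr=processor.feature_extractor.sampling_rate)

# "audios" matches the release-time model card; newer versions may use "audio".
inputs = processor(text=prompt, audios=[audio], return_tensors="pt", padding=True)
inputs = inputs.to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=256)
output_ids = output_ids[:, inputs.input_ids.size(1):]  # strip the prompt tokens
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```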

Andrew Ng's newsletter The Batch covered Alibaba's release, saying:

Qwen2 delivered extraordinary performance with open weights, putting Alibaba on the map of [LLMs]. These specialized additions to the family push forward math performance and audio integration in AI while delivering state-of-the-art models into the hands of more developers. It’s thrilling to see models with open weights that outperform proprietary models. The white-hot competition between open and closed technology is good for everyone!

Users on Reddit discussed both model series. One user described Qwen2-Math-7B as "punching really high and hard for its size." Another user said of Qwen2-Audio:

It would be very interesting to try to synthesize audio output using this model. The audio encoder is almost identical to WhisperSpeech one. Although Qwen2 is using Whisper-large-v3 which would probably require retraining of the WhisperSpeech acoustic model. If successful, that would be basically equivalent to GPT4o advanced voice mode running locally.

The model files for Qwen2-Math and Qwen2-Audio can be downloaded from Hugging Face.
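For local use, the checkpoints can also be fetched with the huggingface_hub client; a minimal sketch follows, where the local directory names are arbitrary.

```python
# Download Qwen2-Math and Qwen2-Audio checkpoints from the Hugging Face Hub.
# Assumes the huggingface_hub package; local directory names are arbitrary.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="Qwen/Qwen2-Math-7B-Instruct", local_dir="qwen2-math-7b-instruct")
snapshot_download(repo_id="Qwen/Qwen2-Audio-7B-Instruct", local_dir="qwen2-audio-7b-instruct")
```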
