"Target speech hearing" is a new deep-learning algorithm developed at the University of Washington to allow users to "enroll" a speaker and cancel all environmental noise surrounding their voice.
Currently, the system requires the person wearing the headphones to tap a button while gazing at someone talking, or simply to look at them for three to five seconds. This directs a deep-learning model to learn the speaker's vocal patterns and latch onto them, so it can play the speaker's voice back to the listener even as the listener moves around and stops looking at that person.
A naive approach would require a clean speech example to enroll the target speaker. However, this is not well aligned with the hearable application domain, since obtaining a clean example is challenging in real-world scenarios, creating a unique user-interface problem. The researchers instead present the first enrollment interface in which the wearer looks at the target speaker for a few seconds to capture a single, short, highly noisy, binaural example of the target speaker.
The key to the enrollment step is that the wearer is looking in the direction of the speaker, so the speaker's voice is aligned across the two binaural microphones while interfering speakers are likely not aligned. This noisy example is fed to a neural network that captures the characteristics of the target speaker and extracts the corresponding embedding vector, which a second neural network then uses to extract the target speech from a cacophony of speakers.
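In simplified terms, this can be pictured as a two-stage pipeline: an enrollment network turns the noisy binaural clip into a speaker embedding, and a separation network conditioned on that embedding extracts the target voice from the mixture. The PyTorch sketch below illustrates that structure only; the class names, layer choices, and dimensions are illustrative assumptions and do not reflect the published implementation.

```python
import torch
import torch.nn as nn

class EnrollmentNet(nn.Module):
    """Illustrative stand-in: maps a short, noisy binaural clip to a speaker embedding."""
    def __init__(self, emb_dim=256):
        super().__init__()
        # 2 input channels = left/right binaural microphones
        self.encoder = nn.Conv1d(2, emb_dim, kernel_size=400, stride=160)
        self.proj = nn.Linear(emb_dim, emb_dim)

    def forward(self, binaural_clip):           # (batch, 2, samples)
        feats = torch.relu(self.encoder(binaural_clip))
        emb = feats.mean(dim=-1)                # pool over time -> one vector per clip
        return self.proj(emb)                   # (batch, emb_dim) speaker embedding

class TargetSpeechExtractor(nn.Module):
    """Illustrative stand-in for the separation network conditioned on the embedding."""
    def __init__(self, emb_dim=256, hidden=256):
        super().__init__()
        self.encoder = nn.Conv1d(2, hidden, kernel_size=16, stride=8)
        self.film = nn.Linear(emb_dim, hidden)  # condition the features on the enrolled speaker
        self.decoder = nn.ConvTranspose1d(hidden, 2, kernel_size=16, stride=8)

    def forward(self, noisy_mix, speaker_emb):  # (batch, 2, samples), (batch, emb_dim)
        h = torch.relu(self.encoder(noisy_mix))
        h = h * self.film(speaker_emb).unsqueeze(-1)   # scale features by the speaker embedding
        return self.decoder(h)                  # binaural estimate of the target speaker

# Enrollment: a few seconds of noisy binaural audio captured while facing the speaker.
enroll_net, extractor = EnrollmentNet(), TargetSpeechExtractor()
enroll_clip = torch.randn(1, 2, 16000 * 4)      # ~4 s stereo clip at 16 kHz (placeholder data)
speaker_emb = enroll_net(enroll_clip)

# Listening: the cached embedding steers extraction even after the wearer looks away.
mixture = torch.randn(1, 2, 16000)              # 1 s of noisy binaural input (placeholder data)
target_only = extractor(mixture, speaker_emb)
```

In such a design the embedding is computed once at enrollment and cached, so only the extraction network needs to run continuously while the wearer listens.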
According to the researchers, this constitutes a significant step forward compared to existing noise-canceling headphones, which can effectively cancel out all sounds but cannot selectively pick out a speaker based on their speech traits.
To make this possible, the team had to solve several problems, including optimizing the state-of-the-art speech separation network TFGridNet to run in real time on embedded CPUs, and finding a training methodology that uses synthetic data to build a system capable of generalizing to real-world, unseen speakers, among other challenges.
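Running in real time on an embedded CPU means each audio block must be processed faster than it arrives. The sketch below shows the general pattern of chunk-wise streaming inference and how a real-time factor might be measured; the placeholder model, chunk size, and sample rate are assumptions for illustration, not the optimized TFGridNet variant used by the researchers.

```python
import time
import torch

# Placeholder for a causal, chunk-wise separation model; the real system runs an
# optimized TFGridNet variant, this stand-in merely has a comparable streaming interface.
model = torch.nn.Conv1d(2, 2, kernel_size=65, stride=1, padding=32)
model.eval()

SAMPLE_RATE = 16000          # assumed sample rate for illustration
CHUNK_SAMPLES = 128          # 8 ms blocks at 16 kHz (assumed block size)

def process_stream(binaural_audio):
    """Run the model one small chunk at a time and report the real-time factor."""
    outputs, elapsed = [], 0.0
    with torch.no_grad():
        for start in range(0, binaural_audio.shape[-1] - CHUNK_SAMPLES + 1, CHUNK_SAMPLES):
            chunk = binaural_audio[..., start:start + CHUNK_SAMPLES]
            t0 = time.perf_counter()
            outputs.append(model(chunk))
            elapsed += time.perf_counter() - t0
    audio_seconds = binaural_audio.shape[-1] / SAMPLE_RATE
    return torch.cat(outputs, dim=-1), elapsed / audio_seconds  # factor < 1.0: faster than real time

stream = torch.randn(1, 2, SAMPLE_RATE * 5)   # 5 seconds of binaural audio (placeholder data)
_, rtf = process_stream(stream)
print(f"real-time factor: {rtf:.3f}")
```

A real streaming model would also carry internal state (such as a look-back buffer) across chunks; that detail is omitted here for brevity.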
Shyam Gollakota, one of the researchers behind both this project and the earlier "semantic hearing" work, highlights that their project differs from current approaches to AI in that it aims to modify people's auditory perception using on-device AI, without relying on cloud-based services.
At the moment, the system can enroll only one speaker at a time. Another limitation is that enrollment succeeds only if no other loud voice is coming from the same direction as the target speaker; if the wearer is not satisfied with the initial result, they can run the enrollment again on the same speaker to improve clarity.
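The "same direction" constraint follows from the binaural alignment idea described earlier: when the wearer faces the target, the target's voice reaches both microphones nearly simultaneously, so another loud voice arriving from the same direction would be just as aligned and therefore ambiguous. The snippet below gives a minimal illustration of checking that alignment via plain cross-correlation of the two channels; it is a conceptual example, not the researchers' enrollment check.

```python
import numpy as np

MAX_ALIGNED_DELAY = 2  # samples; roughly "straight ahead" (assumed threshold for illustration)

def dominant_source_delay(left, right):
    """Estimate the inter-channel delay (in samples) of the dominant sound source
    by cross-correlating the two binaural channels."""
    corr = np.correlate(left, right, mode="full")
    return np.argmax(corr) - (len(right) - 1)  # positive: sound reaches the left ear later

def looks_head_aligned(left, right):
    """Heuristic: the loudest source arrives at both ears nearly simultaneously."""
    return abs(dominant_source_delay(left, right)) <= MAX_ALIGNED_DELAY

# Toy example: a source delayed by 6 samples between the ears is off to one side.
rng = np.random.default_rng(0)
src = rng.standard_normal(4000)            # short burst of wideband noise as a stand-in signal
left, right = src, np.roll(src, 6)
print(looks_head_aligned(left, right))     # False
```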
The team has open-sourced their code and dataset to facilitate future research work to improve target speech hearing.