As part of a Facebook Reality Labs (FRL) brain-computer interface (BCI) research program called Project Steno, a team of scientists from the University of California, San Francisco (UCSF) is working on converting brain waves into a text transcription of speech. The goal of Facebook's project is a device that allows users to "type" by imagining themselves speaking.
In a paper published in Nature Communications, authors David A. Moses, Matthew K. Leonard, Joseph G. Makin, and principal investigator Edward F. Chang present their system, which decodes electrical signals from patients' brains into a text representation of the speech sounds that the patients hear and speak. While this is not the first attempt at converting neural signals to text or audio, the authors claim "there have not been attempts to decode both perceived and produced speech from human participants in a real-time setting that resembles natural communication." This real-time aspect would be key to using the technology in a prosthetic device that helps speech-impaired users engage in conversation.
Several research teams have worked on the problem of using electrocorticography (ECoG) to record electrical impulses from the brain and convert them to synthesized speech. Chang's team previously recorded the signals while a patient pantomimed speech and used deep learning to convert those signals to audio. However, unlike this latest work, the data from the brain was not processed in real-time. Instead of deep learning, Chang's latest work uses a more traditional technique from automated speech recognition: Viterbi decoding with hidden Markov models (HMMs). The team chose not to use deep learning "primarily due to the relatively small amount of data that can be collected to train models," and cited the Viterbi model's "inherent robustness."
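Viterbi decoding finds the single most likely sequence of hidden states in an HMM (here, the states would correspond to phones) given per-timestep observation likelihoods. The following Python sketch is not the authors' implementation; it is a minimal, generic Viterbi pass that assumes transition, emission, and initial probabilities are supplied in log space:

```python
import numpy as np

def viterbi(log_emissions, log_trans, log_init):
    """Most likely hidden-state sequence through an HMM.

    log_emissions: (T, S) array, log P(observation at t | state s)
    log_trans:     (S, S) array, log P(next state j | current state i)
    log_init:      (S,)   array, log P(state s at t = 0)
    """
    T, S = log_emissions.shape
    delta = log_init + log_emissions[0]        # best score ending in each state
    backptr = np.zeros((T, S), dtype=int)      # best predecessor per state

    for t in range(1, T):
        scores = delta[:, None] + log_trans    # scores[i, j]: come from i, land on j
        backptr[t] = np.argmax(scores, axis=0)
        delta = scores[backptr[t], np.arange(S)] + log_emissions[t]

    # Trace the best path backwards from the highest-scoring final state.
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

# Toy usage: 2 states, 3 timesteps, uniform transitions (made-up numbers).
emis = np.log(np.array([[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]]))
trans = np.log(np.full((2, 2), 0.5))
init = np.log(np.array([0.5, 0.5]))
print(viterbi(emis, trans, init))  # [0, 1, 1]
```

In the study's setting, the hidden states would be phone labels and the emission scores would come from models fit to the ECoG signals; the sketch above is agnostic to both.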
The experimental setup involved patients who had ECoG devices implanted in their brains as part of preparation for neurosurgery. The patients listened to a set of pre-recorded questions and verbally responded, choosing each response from a menu of possible answers. The ECoG signals from their brains were measured in real-time and decoded into "phones," or representations of sounds, during both the question and answer phases. The sequence of phones was then converted into a textual representation of the original speech. To improve the accuracy of decoding the answers, the team used a conversation model that restricts the possible outputs once the question has been decoded. For example, if the question is decoded as "Which musical instrument do you like listening to?" then the system can restrict the decoded answer to one of a small set of plausible answers.
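Conceptually, the conversation model acts as a context prior: once the question is decoded, only the answers associated with that question are scored. The sketch below is purely illustrative, with a hypothetical ANSWER_SETS table and a caller-supplied log-likelihood function; in the actual system, the question-conditioned context probabilities are combined with the answer likelihoods during decoding rather than applied as a hard filter like this:

```python
# Hypothetical question/answer sets; the study used predefined sets of
# questions and answers, but these particular strings are illustrative only.
ANSWER_SETS = {
    "Which musical instrument do you like listening to?":
        ["piano", "violin", "drums", "synthesizer"],
}

def decode_answer(decoded_question, answer_log_likelihood):
    """Pick the most likely answer, restricted by the decoded question.

    answer_log_likelihood maps an answer string to the log-likelihood of
    the observed neural signals under that answer's phone sequence
    (e.g., from a Viterbi pass over the answer's HMM).
    """
    candidates = ANSWER_SETS[decoded_question]
    return max(candidates, key=answer_log_likelihood)

# Toy usage with made-up scores:
scores = {"piano": -12.0, "violin": -15.5, "drums": -20.1, "synthesizer": -18.7}
best = decode_answer("Which musical instrument do you like listening to?",
                     lambda answer: scores[answer])
print(best)  # piano
```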
The goal of Chang's research lab is to produce a system that can give artificial speech capability to patients who cannot speak, due to paralysis or another malady, but who retain the portions of their brain that control the vocal tract. Facebook also hopes to use the technology to build a computer interface that allows users to simply think the words they want to input, instead of typing them. Regardless of the application, there is at least one major challenge: all of these research techniques currently require data gathered by sensors implanted inside the user's brain.
Facebook is working with other research labs to tackle the problem of non-invasive neurological sensors. According to a recent blog post, one prototype method uses infrared light to detect changes in oxygen levels in the brain. This has the potential to measure brain activity, but currently the device is "bulky, slow, and unreliable." The labs are also exploring optical techniques to measure movements in the brain's blood vessels and neurons, but Facebook estimates it could take a decade before the technology is reliable.
A commenter on Hacker News compared the Facebook project to Neuralink, a startup co-founded by Elon Musk:
Neuralink sounds more ambitious and more interesting to me. I'm very skeptical of the non-invasive brain scanning Facebook is talking about. It seems unlikely to ever work well enough to be useful to anyone except people who are completely paralyzed. But they will be first in line for the invasive brain interface techniques that will work far better, so they won't need the non-invasive stuff either.