- MetaAI’s MMS project develops tech that recognizes and produces speech in over 1,100 languages.
- The project uses a huge dataset and a new learning method, covering languages with no prior speech technology.
- They trained their models using audio recordings of translated religious texts.
- The MMS models outperform existing ones in speech recognition, text-to-speech, and language identification.
- Despite occasional transcription errors, MetaAI's goal is to make information accessible in every language and to support a wide range of applications.
MetaAI is working on developing machines that can understand and speak in many different languages.
Normally, creating this kind of technology would require large amounts of labeled audio data (recordings paired with transcriptions), but that isn't available for most languages.
There are more than 7,000 languages around the world, but current technologies only cater to about 100 of them.
What’s worse, many of these languages are at risk of being forgotten.
MetaAI's project, called Massively Multilingual Speech (MMS), is trying to tackle this problem.
It uses a new kind of learning and a massive dataset which includes more than 1,100 languages, some of which are only spoken by a few hundred people.
The MMS models have proven to be better than existing ones and can work with 10 times more languages.
As part of this project, MetaAI gathered audio data of the New Testament, which has been translated into over 1,100 languages. They used these recordings to train their machines.
They also used other religious readings to further increase the number of languages.
Even though most of the recordings were made by male voices, their machines were still able to understand both male and female voices well.
Once the audio was collected, it was then processed to improve its quality and usability.
An alignment model was created to handle long recordings and remove any data that was incorrectly aligned.
This model was then shared with the research community.
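The idea behind this filtering step can be sketched in a few lines. The code below is not MetaAI's actual alignment model; it is a minimal stand-in that scores each segment by string similarity between the reference transcript and a hypothetical model transcription (`hyp`), and drops segments that match poorly. The 0.8 threshold and the segment format are assumptions for illustration.

```python
def edit_distance(a, b):
    # Levenshtein distance via a single-row dynamic program
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def alignment_score(reference, hypothesis):
    # Normalized similarity: 1.0 means a perfect match
    if not reference:
        return 0.0
    return 1.0 - edit_distance(reference, hypothesis) / max(len(reference), len(hypothesis))

def filter_segments(segments, threshold=0.8):
    # Keep only segments whose transcript agrees closely with the model's output
    return [s for s in segments if alignment_score(s["text"], s["hyp"]) >= threshold]

segments = [
    {"text": "in the beginning", "hyp": "in the beginning"},  # well aligned: kept
    {"text": "and the earth",    "hyp": "unrelated audio"},   # misaligned: dropped
]
print(filter_segments(segments))
```

In the real pipeline, the hypothesis side comes from an alignment model scoring audio frames against text, but the principle of discarding low-scoring segments is the same.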
But roughly 32 hours of labeled data per language is not enough to train conventional supervised speech recognition models.
To solve this, MetaAI turned to wav2vec 2.0, a self-supervised learning technique from a previous project that drastically reduces the amount of labeled data needed.
They trained models on around 500,000 hours of speech data in over 1,400 languages.
These models were then fine-tuned for specific tasks, such as recognizing speech in different languages.
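The article doesn't describe the training objective, but wav2vec 2.0-style models are conventionally fine-tuned with a CTC (connectionist temporal classification) output layer. A toy greedy CTC decoder shows how per-frame predictions become text; the vocabulary and frame sequence here are invented for illustration.

```python
def ctc_greedy_decode(frame_ids, blank=0):
    # Collapse repeated frame predictions, then drop CTC blank tokens
    out, prev = [], None
    for t in frame_ids:
        if t != prev and t != blank:
            out.append(t)
        prev = t
    return out

vocab = {0: "<blank>", 1: "h", 2: "e", 3: "l", 4: "o"}
frames = [1, 1, 0, 2, 2, 3, 0, 3, 4, 4]  # per-frame argmax of model logits
ids = ctc_greedy_decode(frames)
print("".join(vocab[i] for i in ids))  # -> "hello"
```

Note that the blank token between the two 3s is what lets CTC emit a doubled letter ("ll") rather than collapsing it.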
They tested their models on existing datasets and found that the MMS models performed very well, even better than some of the current best-performing models.
They also created a language identification model for over 4,000 languages and found it also performed well.
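The article doesn't detail the identification model's architecture, but a common pattern for audio classification is to mean-pool the encoder's per-frame features and apply a linear head with a softmax over language labels. The feature dimensions, weights, and labels below are made up for illustration, not taken from MMS.

```python
import math

def softmax(logits):
    # Numerically stable softmax
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def identify_language(frame_features, weights, labels):
    # Mean-pool per-frame features into one utterance-level vector,
    # then apply a linear classification head and take the argmax
    dim = len(frame_features[0])
    pooled = [sum(f[d] for f in frame_features) / len(frame_features) for d in range(dim)]
    logits = [sum(w * x for w, x in zip(row, pooled)) for row in weights]
    probs = softmax(logits)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs[best]

labels = ["eng", "fra", "swh"]          # hypothetical language labels
weights = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # toy 3x2 classification head
frames = [[1.0, 0.0], [1.0, 0.0]]        # toy 2-dim encoder features
print(identify_language(frames, weights, labels))  # the head picks "eng" here
```

Scaling this pattern to 4,000+ languages is mostly a matter of widening the head; the hard part, as the project shows, is assembling training data for all those labels.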
They even built a system that can turn text into speech for over 1,100 languages.
Despite the fact that the MMS data usually only included one speaker per language, the produced speech was of good quality.
However, the models are not perfect and might still make some mistakes in transcribing spoken language.
MetaAI aims to support thousands of languages with their technology, hoping to help keep many languages from being forgotten.
They want to increase the number of languages their system can handle and also want it to be able to understand different dialects.
Their goal is to make it easier for people to access information and use devices in their preferred language, potentially making the technology useful for things like virtual reality or messaging services.
In the future, they imagine having a single model that can handle all tasks for all languages, leading to better performance overall.
MetaAI’s Massively Multilingual Speech project is a breakthrough in technology that aims to tackle one of the main obstacles in understanding other cultures – language barriers.
With the ability to understand and speak in over 1,100 languages, including those spoken by only a few hundred people, the project is making strides towards minimising the negative impact of language barriers in communication.
By expanding the number of languages its system can handle and making information easier to access in each person's preferred language, MetaAI is helping to preserve endangered languages and promote inclusivity in communication.
The breakthrough in technology offers optimism for a future where language is no longer a barrier to understanding others and their cultures.

