The Mozilla Foundation has launched “Common Voice,” a crowdsourced initiative to build an open source data set for voice recognition applications.
Many technology companies believe that voice control will be embedded into most devices in the future. This is why Apple, Google, Amazon, Microsoft, Baidu, and others are all racing to put their own voice-controlled artificial intelligence assistants into as many devices as possible, in order to gain market share before their competitors do.
The problem with this, according to Mozilla, is that voice-controlled technologies could end up being dominated by proprietary technology and data sets, which aren’t made available to startups and academics. As some large companies already benefit from billion-dollar revenues, it could later become too difficult for startups to catch up with the big players. Through Common Voice, Mozilla aims to democratize voice recognition technology.
Another issue is that the voice recognition systems developed by the big technology companies focus mainly on a handful of the most popular languages, such as English and Chinese. However, the market for devices that need voice control is much larger than the populations that speak those languages. Mozilla hopes to address this gap with its open source project.
Crowdsourced Voice Engine
Mozilla wants to collect over 10,000 hours of recordings of people reading sentences out loud, which other volunteers can then verify for accuracy. The organization believes that this volume of recordings should make the engine accurate enough for use in production.
The quality of the recording doesn’t matter, and in fact Mozilla suggested reading the text in different environments. The idea is to enable third-party developers to use the open source engine in all sorts of products, so the technology needs to become advanced enough to work in various real-world environments, too. It can’t work only in a quiet bedroom with no background noise.
This crowdsourcing strategy is one that Google Translate has used from the beginning. Although Google has recently focused more on using machine learning to do the translations, people can still “correct” translated words or sentences.
Mozilla said that it will release the open source Common Voice database later this year.