A new study from researchers affiliated with University College London, Nokia Bell Labs Cambridge, and the University of Oxford shows how differences in microphone quality can impact speech recognition accuracy. The coauthors use a custom data set, Libri-Adapt, containing 7,200 hours of English speech to test how well Mozilla’s DeepSpeech model handles unfamiliar environments and microphones. The findings suggest there is a noticeable degradation in accuracy during certain “domain shifts,” with word error rate rising as high as 28% after switching microphones.
Automatic speech recognition models must perform well across hardware if they’re to be reliable. For instance, customers expect the models powering Alexa to work similarly on different smart speakers, smart displays, and smart devices. But not all models achieve this ideal because they’re not consistently trained with diverse corpora. That is to say, some corpora don’t contain speech recorded with microphones of varying quality and in novel settings.
Libri-Adapt is designed to expose these flaws with speech recorded using the microphones in six different products: a PlayStation Eye camera, a generic USB mic, a Google Nexus 6 smartphone, the Shure MV5, a Raspberry Pi accessory called ReSpeaker, and the Matrix Voice developer kit. The corpus has speech data in three English accents, namely U.S. English, British English, and Indian English, which came from 251 U.S. speakers and synthetic voices generated by Google Cloud Platform’s text-to-speech API. Beyond this, Libri-Adapt contains wind, rain, and laughter background noises intended to serve as added confounders.

Above: Word error rate of a fine-tuned DeepSpeech model trained and tested on various microphone pairs for U.S. English speech. The columns correspond to the training microphone domain and rows correspond to the test microphone domain.
During experiments, the researchers compared the speech recognition performance of a pre-trained DeepSpeech model (version 0.5.0) across the aforementioned six devices. They found that when data from the same microphone was used for training and testing the model, DeepSpeech unsurprisingly achieved the smallest error rate (e.g., 11.39% in the case of PlayStation Eye). The converse also held: When there was a mismatch between the training and testing sets, the word error rate jumped substantially (e.g., 24.18% when a model trained on PlayStation Eye-recorded speech was tested on Matrix Voice speech).
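For readers unfamiliar with the metric, word error rate is conventionally computed as the number of word-level substitutions, deletions, and insertions needed to turn the model's transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below illustrates that standard definition; it is not the researchers' evaluation code, the function names are hypothetical, and the transcripts are made up for illustration rather than taken from Libri-Adapt.

```python
from typing import Iterable, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance between a reference and a hypothesis."""
    # prev[j] = edits needed to turn the first j hypothesis words
    # into the reference words processed so far.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1]


def word_error_rate(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus-level WER over (reference, hypothesis) transcript pairs."""
    total_edits = 0
    total_words = 0
    for reference, hypothesis in pairs:
        ref_words = reference.lower().split()
        hyp_words = hypothesis.lower().split()
        total_edits += edit_distance(ref_words, hyp_words)
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("reference transcripts contain no words")
    return total_edits / total_words


# Illustrative transcripts only (not drawn from the data set).
same_mic = [
    ("turn on the kitchen lights", "turn on the kitchen lights"),
    ("play some jazz music", "play some jazz music"),
]
different_mic = [
    ("turn on the kitchen lights", "turn on a kitchen light"),
    ("play some jazz music", "play jazz music"),
]

print(f"same microphone:      {word_error_rate(same_mic):.2%}")
print(f"different microphone: {word_error_rate(different_mic):.2%}")
```

Running the same calculation for every pairing of training and test microphone would yield a grid like the one the researchers report, with the lowest values expected where the two microphones match.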
The researchers say that Libri-Adapt, which is available in open source, can be used to create scenarios that test the generalizability of speech recognition algorithms. As an example, they tested a DeepSpeech model trained on U.S.-accented speech collected by a ReSpeaker microphone against Indian-accented speech with rain background noise recorded by a PlayStation Eye. The results show the model’s error rate rose by 29.8%, pointing to poor robustness on the model’s part.
Although the coauthors claim to have manually verified hundreds of Libri-Adapt’s recordings, they caution that some might be incomplete or noisy. That’s why they plan to develop unsupervised domain adaptation algorithms in future work to tackle domain shifts in the data set.