A young woman using voice recognition software on a smartphone.
Luis Alvarez | DigitalVision | Getty Images
Speechmatics, which is based in Cambridge, England, said Tuesday that its system had an overall accuracy rate of 83% for African American voices.
That’s higher than Microsoft (73%), Amazon (69%), Google (69%), IBM (62%) and Apple (55%), according to research published by Stanford University in 2020, which compared how accurately the major tech companies’ speech recognition programs understood African Americans.
Systems from Amazon, IBM, Google, Microsoft and Apple made nearly twice as many errors when interpreting words spoken by African Americans as when interpreting words spoken by white people, according to researchers at Stanford.
Speechmatics says its system misidentified words from Black voices 17% of the time, versus 31% for Google and Amazon.
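The error rates cited above are conventionally measured as word error rate (WER): the number of word substitutions, deletions and insertions needed to turn a system’s transcript into the human reference transcript, divided by the number of words in the reference. A minimal sketch of that calculation (the example sentences are illustrative, not from the Stanford study):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(dp[i - 1][j - 1],  # substitution
                                   dp[i - 1][j],      # deletion
                                   dp[i][j - 1])      # insertion
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution ("the" -> "a") across 6 reference words -> WER of about 0.167
print(wer("the cat sat on the mat", "the cat sat on a mat"))
```

A 17% misidentification rate, in these terms, means roughly one in six words transcribed incorrectly.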
“It’s critical to study and improve fairness in speech-to-text systems given the potential for disparate harm to individuals through downstream sectors ranging from healthcare to criminal justice,” said Allison Koenecke, lead author of the Stanford study.
Voice recognition tech has rapidly become embedded in everyday life, thanks to the prevalence of virtual assistants on smart devices like phones and speakers.
Apple pioneered the use of voice-activated software on mobile devices with its digital assistant Siri, while Amazon was one of the first to bring speech recognition to the home with its Echo speakers and Alexa assistant.
Researchers have become increasingly concerned about bias in the algorithms powering these speech recognition services. Specifically, experts say many voice recognition programs are trained on limited sets of data, making them less accurate for voices underrepresented in that data.
While speech recognition applications have little trouble transcribing a white, male, East Coast news presenter, “they don’t have the same level of accuracy” with underrepresented voices, according to Will Williams, Speechmatics’ vice president of machine learning.
“As with all these things, it’s about the quality of data in the training sets,” Stephanie Hare, an AI ethics researcher, told CNBC. “There has been racial bias, gender bias, and regional accent bias in speech recognition technology for a long time.”
“This technology does not work the same for everyone, yet,” Hare added. “It could, eventually, with refinement.”
Speechmatics says it trained its artificial intelligence with unlabeled data from social media and podcasts to help it learn different aspects of speech including accent, language and intonation.
“We can soak it up almost in the same way that a child does,” Williams told CNBC.
The firm said its technology is trained on 1.1 million hours of audio.
Speechmatics called the development a “breakthrough” and said it hopes other tech companies become more transparent about efforts to reduce bias in AI.
“It would be good if people were open-sourcing test sets that let you evaluate how well you’re doing on this front,” Williams said. “Part of the problem has been that progress on certain demographics has been hidden.”
Tech giants have been ramping up their investments in speech recognition lately, with Microsoft agreeing to acquire software firm Nuance Communications for $16 billion in April.