Evaluating synthesized speech intelligibility in noise
Ye Yang, Dathan Nguyen, Katherine Chen, Fan-Gang Zeng
JASA Express Letters, vol. 5, no. 4. Published 2025-04-01. DOI: https://doi.org/10.1121/10.0036397
Citation count: 0
Abstract
Humans can modify their speech to improve intelligibility in noisy environments. With advances in speech synthesis technology, machines may also be able to synthesize voices that remain highly intelligible in noise. This study evaluates both the subjective and objective intelligibility of synthesized speech from three major speech synthesis platforms, presented in speech-shaped noise. Synthesized voices showed an intelligibility range similar to that of human voices, and some synthesized voices were more intelligible than human voices. In addition, two modern automatic speech recognition systems recognized 10% more words than human listeners.