F. Meloni, Bianca Sicchieri, P. Mandrá, Renato De Freitas Bulcão-Neto, Alessandra Alaniz Macedo
DOI: 10.1109/CLEI56649.2022.9959936
Published in: 2022 XLVIII Latin American Computer Conference (CLEI), 2022-10-17
Detection and Evaluation of Speech Intelligibility with Speech Tool
The growth of assistive technologies brings new perspectives to the treatment of Speech Sound Disorders (SSD). For example, Automatic Speech Recognition (ASR) tools recognize sound signals and convert them into text in multiple languages. These tools commonly rely on models trained on samples from typically developed speakers, yet most of them handle sound variations such as accents well. Hence, there is an expectation that they may also transcribe speech with phonological disorders, such as that produced by people with SSD. However, this potential remains poorly understood. Here, we analyze the potential of one ASR tool, Google’s speech-to-text API, as a multilevel indicator of speech intelligibility. We used pronunciations by volunteer actors, who simulated people with a broad spectrum of speech impairments. The tool indicated speech intelligibility at a general level and was marginally capable of determining the SSD type, but it could not map the syllable exchanges accurately. In short, our results suggest that ASRs have great potential as components of assistive tools in many contexts. Our contribution goes beyond the tests themselves, as we propose a simple, robust, systematic, and automated method to quantify speech intelligibility using ASRs. The method, which still needs clinical validation, can be replicated with other tool versions, with the pronunciations of people who actually have SSD, or in other languages, as long as the appropriate protocols are followed. The goal is to enhance ASR tools’ capabilities to promote even greater digital inclusion for people with phonological disorders.
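The abstract does not specify how the intelligibility score is computed. As a minimal sketch of the general idea, one plausible formulation (not necessarily the authors' implementation) scores an utterance as 1 − WER, the word error rate between the target sentence and the ASR transcript; the function names below are illustrative, and the transcript is assumed to have been obtained separately from a speech-to-text service:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        return float(len(hyp) > 0)
    # Single-row dynamic-programming edit distance over word sequences.
    row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev_diag = row[0]
        row[0] = i
        for j, h in enumerate(hyp, 1):
            cur = row[j]
            row[j] = min(row[j] + 1,            # deletion
                         row[j - 1] + 1,        # insertion
                         prev_diag + (r != h))  # substitution (0 if match)
            prev_diag = cur
    return row[-1] / len(ref)

def intelligibility(reference: str, transcript: str) -> float:
    """Map WER to a [0, 1] intelligibility score (1 = fully intelligible)."""
    return max(0.0, 1.0 - word_error_rate(reference, transcript))
```

A perfectly transcribed prompt scores 1.0, while each substituted or dropped word lowers the score proportionally, which gives the "general level" indicator the abstract describes; finer-grained analyses (SSD type, syllable exchanges) would need phoneme-level alignment rather than word-level comparison.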