F. Meloni, Bianca Sicchieri, P. Mandrá, Renato De Freitas Bulcão-Neto, Alessandra Alaniz Macedo
DOI: 10.1109/CLEI56649.2022.9959936
Published in: 2022 XLVIII Latin American Computer Conference (CLEI), 2022-10-17
Detection and Evaluation of Speech Intelligibility with Speech Tool
The growth of assistive technologies brings new perspectives to the treatment of Speech Sound Disorders (SSD). For example, Automatic Speech Recognition (ASR) tools recognize sound signals and convert them into text in multiple languages. These tools commonly rely on models trained on samples from typically developed speakers, yet most of them handle sound variations such as accents well. Hence, there is an expectation that they may also transcribe speech with phonological disorders, such as that produced by people with SSD. However, this potential remains poorly understood. Here, we analyze the potential of one ASR tool, Google’s speech-to-text API, as a multilevel indicator of speech intelligibility. We used pronunciations by volunteer actors, who simulated people with a broad spectrum of speech impairments. The tool indicated speech intelligibility at a general level and was marginally capable of determining the SSD type, but it could not map the syllable exchanges accurately. In short, our results suggest that ASRs have great potential as components of assistive tools in many contexts. Our contribution goes beyond the tests themselves, as we propose a simple, robust, systematic, and automated method to quantify speech intelligibility using ASRs. The method, which still needs clinical validation, can be replicated with other tool versions, with the pronunciations of people who actually have SSD, or in other languages, as long as the appropriate protocols are followed. The goal is to enhance ASR tools’ capabilities to promote even greater digital inclusion for people with phonological disorders.
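The abstract does not specify how the intelligibility score is computed. As a minimal sketch of the general idea, one plausible formulation (not necessarily the authors' implementation) scores an utterance as 1 − WER, the word error rate between the target sentence and the ASR transcript; the function names below are illustrative, and the transcript is assumed to have been obtained separately from a speech-to-text service:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        return float(len(hyp) > 0)
    # Single-row dynamic-programming edit distance over word sequences.
    row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev_diag = row[0]
        row[0] = i
        for j, h in enumerate(hyp, 1):
            cur = row[j]
            row[j] = min(row[j] + 1,            # deletion
                         row[j - 1] + 1,        # insertion
                         prev_diag + (r != h))  # substitution (0 if match)
            prev_diag = cur
    return row[-1] / len(ref)

def intelligibility(reference: str, transcript: str) -> float:
    """Map WER to a [0, 1] intelligibility score (1 = fully intelligible)."""
    return max(0.0, 1.0 - word_error_rate(reference, transcript))
```

A perfectly transcribed prompt scores 1.0, while each substituted or dropped word lowers the score proportionally, which gives the "general level" indicator the abstract describes; finer-grained analyses (SSD type, syllable exchanges) would need phoneme-level alignment rather than word-level comparison.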