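A large-scale comparison of two voice synthesis techniques on intelligibility, naturalness, preferences, and attitudes toward voices banked by individuals with amyotrophic lateral sclerosis.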

IF 1.6; CAS Zone 3 (Medicine); JCR Q1; AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGY

Augmentative and Alternative Communication; Pub Date: 2024-03-01; Epub Date: 2023-10-04; DOI: 10.1080/07434618.2023.2262032

Jolene Hyppa-Martin, Jason Lilley, Mo Chen, Jaclyn Friese, Corinne Schmidt, H Timothy Bunnell

Pages: 31-45; Publication type: Journal Article; Open access: no; URL: https://doi.org/10.1080/07434618.2023.2262032

Citations: 0

Abstract

A large-scale comparison of two voice synthesis techniques on intelligibility, naturalness, preferences, and attitudes toward voices banked by individuals with amyotrophic lateral sclerosis.

Amyotrophic lateral sclerosis (ALS) commonly results in the inability to produce natural speech, making speech-generating devices (SGDs) important. Historically, synthetic voices generated by SGDs were neither unique, nor age- or dialect-appropriate, which depersonalized SGD use. Voices generated by SGDs can now be customized via voice banking and should ideally sound uniquely like the individual's natural speech, be intelligible, and elicit positive reactions from communication partners. This large-scale 2 x 2 mixed between- and within-participants design examined perceptions of 831 adult listeners regarding custom synthetic voices created for two individuals diagnosed with ALS via two synthesis systems in common clinical use (waveform concatenation and statistical parametric synthesis). The study explored relationships among synthesis system, dysarthria severity, synthetic speech intelligibility, naturalness, and preferences, and also provided a preliminary examination of attitudes regarding the custom synthetic voices. Synthetic voices generated via statistical parametric synthesis trained on deep neural networks were more intelligible, natural, and preferred than voices produced via waveform concatenation, and were associated with more positive attitudes. The custom synthetic voice created from moderately dysarthric speech was more intelligible than the voice created from mildly dysarthric speech. Clinical implications and factors that may have contributed to the relative intelligibilities are discussed.


Source journal

Augmentative and Alternative Communication (AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGY)

CiteScore: 2.80

Self-citation rate: 15.00%

Review time: >12 weeks

About the journal: As the official journal of the International Society for Augmentative and Alternative Communication (ISAAC), Augmentative and Alternative Communication (AAC) publishes scientific articles related to the field of augmentative and alternative communication (AAC) that report research concerning assessment, treatment, rehabilitation, and education of people who use or have the potential to use AAC systems, or that discuss theory, technology, and systems development relevant to AAC. The broad range of topics included in the journal reflects the development of this field internationally. Manuscripts submitted to AAC should fall within one of the following categories and must comply with the associated page maximums listed on page 3 of the Manuscript Preparation Guide.

Research articles (full peer review): These manuscripts report the results of original empirical research, including studies using qualitative and quantitative methodologies, with both group and single-case experimental research designs (e.g., Binger et al., 2008; Petroi et al., 2014).

Technical, research, and intervention notes (full peer review): These are brief manuscripts that address methodological, statistical, technical, or clinical issues or innovations that are of relevance to the AAC community and are designed to bring the research community’s attention to areas that have been minimally or poorly researched in the past (e.g., research note: Thunberg et al., 2016; intervention notes: Laubscher et al., 2019).