Augmentative and alternative speech communication (AASC) aid for people with dysarthria

IF 3.4 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Mariya Celin T.A. , Vijayalakshmi P. , Nagarajan T. , Mrinalini K.
DOI: 10.1016/j.csl.2025.101777 · Computer Speech and Language, Vol. 92, Article 101777 · Published 2025-01-22 · https://www.sciencedirect.com/science/article/pii/S0885230825000026
Citations: 0

Abstract

Speech assistive aids are designed to enhance the intelligibility of speech, particularly for individuals with speech impairments such as dysarthria, by utilizing speech recognition and speech synthesis systems. These devices promote independence and employability for dysarthric individuals by facilitating natural communication. However, the availability of speech assistive aids is limited by several challenges: the need to train a dysarthric speech recognition system tailored to the errors of dysarthric speakers, the portability required for use by dysarthric individuals with motor disorders, the need to sustain an adequate speech communication rate, and the cost of developing such aids. To address this, the current work develops a portable, affordable, and personalized augmentative and alternative speech communication aid tailored to each dysarthric speaker's needs. The dysarthric speech recognition system used in this aid is trained with a transfer learning approach, with normal speakers' speech data used to train the source model and augmented dysarthric speech data forming the target model. Data augmentation of the dysarthric speech is performed using a virtual microphone and a multi-resolution-based feature extraction approach (VM-MRFE), previously proposed by the authors, to increase the quantity of target speech data and improve recognition accuracy. The recognized text is synthesized into intelligible speech using a hidden Markov model (HMM)-based text-to-speech synthesis system. To enhance accessibility, the recognizer and synthesizer are ported to the Raspberry Pi platform, along with a collar microphone and loudspeaker.
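The core idea of the transfer-learning setup, training on plentiful normal speech and then adapting with augmented dysarthric data, can be sketched in a few lines. This is a hypothetical illustration only: the toy `augment()` helper, the random feature matrices, and the sizes below are stand-ins, not the authors' actual VM-MRFE pipeline or acoustic model.

```python
# Illustrative sketch of the data-augmentation step described above.
# augment() is a hypothetical stand-in for VM-MRFE: it replicates the
# scarce dysarthric features with small perturbations to enlarge the
# target-domain training set before fine-tuning the source model.
import numpy as np

rng = np.random.default_rng(0)

def augment(features, copies=3, noise=0.05):
    """Return the original features plus `copies` perturbed replicas."""
    out = [features]
    for _ in range(copies):
        out.append(features + noise * rng.standard_normal(features.shape))
    return np.concatenate(out, axis=0)

# Simulated feature matrices: rows = frames, cols = feature dimensions.
normal_feats = rng.standard_normal((200, 13))     # abundant source data
dysarthric_feats = rng.standard_normal((20, 13))  # scarce target data

aug_feats = augment(dysarthric_feats)
# Augmentation multiplies the scarce target data (here 4x: 20 -> 80).
print(len(dysarthric_feats), "->", len(aug_feats))
```

In the paper's actual pipeline the enlarged target set is then used to fine-tune the recognizer initialized from the normal-speech source model; the sketch only shows why augmentation matters when target data is this scarce.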
The real-time performance of the aid with dysarthric users is also examined: recognition is achieved in under 3 s and synthesis in 1.4 s, for an overall speech delivery time of roughly 4.4 s.
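The reported end-to-end figure is simply the sum of the two stage latencies; a minimal budget check, assuming the abstract's worst-case numbers, confirms the arithmetic:

```python
# Latency budget for the aid, using the timings reported in the abstract:
# recognition under 3 s, synthesis about 1.4 s.
recognition_s = 3.0  # reported upper bound for recognition
synthesis_s = 1.4    # reported synthesis time
total_s = recognition_s + synthesis_s
print(f"worst-case delivery time: {total_s:.1f} s")  # → 4.4 s
```

This matches the roughly 4.4 s speech delivery time the authors report, consistent with the stages running sequentially rather than overlapped.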
Source journal: Computer Speech and Language (Engineering & Technology - Computer Science: Artificial Intelligence)
CiteScore: 11.30
Self-citation rate: 4.70%
Annual articles: 80
Review time: 22.9 weeks
Journal description: Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language. The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.