{"title":"Multimodal deep learning methods for speech and language rehabilitation: a cross-sectional observational study.","authors":"Xinqiao Cen","doi":"10.1080/17483107.2025.2551708","DOIUrl":null,"url":null,"abstract":"<p><p>The speech and language rehabilitation are essential to people who have disorders of communication that may occur due to the condition of neurological disorder, developmental delays, or bodily disabilities. With the advent of deep learning, we introduce an improved multimodal rehabilitation pipeline that incorporates audio, video, and text information in order to provide patient-tailored therapy that adapts to the patient. The technique uses a cross-attention fusion multimodal hierarchical transformer architectural model that allows it to jointly design speech acoustics as well as the facial dynamics, lip articulation, and linguistic context. We adopt the strategy of self-supervised pretraining on large-scale unlabelled corpora and domain-adaptive fine-tuning with data augmentation in order to overcome the problem of cohort size and interpatient variability. A low latency inference architecture will provide real-time feedback and individualised changes to therapy. Clinical and synthetic test results show our method trained and verified on clinical and synthetic data fare better than uni-modal and conventional fusion baselines in terms of accuracy, patient engagement, and measurable therapeutic benefit. Such findings point out opportunities of using intelligent, multimodal deep learning systems to reinvent future of speech and language rehabilitation.</p>","PeriodicalId":47806,"journal":{"name":"Disability and Rehabilitation-Assistive Technology","volume":" ","pages":"1-13"},"PeriodicalIF":2.2000,"publicationDate":"2025-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Disability and Rehabilitation-Assistive Technology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1080/17483107.2025.2551708","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"REHABILITATION","Score":null,"Total":0}
Abstract
Speech and language rehabilitation is essential for people with communication disorders arising from neurological conditions, developmental delays, or physical disabilities. Leveraging advances in deep learning, we introduce an improved multimodal rehabilitation pipeline that integrates audio, video, and text information to deliver therapy tailored to the individual patient. The approach uses a multimodal hierarchical transformer architecture with cross-attention fusion, allowing it to jointly model speech acoustics, facial dynamics, lip articulation, and linguistic context. We adopt self-supervised pretraining on large-scale unlabelled corpora and domain-adaptive fine-tuning with data augmentation to overcome limited cohort sizes and inter-patient variability. A low-latency inference architecture provides real-time feedback and individualised adjustments to therapy. Results on clinical and synthetic test data show that our method, trained and validated on both, outperforms unimodal and conventional fusion baselines in accuracy, patient engagement, and measurable therapeutic benefit. These findings highlight the potential of intelligent, multimodal deep learning systems to reshape the future of speech and language rehabilitation.
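To illustrate the kind of cross-attention fusion the abstract describes, the sketch below shows one minimal way to let an audio stream attend to video (facial/lip) and text (linguistic) streams inside a transformer-style block. It is an assumption-laden illustration using standard PyTorch components, not the authors' implementation; the module name, dimensions, and fusion order are all hypothetical.

```python
# Minimal sketch of cross-attention fusion over audio, video, and text streams.
# All names, dimensions, and the fusion order are illustrative assumptions.
import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    """Fuse an audio stream with video and text context via cross-attention."""

    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        # Audio tokens act as queries; video and text tokens provide keys/values.
        self.attn_video = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn_text = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, audio, video, text):
        # audio: (B, Ta, dim), video: (B, Tv, dim), text: (B, Tt, dim)
        a, _ = self.attn_video(audio, video, video)  # attend to facial/lip dynamics
        t, _ = self.attn_text(audio, text, text)     # attend to linguistic context
        fused = self.norm(audio + a + t)             # residual fusion of both streams
        return self.norm(fused + self.ffn(fused))    # position-wise feed-forward


# Example: fuse 100 audio frames with 50 video frames and 20 text tokens.
fusion = CrossAttentionFusion()
audio = torch.randn(2, 100, 256)
video = torch.randn(2, 50, 256)
text = torch.randn(2, 20, 256)
print(fusion(audio, video, text).shape)  # torch.Size([2, 100, 256])
```

In a hierarchical architecture such blocks would typically be stacked, with lower layers fusing frame-level features and higher layers operating on utterance-level summaries; the abstract does not specify those details, so they are omitted here.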