Multimodal deep learning methods for speech and language rehabilitation: a cross-sectional observational study.

IF 2.2 | Zone 4 (Medicine) | JCR Q2 (Rehabilitation)
Xinqiao Cen
Disability and Rehabilitation: Assistive Technology, pp. 1-13. Published 2025-09-05. Journal Article (not open access).
DOI: 10.1080/17483107.2025.2551708 (https://doi.org/10.1080/17483107.2025.2551708)
Citations: 0

Abstract

Speech and language rehabilitation is essential for people with communication disorders arising from neurological conditions, developmental delays, or physical disabilities. Building on advances in deep learning, we introduce an improved multimodal rehabilitation pipeline that integrates audio, video, and text to deliver therapy that adapts to each patient. The approach uses a multimodal hierarchical transformer with cross-attention fusion to jointly model speech acoustics, facial dynamics, lip articulation, and linguistic context. To address small cohort sizes and inter-patient variability, we combine self-supervised pretraining on large-scale unlabelled corpora with domain-adaptive fine-tuning and data augmentation. A low-latency inference architecture provides real-time feedback and individualised adjustments to therapy. Trained and validated on clinical and synthetic data, our method outperforms unimodal and conventional fusion baselines in accuracy, patient engagement, and measurable therapeutic benefit. These findings highlight the potential of intelligent multimodal deep learning systems to reshape the future of speech and language rehabilitation.
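
The cross-attention fusion step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify dimensions, number of heads, projections, or which modality queries which, so everything below is an assumption: a single head, identity projections (no learned weights), audio frames as queries, and video frames as keys and values. It is a generic instance of scaled dot-product cross-attention, softmax(QK^T / sqrt(d)) V, not the author's model; the helper names `softmax`, `cross_attention`, and `fuse` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = time steps (frames/tokens), columns = feature dims


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Queries come from one modality (assumed: audio frames); keys and values
    come from another (assumed: lip-region video frames), so each audio frame
    gathers a weighted mix of the visual frames most similar to it.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    out: Matrix = []
    for q in queries:
        # One similarity score per key, scaled by 1/sqrt(d).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted sum of value vectors gives one output vector per query row.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def fuse(audio: Matrix, video: Matrix) -> Matrix:
    """Hypothetical residual fusion: audio + CrossAttn(audio -> video).

    Assumes audio and video features share the same dimension; a real model
    would use learned projections, multiple heads, and normalisation.
    """
    attended = cross_attention(audio, video, video)
    return [[a + b for a, b in zip(a_row, v_row)]
            for a_row, v_row in zip(audio, attended)]


# Usage with toy features: 3 audio frames and 2 video frames, 2 dims each.
audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
video = [[1.0, 0.0], [0.0, 1.0]]
fused = fuse(audio, video)
print(len(fused), len(fused[0]))  # 3 2
```

The fused sequence keeps the audio time resolution (one output row per audio frame), which suits frame-level feedback. The residual addition preserves the original acoustic features alongside the attended visual context, so an uninformative video stream does not erase the speech signal.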

Source journal: CiteScore 5.70 | Self-citation rate 13.60% | Articles published 128