超声舌头成像中的深度学习：语音障碍自动检测的系统综述。

IF 4.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Frontiers in Artificial Intelligence Pub Date : 2025-09-24 eCollection Date: 2025-01-01 DOI:10.3389/frai.2025.1631134

Saja Al Ani, Joanne Cleland, Ahmed Zoha

{"title":"超声舌头成像中的深度学习：语音障碍自动检测的系统综述。","authors":"Saja Al Ani, Joanne Cleland, Ahmed Zoha","doi":"10.3389/frai.2025.1631134","DOIUrl":null,"url":null,"abstract":"Background: Speech sound disorders (SSD) in children can significantly impact communication and development. Ultrasound tongue imaging (UTI) is a non-invasive method for visualising tongue motion during speech, offering a promising alternative for diagnosis and therapy. Deep learning (DL) techniques have shown great promise in automating the analysis of UTI data, although their clinical application for SSD remains underexplored.Objective: This review aims to synthesise how DL has been utilised in UTI to support automated SSD detection, highlighting the advancement of techniques, key challenges, and future directions.Methods: A comprehensive search of IEEE Xplore, PubMed, ScienceDirect, Scopus, Taylor & Francis, and arXiv identified studies from 2010 through 2025. Inclusion criteria focused on studies using DL to analyse UTI data with relevance to SSD classification, feature extraction, or speech assessment. Eleven studies met the criteria: three directly tackled disordered speech classification tasks, while four addressed supporting tasks like tongue contour segmentation and tongue motion modelling. Promising results were reported in each category, but limitations such as small datasets, inconsistent evaluation, and limited generalisability were common.Results: DL models demonstrate effectiveness in analysing UTI for articulatory assessment and show early potential in identifying SSD-related patterns. The included studies collectively outline a developmental pipeline, from foundational pre-processing to phoneme-level classification in typically developing speakers, and finally to preliminary attempts at classifying speech errors in children with SSD. This progression illustrates significant technological advances; however, it also emphasises gaps such as the lack of large, disorder-focused datasets and the need for integrated end-to-end systems.Conclusion: The field of DL-driven UTI assessment for speech disorders is developing. Current studies provide a strong technical foundation and proof-of-concept for automatic SSD detection using ultrasound, but clinical translation remains limited. Future research should prioritise the creation of larger annotated UTI datasets of disordered speech, developing generalisable and interpretable models, and validating fully integrated DL-UTI pipelines in real-world speech therapy settings. With these advances, DL-based UTI systems have the potential to transform SSD diagnosis and treatment by providing objective, real-time articulatory feedback in a child-friendly manner.","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"8 ","pages":"1631134"},"PeriodicalIF":4.7000,"publicationDate":"2025-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12504283/pdf/","citationCount":"0","resultStr":"{\"title\":\"Deep learning in ultrasound tongue imaging: a systematic review toward automated detection of speech sound disorders.\",\"authors\":\"Saja Al Ani, Joanne Cleland, Ahmed Zoha\",\"doi\":\"10.3389/frai.2025.1631134\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Background: Speech sound disorders (SSD) in children can significantly impact communication and development. Ultrasound tongue imaging (UTI) is a non-invasive method for visualising tongue motion during speech, offering a promising alternative for diagnosis and therapy. Deep learning (DL) techniques have shown great promise in automating the analysis of UTI data, although their clinical application for SSD remains underexplored.Objective: This review aims to synthesise how DL has been utilised in UTI to support automated SSD detection, highlighting the advancement of techniques, key challenges, and future directions.Methods: A comprehensive search of IEEE Xplore, PubMed, ScienceDirect, Scopus, Taylor & Francis, and arXiv identified studies from 2010 through 2025. Inclusion criteria focused on studies using DL to analyse UTI data with relevance to SSD classification, feature extraction, or speech assessment. Eleven studies met the criteria: three directly tackled disordered speech classification tasks, while four addressed supporting tasks like tongue contour segmentation and tongue motion modelling. Promising results were reported in each category, but limitations such as small datasets, inconsistent evaluation, and limited generalisability were common.Results: DL models demonstrate effectiveness in analysing UTI for articulatory assessment and show early potential in identifying SSD-related patterns. The included studies collectively outline a developmental pipeline, from foundational pre-processing to phoneme-level classification in typically developing speakers, and finally to preliminary attempts at classifying speech errors in children with SSD. This progression illustrates significant technological advances; however, it also emphasises gaps such as the lack of large, disorder-focused datasets and the need for integrated end-to-end systems.Conclusion: The field of DL-driven UTI assessment for speech disorders is developing. Current studies provide a strong technical foundation and proof-of-concept for automatic SSD detection using ultrasound, but clinical translation remains limited. Future research should prioritise the creation of larger annotated UTI datasets of disordered speech, developing generalisable and interpretable models, and validating fully integrated DL-UTI pipelines in real-world speech therapy settings. With these advances, DL-based UTI systems have the potential to transform SSD diagnosis and treatment by providing objective, real-time articulatory feedback in a child-friendly manner.\",\"PeriodicalId\":33315,\"journal\":{\"name\":\"Frontiers in Artificial Intelligence\",\"volume\":\"8 \",\"pages\":\"1631134\"},\"PeriodicalIF\":4.7000,\"publicationDate\":\"2025-09-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12504283/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Frontiers in Artificial Intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3389/frai.2025.1631134\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/frai.2025.1631134","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

背景：儿童语音障碍（SSD）对交流和发展有显著影响。超声舌头成像（UTI）是一种非侵入性的方法，可以在说话过程中观察舌头的运动，为诊断和治疗提供了一个有前途的选择。深度学习（DL）技术在UTI数据的自动化分析方面显示出巨大的前景，尽管其在SSD中的临床应用仍未得到充分探索。目的：本综述旨在综合如何利用DL在UTI中支持自动SSD检测，强调技术的进步，关键挑战和未来方向。方法：综合检索IEEE explore、PubMed、ScienceDirect、Scopus、Taylor & Francis和arXiv从2010年到2025年确定的研究。纳入标准侧重于使用DL分析与SSD分类、特征提取或语音评估相关的UTI数据的研究。11项研究符合标准：3项研究直接解决了无序语音分类任务，而4项研究解决了舌头轮廓分割和舌头运动建模等辅助任务。每个类别都报告了有希望的结果，但普遍存在数据集小、评估不一致和有限的通用性等局限性。结果：深度学习模型在分析UTI以进行发音评估方面表现出有效性，并在识别ssd相关模式方面显示出早期潜力。所包括的研究共同勾勒出一个发展管道，从基本的预处理到正常发育的说话者的音素水平分类，最后到对患有SSD的儿童的语音错误进行分类的初步尝试。这一进展说明了重大的技术进步；然而，它也强调了诸如缺乏大型、以无序为重点的数据集以及需要集成的端到端系统等差距。结论：dl驱动的语言障碍UTI评估领域正在发展。目前的研究为超声自动检测SSD提供了强大的技术基础和概念验证，但临床应用仍然有限。未来的研究应该优先考虑创建更大的无序语音注释UTI数据集，开发可推广和可解释的模型，并在现实世界的语言治疗环境中验证完全集成的DL-UTI管道。有了这些进步，基于dl的UTI系统有可能通过以儿童友好的方式提供客观、实时的关节反馈来改变SSD的诊断和治疗。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Deep learning in ultrasound tongue imaging: a systematic review toward automated detection of speech sound disorders.

Background: Speech sound disorders (SSD) in children can significantly impact communication and development. Ultrasound tongue imaging (UTI) is a non-invasive method for visualising tongue motion during speech, offering a promising alternative for diagnosis and therapy. Deep learning (DL) techniques have shown great promise in automating the analysis of UTI data, although their clinical application for SSD remains underexplored.

Objective: This review aims to synthesise how DL has been utilised in UTI to support automated SSD detection, highlighting the advancement of techniques, key challenges, and future directions.

Methods: A comprehensive search of IEEE Xplore, PubMed, ScienceDirect, Scopus, Taylor & Francis, and arXiv identified studies from 2010 through 2025. Inclusion criteria focused on studies using DL to analyse UTI data with relevance to SSD classification, feature extraction, or speech assessment. Eleven studies met the criteria: three directly tackled disordered speech classification tasks, while four addressed supporting tasks like tongue contour segmentation and tongue motion modelling. Promising results were reported in each category, but limitations such as small datasets, inconsistent evaluation, and limited generalisability were common.

Results: DL models demonstrate effectiveness in analysing UTI for articulatory assessment and show early potential in identifying SSD-related patterns. The included studies collectively outline a developmental pipeline, from foundational pre-processing to phoneme-level classification in typically developing speakers, and finally to preliminary attempts at classifying speech errors in children with SSD. This progression illustrates significant technological advances; however, it also emphasises gaps such as the lack of large, disorder-focused datasets and the need for integrated end-to-end systems.

Conclusion: The field of DL-driven UTI assessment for speech disorders is developing. Current studies provide a strong technical foundation and proof-of-concept for automatic SSD detection using ultrasound, but clinical translation remains limited. Future research should prioritise the creation of larger annotated UTI datasets of disordered speech, developing generalisable and interpretable models, and validating fully integrated DL-UTI pipelines in real-world speech therapy settings. With these advances, DL-based UTI systems have the potential to transform SSD diagnosis and treatment by providing objective, real-time articulatory feedback in a child-friendly manner.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊