Use of Transfer Learning for the Automated Segmentation and Detection of Swallows via Digital Cervical Auscultation in Children.

IF 3; Zone 3 (Medicine); Q1 OTORHINOLARYNGOLOGY

Dysphagia Pub Date : 2025-06-03 DOI:10.1007/s00455-025-10833-3

Stephen So, Timothy Tadj, Belinda Schwerin, Anne B Chang, Thuy T Frakking


Citations: 0

Abstract

Digital cervical auscultation (CA) has high diagnostic test accuracy in the detection of aspiration in children. However, the clinical application of digital CA is limited because swallow sound recordings require manual segmentation by trained experts, which is time-consuming and not feasible in clinical practice. Studies of the automated detection of swallowing sounds from sound recordings in adults have reported accuracies between 76% and 95%. No equivalent literature exists for the automated detection of swallowing sounds in children. This study aimed to establish whether automated machine learning using a transfer learning approach can accurately detect and segment swallows from digital CA recordings in children. Swallow sounds were collected from 16 typically developing children, median age 18 months (range 4-35 months, 50% males); and 19 videofluoroscopic swallow studies of children with pediatric feeding disorders, median age 9 months (range 3-71 months, 36.8% males). All swallowing sounds were recorded on thin fluids. A deep convolutional neural network (DCNN) that was pre-trained for the task of audio event classification was used as the base machine learning model. Embedding vectors were computed by the base DCNN from the raw swallow audio data and used to train a feedforward neural network to identify whether an audio segment was a swallow or not. Our model achieved an overall accuracy of 91%, with a sensitivity (or recall) and positive predictability (or precision) of 81% and 79%, respectively. Interestingly, the model was also able to detect saliva swallows in the clinical feeding evaluation test set, even though these non-nutritive swallows were not part of the training set. This indicates a level of generalizability of the model, in that it was able to recognize swallowing events that it had not "seen" before. Our study provides the highest accuracy reported to date on the automatic segmentation and detection of swallowing sounds in children.
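To make the reported metrics concrete, the following is a minimal Python sketch of how per-frame swallow/non-swallow decisions could be merged into time segments and scored for accuracy, sensitivity (recall), and precision (positive predictability). It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say whether scoring was done per frame or per event, nor what frame length was used. The function names, the 0.5-second frame duration, and the ten example labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Metrics:
    accuracy: float
    sensitivity: float  # also called recall
    precision: float    # also called positive predictability


def frames_to_segments(pred: List[int], frame_sec: float) -> List[Tuple[float, float]]:
    """Merge consecutive frames predicted as swallow (1) into (start, end) times."""
    segments = []
    start = None
    for i, label in enumerate(pred):
        if label == 1 and start is None:
            start = i  # a run of swallow frames begins here
        elif label == 0 and start is not None:
            segments.append((start * frame_sec, i * frame_sec))
            start = None  # the run ended just before frame i
    if start is not None:  # the run reaches the end of the recording
        segments.append((start * frame_sec, len(pred) * frame_sec))
    return segments


def frame_metrics(truth: List[int], pred: List[int]) -> Metrics:
    """Compute frame-level accuracy, sensitivity, and precision."""
    if len(truth) != len(pred):
        raise ValueError("truth and pred must have the same length")
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    return Metrics(
        accuracy=(tp + tn) / len(truth) if truth else 0.0,
        sensitivity=tp / (tp + fn) if (tp + fn) else 0.0,
        precision=tp / (tp + fp) if (tp + fp) else 0.0,
    )


# Made-up labels for ten frames (1 = swallow, 0 = not a swallow).
# These are NOT data from the study; they only exercise the functions.
truth = [0, 0, 1, 1, 1, 0, 0, 0, 1, 1]
pred = [0, 0, 1, 1, 0, 0, 1, 0, 1, 1]

segments = frames_to_segments(pred, frame_sec=0.5)  # assumed frame length
metrics = frame_metrics(truth, pred)
print(segments)
print(metrics)
```

Because accuracy counts true negatives while sensitivity and precision do not, a recording dominated by non-swallow audio can yield an accuracy noticeably higher than either of the other two figures, which is one plausible reading of the 91% versus 81% and 79% values reported above.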




Source journal

Dysphagia (Medicine: Otorhinolaryngology)

CiteScore: 4.90

Self-citation rate: 15.40%

Articles published: 149

Review time: 6-12 weeks

Journal description: Dysphagia aims to serve as a voice for the benefit of the patient. The journal is devoted exclusively to swallowing and its disorders. Its purpose is to provide a source of information to the flourishing dysphagia community. Over the past years, the field of dysphagia has grown rapidly, and the community of dysphagia researchers has been galvanized with the ambition to represent dysphagia patients. In addition to covering a myriad of disciplines in medicine and speech pathology, the journal covers topics including, but not limited to, bio-engineering, deglutition, esophageal motility, immunology, and neuro-gastroenterology. The journal aims to address the growing need for further dysphagia investigation, to disseminate knowledge through research, and to stimulate communication among interested professionals. The journal publishes original papers, technical and instrumental notes, letters to the editor, and review articles.