Use of Transfer Learning for the Automated Segmentation and Detection of Swallows via Digital Cervical Auscultation in Children
Stephen So, Timothy Tadj, Belinda Schwerin, Anne B Chang, Thuy T Frakking
Dysphagia (journal article), published 2025-06-03. DOI: 10.1007/s00455-025-10833-3 (https://doi.org/10.1007/s00455-025-10833-3)
Abstract
Digital cervical auscultation (CA) has high diagnostic test accuracy in the detection of aspiration in children. However, the clinical application of digital CA is limited because swallow sound recordings require manual segmentation by trained experts, which is time-consuming and not feasible in clinical practice. Studies of the automated detection of swallowing sounds in adults from sound recordings have reported accuracies between 76% and 95%. No equivalent literature exists for the automated detection of swallowing sounds in children. This study aimed to establish whether machine learning using a transfer learning approach can accurately detect and segment swallows from digital CA recordings in children. Swallow sounds were collected from 16 typically developing children, median age 18 months (range 4-35 months, 50% males), and from 19 videofluoroscopic swallow studies of children with pediatric feeding disorders, median age 9 months (range 3-71 months, 36.8% males). All recorded swallows were of thin fluids. A deep convolutional neural network (DCNN) that was pre-trained for the task of audio event classification was used as the base machine learning model. Using the raw swallow audio data as input, embedding vectors from the base DCNN were computed and used to train a feedforward neural network to identify whether an audio segment was a swallow or not. The model achieved a high overall accuracy of 91%, with a sensitivity (or recall) and positive predictability (or precision) of 81% and 79%, respectively. Interestingly, the model was also able to detect saliva swallows in the clinical feeding evaluation test set, even though these non-nutritive swallows were not part of the training set. This indicates a level of generalizability of the model, in that it was able to recognize swallowing events that it had not "seen" before. Our study provides the highest accuracy reported to date on the automatic segmentation and detection of swallowing sounds in children.
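The approach summarized in the abstract (fixed embeddings from a pre-trained network, a small feedforward classifier trained on top of them, and scoring by accuracy, sensitivity, and precision) can be illustrated with a minimal sketch. This is not the authors' implementation. The embeddings here are synthetic random vectors standing in for the output of the pre-trained DCNN, and the embedding size, hidden width, learning rate, epoch count, and the 25% share of swallow segments are all assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM = 128  # assumed embedding size; the abstract does not state it
HIDDEN = 16      # assumed hidden-layer width


def make_synthetic_embeddings(n, swallow_fraction=0.25):
    """Stand-in for DCNN embeddings: synthetic data, not real swallow audio."""
    y = (rng.random(n) < swallow_fraction).astype(float)
    # Shift the mean of "swallow" vectors so the two classes are separable.
    x = rng.normal(0.0, 1.0, size=(n, EMBED_DIM)) + 0.5 * y[:, None]
    return x, y


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_classifier(x, y, epochs=300, lr=0.1):
    """One-hidden-layer feedforward net, full-batch gradient descent."""
    w1 = rng.normal(0.0, 0.1, size=(x.shape[1], HIDDEN))
    b1 = np.zeros(HIDDEN)
    w2 = rng.normal(0.0, 0.1, size=HIDDEN)
    b2 = 0.0
    n = len(y)
    for _ in range(epochs):
        h = np.maximum(0.0, x @ w1 + b1)           # ReLU hidden layer
        p = sigmoid(h @ w2 + b2)                   # predicted swallow probability
        grad_out = (p - y) / n                     # cross-entropy gradient w.r.t. logit
        grad_h = np.outer(grad_out, w2) * (h > 0)  # backpropagate through ReLU
        w2 -= lr * (h.T @ grad_out)
        b2 -= lr * grad_out.sum()
        w1 -= lr * (x.T @ grad_h)
        b1 -= lr * grad_h.sum(axis=0)
    return w1, b1, w2, b2


def predict(params, x):
    """Label each segment 1 (swallow) or 0 (not a swallow)."""
    w1, b1, w2, b2 = params
    h = np.maximum(0.0, x @ w1 + b1)
    return (sigmoid(h @ w2 + b2) >= 0.5).astype(int)


def evaluate(y_true, y_pred):
    """Accuracy, sensitivity (recall), and precision from confusion counts."""
    y_true = np.asarray(y_true).astype(int)
    y_pred = np.asarray(y_pred).astype(int)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


x_train, y_train = make_synthetic_embeddings(1500)
x_test, y_test = make_synthetic_embeddings(500)
params = train_classifier(x_train, y_train)
metrics = evaluate(y_test, predict(params, x_test))
print(metrics)
```

Accuracy counts true negatives while sensitivity and precision do not. If non-swallow segments outnumber swallows, as assumed in this sketch, accuracy can therefore sit above both of the other two figures, which is one plausible reading of the 91%, 81%, and 79% pattern reported above.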
About the journal:
Dysphagia aims to serve as a voice for the benefit of the patient. The journal is devoted exclusively to swallowing and its disorders. Its purpose is to provide a source of information to the flourishing dysphagia community. Over the past years, the field of dysphagia has grown rapidly, and the community of dysphagia researchers has been galvanized by the ambition to represent dysphagia patients. In addition to covering a myriad of disciplines in medicine and speech pathology, the journal covers topics including, but not limited to, bio-engineering, deglutition, esophageal motility, immunology, and neuro-gastroenterology. The journal aims to meet the growing need for further dysphagia investigation, to disseminate knowledge through research, and to stimulate communication among interested professionals. The journal publishes original papers, technical and instrumental notes, letters to the editor, and review articles.