The Initial Screening of Laryngeal Tumors via Voice Acoustic Analysis Based on Siamese Network Under Small Samples.

IF 2.5 | Zone 4, Medicine | Q1 AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGY
Zhenzhen You, Delong Sun, Zhenghao Shi, Shuangli Du, Xinhong Hei, Demin Kong, Xiaoying Du, Jing Yan, Xiaoyong Ren, Jin Hou
Journal of Voice. Published 2025-05-02. DOI: 10.1016/j.jvoice.2025.03.043

Abstract

Objective: The initial screening of laryngeal tumors via voice acoustic analysis currently relies on the clinician's experience, which is subjective. This article introduces a Siamese network with an auxiliary gender classifier for automated, accurate, and objective initial screening of laryngeal tumors based on voice signals.

Methods: The study involved 71 tumor patients and 293 non-tumor subjects, all Mandarin Chinese speakers. The dataset was divided into a training set and a test set in a ratio of 4:1. We applied nine data augmentation techniques to enlarge the voice training set and extracted the corresponding mel-frequency cepstral coefficient (MFCC) maps. The MFCC maps were randomly paired and fed into the proposed Siamese network to perform multitask classification: tumor versus non-tumor, and female versus male. The proposed model was compared with one machine learning method and six classical deep learning models, each evaluated with and without the auxiliary gender classifier.
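The split and pairing steps described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say whether the 4:1 split was stratified or how pair targets were defined, so the stratified split, the "same class" pair flag, the function names `stratified_split` and `random_pairs`, and the choice of 1000 pairs are all assumptions made for the example. Integer indices stand in for the MFCC maps.

```python
import random

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices into train/test, keeping the class ratio in each part.

    Stratification is an assumption; the abstract only states a 4:1 ratio.
    """
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return train, test

def random_pairs(indices, labels, n_pairs, seed=0):
    """Draw random pairs of distinct samples.

    Each pair carries a flag that is 1 when both samples share a class and
    0 otherwise (a common Siamese convention, assumed here).
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        a, b = rng.sample(indices, 2)
        pairs.append((a, b, int(labels[a] == labels[b])))
    return pairs

# 71 tumor (label 1) and 293 non-tumor (label 0) subjects, as in the abstract.
labels = [1] * 71 + [0] * 293
train_idx, test_idx = stratified_split(labels)
pairs = random_pairs(train_idx, labels, n_pairs=1000)
print(len(train_idx), len(test_idx), len(pairs))
```

In a full pipeline each index would refer to an MFCC map computed from one (possibly augmented) recording, and pairs would be drawn only from the training portion, as done here, so that test subjects never appear during training.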

Results: In experiments, the proposed network outperformed the reference models. The proposed model achieved an overall accuracy of 0.9437, an F score of 0.8462, a precision of 0.9167, a sensitivity of 0.7857, and a specificity of 0.9825.
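For readers who want to see how these five figures relate to one another, the sketch below computes them from confusion-matrix counts. The counts used (11 true positives, 3 false negatives, 56 true negatives, 1 false positive) are not reported in the abstract; they are an inference that happens to reproduce all five reported values to four decimals, and should be treated as an assumption. The function name `screening_metrics` is likewise hypothetical, and the F score is assumed to be F1.

```python
def screening_metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics, with tumor as the positive class."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)   # recall on tumor cases
    specificity = tn / (tn + fp)   # recall on non-tumor cases
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "f1": f1,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }

# Inferred counts (an assumption, not taken from the paper).
m = screening_metrics(tp=11, fn=3, tn=56, fp=1)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Under these assumed counts, the sensitivity of 0.7857 corresponds to 3 missed tumor cases out of 14, which is the figure that matters most when judging a screening tool.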

Conclusion: The proposed network can assist in the initial screening of laryngeal tumors through voice acoustic analysis. Screening based solely on voice acoustic analysis can help individuals decide to seek medical assistance from outside the hospital and can also reduce the burden on doctors.

Source journal: Journal of Voice (Medicine: Otorhinolaryngology)
CiteScore: 4.00
Self-citation rate: 13.60%
Articles published: 395
Review time: 59 days
About the journal: The Journal of Voice is widely regarded as the world's premier journal for voice medicine and research. This peer-reviewed publication is listed in Index Medicus and is indexed by the Institute for Scientific Information. The journal contains articles written by experts throughout the world on all topics in voice sciences, voice medicine and surgery, and speech-language pathologists' management of voice-related problems. The journal includes clinical articles, clinical research, and laboratory research. Members of the Foundation receive the journal as a benefit of membership.