{"title":"Acoustic cues for person identification using cough sounds","authors":"Van-Thuan Tran, Ting-Hao You, Wei-Ho Tsai","doi":"10.1016/j.cmpbup.2025.100195","DOIUrl":null,"url":null,"abstract":"<div><h3>Objectives</h3><div>This study presents an improved approach to person identification (PID) using nonverbal vocalizations, focusing specifically on cough sounds as a biometric modality. While recent works have demonstrated the feasibility of cough-based PID (CPID), most report accuracies around 80–90 % and could face limitations in terms of model efficiency, generalization, or robustness. Our objective is to advance CPID performance through compact model design and more effective training strategies.</div></div><div><h3>Methods</h3><div>We collected a custom dataset from 19 subjects and developed a lightweight yet effective deep learning framework for CPID. The proposed architecture, CoughCueNet, is a convolutional recurrent neural network designed to capture both spatial and temporal patterns in cough sounds. The training process incorporates a hybrid loss function that combines supervised contrastive (SC) learning and cross-entropy (CE) loss to enhance feature discrimination. We systematically evaluated multiple acoustic representations, including MFCCs and spectrograms, to identify the most suitable features. We also applied data augmentation for robustness and investigated cross-modal transferability by testing speech-trained models on cough data.</div></div><div><h3>Results</h3><div>Our CPID system achieved a mean identification accuracy of 97.18 %. Training the proposed CoughCueNet using a hybrid SC+CE loss function consistently improved model generalization and robustness. It outperformed the same network and larger-capacity networks (i.e., VGG16 and ResNet50) trained with CE loss alone, which achieved accuracies around 90 %. Among the tested features, MFCCs yielded superior identification performance over spectrograms. Experiments with speech-trained models tested on coughs revealed limited cross-vocal transferability, emphasizing the need for cough-specific models.</div></div><div><h3>Conclusion</h3><div>This work advances the state of cough-based PID by demonstrating that high-accuracy identification is achievable using compact models and hybrid training strategies. It establishes cough sounds as a practical and distinctive biometric modality, with promising applications in security, user authentication, and health monitoring, particularly in environments where speech-based systems are less reliable or infeasible.</div></div>","PeriodicalId":72670,"journal":{"name":"Computer methods and programs in biomedicine update","volume":"8 ","pages":"Article 100195"},"PeriodicalIF":0.0000,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer methods and programs in biomedicine update","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666990025000199","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Objectives
This study presents an improved approach to person identification (PID) using nonverbal vocalizations, focusing specifically on cough sounds as a biometric modality. While recent works have demonstrated the feasibility of cough-based PID (CPID), most report accuracies of around 80–90 % and may be limited in model efficiency, generalization, or robustness. Our objective is to advance CPID performance through compact model design and more effective training strategies.
Methods
We collected a custom dataset from 19 subjects and developed a lightweight yet effective deep learning framework for CPID. The proposed architecture, CoughCueNet, is a convolutional recurrent neural network designed to capture both spatial and temporal patterns in cough sounds. The training process incorporates a hybrid loss function that combines supervised contrastive (SC) learning and cross-entropy (CE) loss to enhance feature discrimination. We systematically evaluated multiple acoustic representations, including MFCCs and spectrograms, to identify the most suitable features. We also applied data augmentation to improve robustness and investigated cross-modal transferability by testing speech-trained models on cough data.
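The abstract does not give the exact formulation of the hybrid SC+CE objective. As a minimal sketch, assuming the batch-wise supervised contrastive loss of Khosla et al. (2020) with one view per sample, the combination could look as follows in PyTorch; the function names, temperature, and mixing weight lam are illustrative assumptions, not details taken from the paper.

    import torch
    import torch.nn.functional as F

    def supcon_loss(embeddings, labels, temperature=0.07):
        """Supervised contrastive loss over L2-normalized embeddings.

        Positives for each anchor are the other in-batch samples with the
        same speaker label (one view per sample, following Khosla et al. 2020).
        Names and temperature are illustrative, not the paper's settings.
        """
        z = F.normalize(embeddings, dim=1)               # project onto the unit sphere
        sim = z @ z.T / temperature                      # scaled cosine similarities
        n = z.size(0)
        self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
        sim = sim.masked_fill(self_mask, float("-inf"))  # exclude self-comparisons
        pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
        log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
        # Mean log-probability of each anchor's positives; anchors without
        # any in-batch positive are dropped from the final average.
        pos_sum = log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1)
        pos_cnt = pos_mask.sum(dim=1)
        has_pos = pos_cnt > 0
        return -(pos_sum[has_pos] / pos_cnt[has_pos]).mean()

    def hybrid_sc_ce_loss(logits, embeddings, labels, lam=0.5):
        """Weighted SC + CE objective; lam = 0.5 is illustrative only."""
        return lam * supcon_loss(embeddings, labels) + \
               (1.0 - lam) * F.cross_entropy(logits, labels)

In such a setup, logits would come from the network's classifier head and embeddings from a projection of its penultimate features; each batch would need at least two coughs per speaker for the SC term to be informative.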
Results
Our CPID system achieved a mean identification accuracy of 97.18 %. Training the proposed CoughCueNet with the hybrid SC+CE loss function consistently improved model generalization and robustness. It outperformed both the same network and larger-capacity networks (i.e., VGG16 and ResNet50) trained with CE loss alone, all of which achieved accuracies of around 90 %. Among the tested features, MFCCs yielded superior identification performance over spectrograms. Testing speech-trained models on cough data revealed limited cross-vocal transferability, underscoring the need for cough-specific models.
Conclusion
This work advances the state of the art in cough-based PID by demonstrating that high-accuracy identification is achievable with compact models and hybrid training strategies. It establishes cough sounds as a practical and distinctive biometric modality, with promising applications in security, user authentication, and health monitoring, particularly in environments where speech-based systems are less reliable or infeasible.