{"title":"Acoustic cues for person identification using cough sounds","authors":"Van-Thuan Tran, Ting-Hao You, Wei-Ho Tsai","doi":"10.1016/j.cmpbup.2025.100195","DOIUrl":null,"url":null,"abstract":"<div><h3>Objectives</h3><div>This study presents an improved approach to person identification (PID) using nonverbal vocalizations, focusing specifically on cough sounds as a biometric modality. While recent works have demonstrated the feasibility of cough-based PID (CPID), most report accuracies around 80–90 % and could face limitations in terms of model efficiency, generalization, or robustness. Our objective is to advance CPID performance through compact model design and more effective training strategies.</div></div><div><h3>Methods</h3><div>We collected a custom dataset from 19 subjects and developed a lightweight yet effective deep learning framework for CPID. The proposed architecture, CoughCueNet, is a convolutional recurrent neural network designed to capture both spatial and temporal patterns in cough sounds. The training process incorporates a hybrid loss function that combines supervised contrastive (SC) learning and cross-entropy (CE) loss to enhance feature discrimination. We systematically evaluated multiple acoustic representations, including MFCCs and spectrograms, to identify the most suitable features. We also applied data augmentation for robustness and investigated cross-modal transferability by testing speech-trained models on cough data.</div></div><div><h3>Results</h3><div>Our CPID system achieved a mean identification accuracy of 97.18 %. Training the proposed CoughCueNet using a hybrid SC+CE loss function consistently improved model generalization and robustness. It outperformed the same network and larger-capacity networks (i.e., VGG16 and ResNet50) trained with CE loss alone, which achieved accuracies around 90 %. Among the tested features, MFCCs yielded superior identification performance over spectrograms. Experiments with speech-trained models tested on coughs revealed limited cross-vocal transferability, emphasizing the need for cough-specific models.</div></div><div><h3>Conclusion</h3><div>This work advances the state of cough-based PID by demonstrating that high-accuracy identification is achievable using compact models and hybrid training strategies. It establishes cough sounds as a practical and distinctive biometric modality, with promising applications in security, user authentication, and health monitoring, particularly in environments where speech-based systems are less reliable or infeasible.</div></div>","PeriodicalId":72670,"journal":{"name":"Computer methods and programs in biomedicine update","volume":"8 ","pages":"Article 100195"},"PeriodicalIF":0.0000,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer methods and programs in biomedicine update","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666990025000199","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Objectives
This study presents an improved approach to person identification (PID) using nonverbal vocalizations, focusing specifically on cough sounds as a biometric modality. While recent works have demonstrated the feasibility of cough-based PID (CPID), most report accuracies of around 80–90 % and may be limited in model efficiency, generalization, or robustness. Our objective is to advance CPID performance through compact model design and more effective training strategies.
Methods
We collected a custom dataset from 19 subjects and developed a lightweight yet effective deep learning framework for CPID. The proposed architecture, CoughCueNet, is a convolutional recurrent neural network designed to capture both spatial and temporal patterns in cough sounds. The training process incorporates a hybrid loss function that combines supervised contrastive (SC) learning and cross-entropy (CE) loss to enhance feature discrimination. We systematically evaluated multiple acoustic representations, including MFCCs and spectrograms, to identify the most suitable features. We also applied data augmentation to improve robustness and investigated cross-modal transferability by testing speech-trained models on cough data.
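The abstract does not give the exact formulation of the hybrid SC+CE objective. As a minimal sketch, assuming the batch-wise supervised contrastive loss of Khosla et al. (2020) with one view per sample, the combination could look as follows in PyTorch; the function names, temperature, and mixing weight lam are illustrative assumptions, not details taken from the paper.

    import torch
    import torch.nn.functional as F

    def supcon_loss(embeddings, labels, temperature=0.07):
        """Supervised contrastive loss over L2-normalized embeddings.

        Positives for each anchor are the other in-batch samples with the
        same speaker label (one view per sample, following Khosla et al. 2020).
        Names and temperature are illustrative, not the paper's settings.
        """
        z = F.normalize(embeddings, dim=1)               # project onto the unit sphere
        sim = z @ z.T / temperature                      # scaled cosine similarities
        n = z.size(0)
        self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
        sim = sim.masked_fill(self_mask, float("-inf"))  # exclude self-comparisons
        pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
        log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
        # Mean log-probability of each anchor's positives; anchors without
        # any in-batch positive are dropped from the final average.
        pos_sum = log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1)
        pos_cnt = pos_mask.sum(dim=1)
        has_pos = pos_cnt > 0
        return -(pos_sum[has_pos] / pos_cnt[has_pos]).mean()

    def hybrid_sc_ce_loss(logits, embeddings, labels, lam=0.5):
        """Weighted SC + CE objective; lam = 0.5 is illustrative only."""
        return lam * supcon_loss(embeddings, labels) + \
               (1.0 - lam) * F.cross_entropy(logits, labels)

In such a setup, logits would come from the network's classifier head and embeddings from a projection of its penultimate features; each batch would need at least two coughs per speaker for the SC term to be informative.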
Results
Our CPID system achieved a mean identification accuracy of 97.18 %. Training the proposed CoughCueNet with the hybrid SC+CE loss function consistently improved model generalization and robustness. It outperformed both the same network and larger-capacity networks (i.e., VGG16 and ResNet50) trained with CE loss alone, all of which achieved accuracies of around 90 %. Among the tested features, MFCCs yielded superior identification performance over spectrograms. Testing speech-trained models on cough data revealed limited cross-vocal transferability, underscoring the need for cough-specific models.
Conclusion
This work advances the state of the art in cough-based PID by demonstrating that high-accuracy identification is achievable with compact models and hybrid training strategies. It establishes cough sounds as a practical and distinctive biometric modality, with promising applications in security, user authentication, and health monitoring, particularly in environments where speech-based systems are less reliable or infeasible.