A geometric-approach based Combinatorial Transformative Scalogram analysis for multiclass identification of pathologies in a voice signal
Ranita Khumukcham, Kishorjit Nongmeikapam
Multimedia Tools and Applications (Journal Article), published 2024-09-05
DOI: 10.1007/s11042-024-20067-4 (https://doi.org/10.1007/s11042-024-20067-4)
Citations: 0
Abstract
Many researchers have preferred non-invasive techniques for recognizing the exact type of physiological abnormality in the vocal tract, training machine learning algorithms on feature descriptors extracted from the voice signal. However, most techniques to date have been limited to classifying a voice as normal or abnormal. For deployment in a realistic environment, a trained Artificial Intelligence (AI) model must be able to identify the exact pathology associated with a voice. A further issue is the need to suppress ambient noise that may be mixed into the voice spectrum. The present work proposes a robust, less time-consuming, non-invasive technique for identifying the pathology associated with a laryngeal voice signal. Specifically, a two-stage filtering approach is applied to the input voice signal: a score-based geometric approach followed by a glottal inverse filtering method. The aim is to estimate the noise spectrum, regenerate a clean signal, and finally obtain the underlying glottal flow derivative signal. In the next stage, the clean glottal derivative signals are used to form a novel fused scalogram, referred to here as the "Combinatorial Transformative Scalogram" (CTS). The CTS is a time-frequency representation that combines two time-frequency scalograms. The performance of the two individual scalograms and of the CTS database is investigated thoroughly using nine classification metrics: sensitivity, mean accuracy, error, precision, false positive rate, specificity, Cohen’s kappa, Matthews Correlation Coefficient, and F1 score. On the standard VOice ICar fEDerico II (VOICED) database, the method achieved the highest mean accuracy among the compared existing techniques, 94.12%, with a sensitivity of 93.85% and a specificity of 97.96%. The method performed well despite the class imbalance in the data.
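The abstract lists nine evaluation metrics but does not say how per-class values are combined in the multiclass setting. The sketch below computes all nine from a confusion matrix. It is a minimal example under three assumptions, none of them taken from the paper:

- per-class counts are one-vs-rest and then macro-averaged;
- "mean accuracy" is the overall accuracy;
- the Matthews Correlation Coefficient uses Gorodkin's multiclass generalization.

The function `multiclass_metrics` and its dictionary keys are hypothetical names chosen for illustration.

```python
from math import sqrt


def _safe_div(a, b):
    # Return 0.0 instead of raising when a denominator is zero (e.g. an empty class).
    return a / b if b else 0.0


def multiclass_metrics(cm):
    """Nine classification metrics from a K x K confusion matrix (hypothetical helper).

    cm[i][j] counts samples whose true class is i and predicted class is j.
    Assumptions (not from the paper): one-vs-rest counts per class,
    macro averaging, and "mean accuracy" taken as overall accuracy.
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    tp = [cm[i][i] for i in range(k)]
    true_tot = [sum(cm[i]) for i in range(k)]                       # row sums
    pred_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # column sums
    fn = [true_tot[i] - tp[i] for i in range(k)]
    fp = [pred_tot[i] - tp[i] for i in range(k)]
    tn = [n - tp[i] - fn[i] - fp[i] for i in range(k)]

    # Per-class one-vs-rest rates.
    sens = [_safe_div(tp[i], tp[i] + fn[i]) for i in range(k)]
    spec = [_safe_div(tn[i], tn[i] + fp[i]) for i in range(k)]
    fpr = [_safe_div(fp[i], fp[i] + tn[i]) for i in range(k)]
    prec = [_safe_div(tp[i], tp[i] + fp[i]) for i in range(k)]
    f1 = [_safe_div(2 * prec[i] * sens[i], prec[i] + sens[i]) for i in range(k)]

    def mean(xs):
        return sum(xs) / len(xs)

    acc = sum(tp) / n
    # Cohen's kappa: observed agreement corrected for chance agreement p_e.
    p_e = sum(true_tot[i] * pred_tot[i] for i in range(k)) / (n * n)
    kappa = _safe_div(acc - p_e, 1 - p_e)
    # Multiclass MCC (Gorodkin's R_K); reduces to the usual binary MCC when K = 2.
    cov_tp = sum(tp) * n - sum(true_tot[i] * pred_tot[i] for i in range(k))
    cov_pp = n * n - sum(p * p for p in pred_tot)
    cov_tt = n * n - sum(t * t for t in true_tot)
    mcc = _safe_div(cov_tp, sqrt(cov_pp * cov_tt))

    return {
        "sensitivity": mean(sens),
        "accuracy": acc,
        "error": 1 - acc,
        "precision": mean(prec),
        "false_positive_rate": mean(fpr),
        "specificity": mean(spec),
        "kappa": kappa,
        "mcc": mcc,
        "f1": mean(f1),
    }
```

Consider the binary matrix `[[3, 1], [2, 4]]`. The sketch gives an accuracy of 0.7, a Cohen's kappa of about 0.4, and an MCC of about 0.408, which agrees with the standard binary MCC formula. Macro averaging weights every class equally. This is one common choice when classes are imbalanced, but the paper may average differently.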
About the journal:
Multimedia Tools and Applications publishes original research articles on multimedia development and system support tools as well as case studies of multimedia applications. It also features experimental and survey articles. The journal is intended for academics, practitioners, scientists and engineers who are involved in multimedia system research, design and applications. All papers are peer reviewed.
Specific areas of interest include:
- Multimedia Tools:
- Multimedia Applications:
- Prototype multimedia systems and platforms