{"title":"基于几何方法的组合变换 Scalogram 分析法,用于对语音信号中的病变进行多类识别","authors":"Ranita Khumukcham, Kishorjit Nongmeikapam","doi":"10.1007/s11042-024-20067-4","DOIUrl":null,"url":null,"abstract":"<p>Many researchers have preferred non-invasive techniques for recognizing the exact type of physiological abnormality in the vocal tract by training machine learning algorithms with feature descriptors extracted from the voice signal. However, until now, most techniques have been limited to classifying whether a voice is normal or abnormal. It is crucial that the trained Artificial Intelligence (AI) be able to identify the exact pathology associated with voice for implementation in a realistic environment. Another issue is the need to suppress the ambient noise that could be mixed up with the spectra of the voice. Current work proposes a robust, less time-consuming and non-invasive technique for the identification of pathology associated with a laryngeal voice signal. More specifically, a two-stage signal filtering approach that encompasses a score-based geometric approach and a glottal inverse filtering method is applied to the input voice signal. The aim here is to estimate the noise spectra, to regenerate a clean signal and finally to deliver a completely fundamental glottal flow-derived signal. For the next stage, clean glottal derivative signals are used in the formation of a novel fused-scalogram which is currently referred to as the \"Combinatorial Transformative Scalogram (CTS).\" The CTS is a time-frequency domain plot which is a combination of two time-frequency scalograms. There is a thorough investigation of the performance of the two individual scalograms as well as that of the CTS database.Nine classification metrics are used to investigate performance, which are: sensitivity, mean accuracy, error, precision, false positive rate, specificity, Cohen’s kappa, Matthews Correlation Coefficient, and F1 score. Implementation of the VOice ICar fEDerico II (VOICED) standard database provided the highest mean accuracy of 94.12<span>\\(\\%\\)</span> with a sensitivity of 93.85<span>\\(\\%\\)</span> and a specificity of 97.96<span>\\(\\%\\)</span> against other existing techniques. The current method performed well despite the data imbalance that exists between classes.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":null,"pages":null},"PeriodicalIF":3.0000,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A geometric-approach based Combinatorial Transformative Scalogram analysis for multiclass identification of pathologies in a voice signal\",\"authors\":\"Ranita Khumukcham, Kishorjit Nongmeikapam\",\"doi\":\"10.1007/s11042-024-20067-4\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Many researchers have preferred non-invasive techniques for recognizing the exact type of physiological abnormality in the vocal tract by training machine learning algorithms with feature descriptors extracted from the voice signal. However, until now, most techniques have been limited to classifying whether a voice is normal or abnormal. It is crucial that the trained Artificial Intelligence (AI) be able to identify the exact pathology associated with voice for implementation in a realistic environment. Another issue is the need to suppress the ambient noise that could be mixed up with the spectra of the voice. Current work proposes a robust, less time-consuming and non-invasive technique for the identification of pathology associated with a laryngeal voice signal. More specifically, a two-stage signal filtering approach that encompasses a score-based geometric approach and a glottal inverse filtering method is applied to the input voice signal. The aim here is to estimate the noise spectra, to regenerate a clean signal and finally to deliver a completely fundamental glottal flow-derived signal. For the next stage, clean glottal derivative signals are used in the formation of a novel fused-scalogram which is currently referred to as the \\\"Combinatorial Transformative Scalogram (CTS).\\\" The CTS is a time-frequency domain plot which is a combination of two time-frequency scalograms. There is a thorough investigation of the performance of the two individual scalograms as well as that of the CTS database.Nine classification metrics are used to investigate performance, which are: sensitivity, mean accuracy, error, precision, false positive rate, specificity, Cohen’s kappa, Matthews Correlation Coefficient, and F1 score. Implementation of the VOice ICar fEDerico II (VOICED) standard database provided the highest mean accuracy of 94.12<span>\\\\(\\\\%\\\\)</span> with a sensitivity of 93.85<span>\\\\(\\\\%\\\\)</span> and a specificity of 97.96<span>\\\\(\\\\%\\\\)</span> against other existing techniques. The current method performed well despite the data imbalance that exists between classes.</p>\",\"PeriodicalId\":18770,\"journal\":{\"name\":\"Multimedia Tools and Applications\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":3.0000,\"publicationDate\":\"2024-09-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Multimedia Tools and Applications\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s11042-024-20067-4\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Multimedia Tools and Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s11042-024-20067-4","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
摘要
许多研究人员倾向于采用非侵入式技术,通过从语音信号中提取特征描述符来训练机器学习算法,从而准确识别声道生理异常的类型。然而,迄今为止,大多数技术仅限于对声音正常或异常进行分类。至关重要的是,经过训练的人工智能(AI)必须能够识别与嗓音相关的确切病理,以便在现实环境中实施。另一个问题是需要抑制可能与语音频谱混杂在一起的环境噪音。目前的工作提出了一种稳健、耗时较少且非侵入性的技术,用于识别与喉部声音信号相关的病理。更具体地说,对输入的语音信号采用了两阶段信号滤波方法,包括基于评分的几何方法和声门反滤波方法。其目的是估计噪声频谱,重新生成干净的信号,最后提供完全基本的声门流量衍生信号。在下一阶段,干净的声门导数信号被用于形成一种新的融合声谱图,这种声谱图目前被称为 "组合变换声谱图(CTS)"。CTS 是一个时频域图,由两个时频频谱图组合而成。在研究性能时使用了九个分类指标,分别是:灵敏度、平均准确度、误差、精确度、假阳性率、特异性、科恩卡帕(Cohen's kappa)、马修斯相关系数(Matthews Correlation Coefficient)和 F1 分数。与其他现有技术相比,使用 VOice ICar fEDerico II (VOICED) 标准数据库的平均准确率最高,为 94.12%,灵敏度为 93.85%,特异性为 97.96%。尽管类与类之间存在数据不平衡,但目前的方法表现良好。
A geometric-approach based Combinatorial Transformative Scalogram analysis for multiclass identification of pathologies in a voice signal
Many researchers have preferred non-invasive techniques for recognizing the exact type of physiological abnormality in the vocal tract by training machine learning algorithms with feature descriptors extracted from the voice signal. However, until now, most techniques have been limited to classifying whether a voice is normal or abnormal. It is crucial that the trained Artificial Intelligence (AI) be able to identify the exact pathology associated with voice for implementation in a realistic environment. Another issue is the need to suppress the ambient noise that could be mixed up with the spectra of the voice. Current work proposes a robust, less time-consuming and non-invasive technique for the identification of pathology associated with a laryngeal voice signal. More specifically, a two-stage signal filtering approach that encompasses a score-based geometric approach and a glottal inverse filtering method is applied to the input voice signal. The aim here is to estimate the noise spectra, to regenerate a clean signal and finally to deliver a completely fundamental glottal flow-derived signal. For the next stage, clean glottal derivative signals are used in the formation of a novel fused-scalogram which is currently referred to as the "Combinatorial Transformative Scalogram (CTS)." The CTS is a time-frequency domain plot which is a combination of two time-frequency scalograms. There is a thorough investigation of the performance of the two individual scalograms as well as that of the CTS database.Nine classification metrics are used to investigate performance, which are: sensitivity, mean accuracy, error, precision, false positive rate, specificity, Cohen’s kappa, Matthews Correlation Coefficient, and F1 score. Implementation of the VOice ICar fEDerico II (VOICED) standard database provided the highest mean accuracy of 94.12\(\%\) with a sensitivity of 93.85\(\%\) and a specificity of 97.96\(\%\) against other existing techniques. The current method performed well despite the data imbalance that exists between classes.
期刊介绍:
Multimedia Tools and Applications publishes original research articles on multimedia development and system support tools as well as case studies of multimedia applications. It also features experimental and survey articles. The journal is intended for academics, practitioners, scientists and engineers who are involved in multimedia system research, design and applications. All papers are peer reviewed.
Specific areas of interest include:
- Multimedia Tools:
- Multimedia Applications:
- Prototype multimedia systems and platforms