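An approach on a combination of higher-order statistics and higher-order differential energy operator for detecting pathological voice with machine learning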

2018 International Conference on Information and Communication Technology Convergence (ICTC) | Pub Date: 2018-10-01 | DOI: 10.1109/ICTC.2018.8539495

Jihye Moon, Sanghun Kim

Citations: 10

Abstract

Voice signals can indicate the progression of diseases such as nerve disorders and muscle dysfunction. To improve the performance of voice-based medical diagnosis systems, this paper proposes a new feature extraction method that combines higher-order statistics (HOS) with a higher-order differential energy operator (DEO). The experiments used the Saarbruecken Voice Database (SVD), from which 687 healthy voice samples and 263 pathological voice samples (cysts, paralysis, and polyps) were selected. For comparison with the new features, an OpenSmile script providing 6,373 features was also used. Gradient boosting served as the feature selector to identify the most effective features. In the end, 20 features, including 15 combinations of HOS and DEO, were chosen, and a deep neural network (DNN) was trained on them. The best accuracy obtained was 87.4%, exceeding the best accuracy of 84.5% achieved with the existing features. This finding suggests that pathological voices may be detected efficiently from statistical information alone, without heavy computation such as convolutional neural networks. Because of its simple structure, we expect the approach to be readily applicable to a variety of mobile systems.
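The abstract does not spell out how HOS and DEO are combined, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the commonly used discrete k-th order differential energy operator, Y_k[n] = x[n]*x[n+k-2] - x[n-1]*x[n+k-1] (k = 2 gives the Teager-Kaiser energy operator), and takes skewness and kurtosis of each DEO output as the "HOS of DEO" features. The function names `deo`, `standardized_moment`, and `hos_deo_features`, and the choice of orders 2 to 4, are hypothetical.

```python
import math


def deo(x, k=2):
    """Assumed k-th order discrete differential energy operator.

    Y_k[n] = x[n]*x[n+k-2] - x[n-1]*x[n+k-1], evaluated only where all
    four indices are valid. k=2 is the Teager-Kaiser energy operator.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    return [x[n] * x[n + k - 2] - x[n - 1] * x[n + k - 1]
            for n in range(1, len(x) - k + 1)]


def standardized_moment(values, order):
    """Central moment of the given order divided by variance**(order/2).

    order=3 is skewness, order=4 is (non-excess) kurtosis.
    Returns 0.0 for a constant sequence to avoid dividing by zero.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0.0:
        return 0.0
    moment = sum((v - mean) ** order for v in values) / n
    return moment / var ** (order / 2)


def hos_deo_features(x, orders=(2, 3, 4)):
    """Skewness and kurtosis of each DEO output (hypothetical feature set)."""
    features = {}
    for k in orders:
        energy = deo(x, k)
        features[f"deo{k}_skew"] = standardized_moment(energy, 3)
        features[f"deo{k}_kurt"] = standardized_moment(energy, 4)
    return features


# Usage on a synthetic two-tone signal standing in for a voiced frame.
signal = [math.sin(0.2 * n) + 0.5 * math.sin(0.55 * n) for n in range(400)]
for name, value in hos_deo_features(signal).items():
    print(f"{name}: {value:.4f}")
```

Features of this kind cost only a few multiplications per sample, which is consistent with the abstract's point about avoiding heavy computation. In the pipeline the abstract describes, such features would then be ranked by a gradient boosting selector and passed to a DNN classifier; those steps are not shown here.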
