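An approach on a combination of higher-order statistics and higher-order differential energy operator for detecting pathological voice with machine learning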

Jihye Moon, Sanghun Kim
Published in: 2018 International Conference on Information and Communication Technology Convergence (ICTC), 2018-10-01
DOI: 10.1109/ICTC.2018.8539495 (https://doi.org/10.1109/ICTC.2018.8539495)
Citations: 10

Abstract

The voice signal can indicate the progression of diseases such as nerve disorders and muscle dysfunction. To improve the performance of medical diagnosis systems that use the voice signal, this paper proposes a new feature extraction method that combines higher-order statistics (HOS) with the higher-order differential energy operator (DEO). The experiments used the Saarbruecken Voice Database (SVD), from which 687 healthy voice samples and 263 pathological voice samples (cysts, paralysis, and polyps) were selected. For comparison with the new features, an openSMILE script providing 6,373 features was also used. Gradient Boosting served as the feature selector to identify the most effective features. In the end, 20 features were chosen, 15 of them combinations of HOS and DEO, and a deep neural network (DNN) was trained on the new features. The best accuracy obtained was 87.4%, exceeding the 84.5% best accuracy obtained with the existing features. This finding suggests that pathological voice can be detected efficiently from statistical information alone, without heavy computation such as convolutional neural networks. Because of its simple structure, the authors expect the approach to be easy to apply to a variety of mobile systems.
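The abstract does not say how HOS and DEO are combined, so the following is only a minimal sketch of one plausible reading, not the authors' method: apply a k-th order discrete differential energy operator to the signal, then summarize each energy sequence by its skewness and kurtosis. The operator form `x[n] * x[n + k - 2] - x[n - 1] * x[n + k - 1]` (which reduces to the Teager-Kaiser operator at k = 2), the choice of orders 2 to 4, and the names `deo`, `standardized_moment`, and `hos_deo_features` are all assumptions made for illustration.

```python
import math


def deo(x, k):
    """k-th order discrete differential energy operator (assumed form).

    Computes x[n] * x[n + k - 2] - x[n - 1] * x[n + k - 1] for every n
    where all four samples exist. k = 2 is the Teager-Kaiser operator.
    """
    if k < 2:
        raise ValueError("k must be >= 2")
    return [x[n] * x[n + k - 2] - x[n - 1] * x[n + k - 1]
            for n in range(1, len(x) - k + 1)]


def standardized_moment(values, order):
    """Central moment of the given order divided by std ** order.

    order = 3 gives skewness, order = 4 gives (non-excess) kurtosis.
    Returns 0.0 for a constant sequence to avoid dividing by zero.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return 0.0
    central = sum((v - mean) ** order for v in values) / n
    return central / var ** (order / 2)


def hos_deo_features(x, deo_orders=(2, 3, 4)):
    """Hypothetical HOS-of-DEO feature vector: skewness and kurtosis
    of the energy sequence for each operator order."""
    feats = {}
    for k in deo_orders:
        energy = deo(x, k)
        feats[f"deo{k}_skewness"] = standardized_moment(energy, 3)
        feats[f"deo{k}_kurtosis"] = standardized_moment(energy, 4)
    return feats


if __name__ == "__main__":
    # Synthetic two-tone signal standing in for a voice frame.
    frame = [math.sin(0.2 * n) + 0.1 * math.sin(1.3 * n) for n in range(200)]
    for name, value in hos_deo_features(frame).items():
        print(name, value)
```

Every feature here costs a single pass over the samples, which is consistent with the abstract's point that statistical features avoid the heavy computation of convolutional networks. In a full pipeline these values would be candidates for the Gradient Boosting selection step before DNN training.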