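An Intelligent System for the Diagnosis of Voice Pathology Based on Adversarial Pathological Response (APR) Net Deep Learning Model: An Intelligent System for the Diagnosis of Voice Pathology-Based Deep Learning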

Int. J. Softw. Innov. | Pub Date: 2022-01-01 | DOI: 10.4018/ijsi.312261

Vikas Mittal, R. Sharma


Citations: 0

Abstract



This work investigates two glottal-flow-derivative-based image variants of the input signal, used with a deep learning model built on n-dilated (nD) inception layers, to provide optimal labels. The authors propose an nD inception-layer-based adversarial pathological response (APR) net deep learning model. The model is trained on the two image databases separately in an adversarial manner, so that a common test image is applied to both networks. On the VOice ICar fEDerico II (VOICED) dataset, the mean accuracy is 96.82%, 96.36%, and 99.35% for the glottal inverse filtering with extended Kalman filter-Morse scalogram (GIFEKF-MS) APRNet, the glottal inverse filtering with extended Kalman filter-spectrogram (GIFEKF-S) APRNet, and the proposed APR fusion net, respectively. On the Saarbrucken voice database (SVD) dataset, the mean accuracy is 95.67%, 93.27%, and 99.04% for the GIFEKF-MS APRNet, the GIFEKF-S APRNet, and the proposed APR fusion net, respectively.
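The abstract reports that the fusion net outperforms either single-image network, but it does not state how the two outputs are combined. The sketch below illustrates one common late-fusion scheme, averaging the class probabilities of the two branches, purely as an assumption for illustration. The function names, the two-class label set, and the example scores are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable form)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_predictions(logits_ms, logits_s, labels=("healthy", "pathological")):
    """Average the class probabilities of two branches and pick the top label.

    logits_ms and logits_s stand in for the outputs of a scalogram-based and
    a spectrogram-based network on the same voice sample (hypothetical).
    Probability averaging is an assumed fusion rule, not the paper's method.
    """
    p_ms = softmax(logits_ms)
    p_s = softmax(logits_s)
    fused = [(a + b) / 2.0 for a, b in zip(p_ms, p_s)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best], fused


# Made-up scores: the two branches disagree, and fusion resolves the tie
# in favour of the more confident branch.
label, probs = fuse_predictions([2.0, 0.5], [0.2, 1.1])
print(label, [round(p, 3) for p in probs])
```

Averaging probabilities rather than raw scores keeps the two branches on a comparable scale. Other rules, such as weighted averaging or a small learned layer on top of both outputs, would fit the same interface.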
