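PERFORMANCE ENHANCEMENT OF DEEP NEURAL NETWORK BASED AUTOMATIC VOICE DISORDER DETECTION SYSTEM WITH DATA AUGMENTATION - DETECTION OF LEUKOPLAKIA: A CASE STUDY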

Impact factor: 0.6 | JCR: Q4 | Category: Engineering, Biomedical
D. K. Thennal, Vrinda V. Nair, R. Indudharan, D. Gopinath
Journal: Biomedical Engineering: Applications, Basis and Communications
Publication date: 2022-09-21
Publication type: Journal Article
DOI: 10.4015/s1016237222500417 (https://doi.org/10.4015/s1016237222500417)
Citations: 0

Abstract

Laryngeal pathologies that cause voice disorders are normally diagnosed with invasive methods such as rigid laryngoscopy, flexible nasopharyngo-laryngoscopy and stroboscopy, which are expensive, time-consuming and often inconvenient for patients. Automatic Voice Disorder Detection (AVDD) systems offer non-invasive screening that gives the physician an indicative direction as a preliminary diagnosis. Deep neural networks, known for their strong discrimination capabilities, can be used in AVDD systems provided there are sufficient samples for training. The most popular datasets used for developing AVDD systems lack sufficient samples in several pathological categories. Leukoplakia, a premalignant lesion that may progress to carcinoma unless detected early, is one such pathology. Data augmentation is a technique used in deep learning to enlarge training datasets that have too few samples for effective analysis and classification. This study investigates the performance enhancement of a deep learning-based AVDD system through a novel time-domain data augmentation technique named 'TempAug'. The method splits each data sample into short voice segments, so that each sample yields multiple training items and the resulting augmented database is larger than the original. A Long Short-Term Memory (LSTM) network, with Short-Time Fourier Transform (STFT) coefficients as input features, was used to detect the voice disorder leukoplakia. A series of experiments was conducted to investigate the effect of data augmentation and to find the optimum segment duration. Based on the experimental results, a detection strategy was developed and evaluated using an AVDD system, which gave an accuracy of 81.25%. This is a 46.9% increase relative to the accuracy obtained with unaugmented data.
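The segmentation idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the segment duration, whether segments overlap, or how a trailing remainder is handled, so the choices below (non-overlapping segments, remainder dropped, each segment inheriting its parent label) and the names `segment_sample` and `augment_dataset` are assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def segment_sample(signal: Sequence[float], sample_rate: int,
                   segment_seconds: float) -> List[List[float]]:
    """Split one recording into consecutive, non-overlapping segments.

    Assumption: a trailing remainder shorter than one segment is dropped.
    """
    seg_len = int(round(segment_seconds * sample_rate))
    if seg_len <= 0:
        raise ValueError("segment_seconds * sample_rate must be at least 1")
    n_segments = len(signal) // seg_len
    return [list(signal[i * seg_len:(i + 1) * seg_len])
            for i in range(n_segments)]


def augment_dataset(samples: Sequence[Tuple[Sequence[float], int]],
                    sample_rate: int,
                    segment_seconds: float) -> List[Tuple[List[float], int]]:
    """Build an augmented dataset in which every segment keeps its parent label."""
    augmented = []
    for signal, label in samples:
        for segment in segment_sample(signal, sample_rate, segment_seconds):
            augmented.append((segment, label))
    return augmented


# Toy data (hypothetical): two "recordings" at 1000 Hz, labels 0 and 1.
toy_samples = [([0.0] * 1000, 0), ([1.0] * 2500, 1)]
augmented = augment_dataset(toy_samples, sample_rate=1000, segment_seconds=0.5)
print(len(toy_samples), "recordings ->", len(augmented), "segments")
```

In a pipeline like the one the abstract outlines, each segment would then be converted to STFT coefficients before being passed to the LSTM classifier. Because segments from one recording are strongly correlated, a careful evaluation would split training and test data by recording before segmenting; the abstract does not say how the authors handled this.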
Source journal
Biomedical Engineering: Applications, Basis and Communications
Subject area: Biochemistry, Genetics and Molecular Biology - Biophysics
CiteScore: 1.50
Self-citation rate: 11.10%
Articles published: 36
Review time: 4 months
About the journal: Biomedical Engineering: Applications, Basis and Communications is an international, interdisciplinary journal that publishes up-to-date contributions on original clinical and basic research in biomedical engineering. Research in biomedical engineering has grown tremendously in the past few decades. Meanwhile, several outstanding journals in the field have emerged, with different emphases and objectives. We hope this journal will serve as a new forum for both scientists and clinicians to share their ideas and the results of their studies. Biomedical Engineering: Applications, Basis and Communications explores all facets of biomedical engineering, with emphasis on both the clinical and scientific aspects of the study. It covers the fields of bioelectronics, biomaterials, biomechanics, bioinformatics, nano-biological sciences and clinical engineering. The journal fulfils this aim by publishing regular research and clinical articles, short communications, technical notes and review papers. Papers from both basic research and clinical investigations will be considered.