PERFORMANCE ENHANCEMENT OF DEEP NEURAL NETWORK BASED AUTOMATIC VOICE DISORDER DETECTION SYSTEM WITH DATA AUGMENTATION — DETECTION OF LEUKOPLAKIA: A CASE STUDY
D. K. Thennal, Vrinda V. Nair, R. Indudharan, D. Gopinath
{"title":"PERFORMANCE ENHANCEMENT OF DEEP NEURAL NETWORK BASED AUTOMATIC VOICE DISORDER DETECTION SYSTEM WITH DATA AUGMENTATION — DETECTION OF LEUKOPLAKIA: A CASE STUDY","authors":"D. K. Thennal, Vrinda V. Nair, R. Indudharan, D. Gopinath","doi":"10.4015/s1016237222500417","DOIUrl":null,"url":null,"abstract":"Laryngeal pathologies resulting in voice disorders are normally diagnosed using invasive methods such as rigid laryngoscopy, flexible nasopharyngo-laryngoscopy and stroboscopy, which are expensive, time-consuming and often inconvenient to patients. Automatic Voice Disorder Detection (AVDD) systems are used for non-invasive screening to give an indicative direction to the physician as a preliminary diagnosis. Deep neural networks, known for their superior discrimination capabilities, can be used for AVDD Systems, provided there are sufficient samples for training. The most popular datasets used for developing AVDD systems lack sufficient samples in several pathological categories. Leukoplakia — a premalignant lesion, which may progress to carcinoma unless detected early — is one such pathology. Data augmentation is a technique used in deep learning environments to increase the size of the training datasets which lack sufficient samples for effective data analysis and classification. This study aims at investigating the performance enhancement of a deep learning-based AVDD system through a novel time domain data augmentation technique named ‘TempAug’. This method segments each data sample into short voice segments, so as to get multiple data from each sample, thereby generating a larger database (augmented database) for training a deep learning model. A deep neural network model, Long Short-Term Memory (LSTM) with Short Term Fourier Transform (STFT) coefficients as input features for classification, was used in this study for the detection of the voice disorder Leukoplakia. A series of experiments were done to investigate the effect of data augmentation and to find the optimum duration for segmentation. Based on experimental results, a detection strategy was developed and evaluated using an AVDD system, which gave an accuracy of 81.25%. The percentage increase in accuracy was found to be 46.9% with respect to the accuracy obtained for unaugmented data.","PeriodicalId":8862,"journal":{"name":"Biomedical Engineering: Applications, Basis and Communications","volume":"10 1","pages":""},"PeriodicalIF":0.6000,"publicationDate":"2022-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Biomedical Engineering: Applications, Basis and Communications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4015/s1016237222500417","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"ENGINEERING, BIOMEDICAL","Score":null,"Total":0}
引用次数: 0
Abstract
Laryngeal pathologies resulting in voice disorders are normally diagnosed using invasive methods such as rigid laryngoscopy, flexible nasopharyngo-laryngoscopy and stroboscopy, which are expensive, time-consuming and often inconvenient to patients. Automatic Voice Disorder Detection (AVDD) systems are used for non-invasive screening to give an indicative direction to the physician as a preliminary diagnosis. Deep neural networks, known for their superior discrimination capabilities, can be used for AVDD Systems, provided there are sufficient samples for training. The most popular datasets used for developing AVDD systems lack sufficient samples in several pathological categories. Leukoplakia — a premalignant lesion, which may progress to carcinoma unless detected early — is one such pathology. Data augmentation is a technique used in deep learning environments to increase the size of the training datasets which lack sufficient samples for effective data analysis and classification. This study aims at investigating the performance enhancement of a deep learning-based AVDD system through a novel time domain data augmentation technique named ‘TempAug’. This method segments each data sample into short voice segments, so as to get multiple data from each sample, thereby generating a larger database (augmented database) for training a deep learning model. A deep neural network model, Long Short-Term Memory (LSTM) with Short Term Fourier Transform (STFT) coefficients as input features for classification, was used in this study for the detection of the voice disorder Leukoplakia. A series of experiments were done to investigate the effect of data augmentation and to find the optimum duration for segmentation. Based on experimental results, a detection strategy was developed and evaluated using an AVDD system, which gave an accuracy of 81.25%. The percentage increase in accuracy was found to be 46.9% with respect to the accuracy obtained for unaugmented data.
期刊介绍:
Biomedical Engineering: Applications, Basis and Communications is an international, interdisciplinary journal aiming at publishing up-to-date contributions on original clinical and basic research in the biomedical engineering. Research of biomedical engineering has grown tremendously in the past few decades. Meanwhile, several outstanding journals in the field have emerged, with different emphases and objectives. We hope this journal will serve as a new forum for both scientists and clinicians to share their ideas and the results of their studies.
Biomedical Engineering: Applications, Basis and Communications explores all facets of biomedical engineering, with emphasis on both the clinical and scientific aspects of the study. It covers the fields of bioelectronics, biomaterials, biomechanics, bioinformatics, nano-biological sciences and clinical engineering. The journal fulfils this aim by publishing regular research / clinical articles, short communications, technical notes and review papers. Papers from both basic research and clinical investigations will be considered.