Pratiksha Sahane, S. Pangaonkar, Shridhar Khandekar
Published in: 2021 International Conference on Computing, Communication and Green Engineering (CCGE), 2021-09-23. DOI: 10.1109/CCGE50943.2021.9776318 (https://doi.org/10.1109/CCGE50943.2021.9776318)
Dysarthric Speech Recognition using Multi-Taper Mel Frequency Cepstrum Coefficients
Rapid industrial growth has increased the demand for automatic speech recognition in automation and human-machine interaction applications. The performance of many artificial-intelligence-based approaches is limited for speakers with speech disabilities caused by communication disorders, neurogenic speech disorders, or psychological speech disorders. Dysarthria is a neurogenic speech disorder that limits a speaker's voice articulation capability. This paper presents dysarthric speech detection using Multi-Taper Mel Frequency Cepstral Coefficients (MTMFCC), which are capable of capturing the smallest variations in dysarthric speech. The effectiveness of the proposed algorithm is evaluated using K-Nearest Neighbor (KNN) and support vector machine (SVM) classifiers in terms of accuracy, sensitivity, and specificity. The system achieved 99.04 % and 96.00 % accuracy for MTMFCC+KNN and MTMFCC+SVM, respectively, which is superior to traditional MFCC.
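To make the feature extraction step concrete, the following is a minimal sketch of how multi-taper MFCCs could be computed for a single speech frame. It is not the authors' implementation: the abstract does not state the taper family, the number of tapers, or the filterbank size, so the sine tapers, uniform taper weights, `k=4`, 20 mel filters, and 12 coefficients used here are all assumptions, and every function name is hypothetical. The only difference from conventional MFCC in this sketch is that the single windowed periodogram is replaced by an average of several tapered periodograms, which lowers the variance of the spectrum estimate.

```python
import cmath
import math


def sine_tapers(n, k):
    """Return k orthonormal sine tapers of length n (assumed taper family)."""
    scale = math.sqrt(2.0 / (n + 1))
    return [[scale * math.sin(math.pi * (j + 1) * (t + 1) / (n + 1))
             for t in range(n)] for j in range(k)]


def power_spectrum(x):
    """One-sided power spectrum via a naive DFT; adequate for short frames."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * f * t / n)
                    for t in range(n))) ** 2
            for f in range(n // 2 + 1)]


def multitaper_spectrum(frame, k=4):
    """Average the k tapered power spectra (uniform weights are an assumption)."""
    tapers = sine_tapers(len(frame), k)
    spectra = [power_spectrum([w * s for w, s in zip(taper, frame)])
               for taper in tapers]
    return [sum(column) / k for column in zip(*spectra)]


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale up to Nyquist."""
    top = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    freqs = [b * sample_rate / n_fft for b in range(n_fft // 2 + 1)]
    bank = []
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        bank.append([max(0.0, min((f - lo) / (mid - lo), (hi - f) / (hi - mid)))
                     for f in freqs])
    return bank


def mtmfcc(frame, sample_rate, n_filters=20, n_ceps=12, k=4):
    """Multi-taper spectrum -> mel filterbank -> log -> DCT-II."""
    spectrum = multitaper_spectrum(frame, k)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # A small floor keeps the logarithm finite for filters with no energy.
    log_energy = [math.log(sum(w * p for w, p in zip(filt, spectrum)) + 1e-12)
                  for filt in bank]
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m, e in enumerate(log_energy))
            for c in range(n_ceps)]


# Usage on a synthetic 1 kHz tone standing in for one speech frame.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(64)]
coefficients = mtmfcc(frame, sample_rate)
print(len(coefficients))
```

In a full pipeline, per-frame vectors like `coefficients` would be pooled per utterance and passed to a KNN or SVM classifier, and accuracy, sensitivity, and specificity would be computed from the resulting confusion matrix. The naive DFT is chosen here only to keep the sketch dependency-free; an FFT routine would be the practical choice for real audio.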