Víctor Soto, O. Siohan, Mohamed G. Elfeky, P. Moreno
{"title":"Selection and combination of hypotheses for dialectal speech recognition","authors":"Víctor Soto, O. Siohan, Mohamed G. Elfeky, P. Moreno","doi":"10.1109/ICASSP.2016.7472798","DOIUrl":null,"url":null,"abstract":"While research has often shown that building dialect-specific Automatic Speech Recognizers is the optimal approach to dealing with dialectal variations of the same language, we have observed that dialect-specific recognizers do not always output the best recognitions. Often enough, another dialectal recognizer outputs a better recognition than the dialect-specific one. In this paper, we present two methods to select and combine the best decoded hypothesis from a pool of dialectal recognizers. We follow a Machine Learning approach and extract features from the Speech Recognition output along with Word Embeddings and use Shallow Neural Networks for classification. Our experiments using Dictation and Voice Search data from the main four Arabic dialects show good WER improvements for the hypothesis selection scheme, reducing the WER by 2.1 to 12.1% depending on the test set, and promising results for the hypotheses combination scheme.","PeriodicalId":165321,"journal":{"name":"2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICASSP.2016.7472798","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 13
Abstract
While research has often shown that building dialect-specific Automatic Speech Recognizers is the optimal approach to dealing with dialectal variations of the same language, we have observed that dialect-specific recognizers do not always output the best recognitions. Often enough, another dialectal recognizer outputs a better recognition than the dialect-specific one. In this paper, we present two methods to select and combine the best decoded hypothesis from a pool of dialectal recognizers. We follow a Machine Learning approach and extract features from the Speech Recognition output along with Word Embeddings and use Shallow Neural Networks for classification. Our experiments using Dictation and Voice Search data from the main four Arabic dialects show good WER improvements for the hypothesis selection scheme, reducing the WER by 2.1 to 12.1% depending on the test set, and promising results for the hypotheses combination scheme.