Progress on automatic annotation of speech corpora using complementary ASR systems
Alexandru-Lucian Georgescu, H. Cucu, C. Burileanu
2019 42nd International Conference on Telecommunications and Signal Processing (TSP), July 2019
DOI: 10.1109/TSP.2019.8769087
Citations: 4
Abstract
Deep learning techniques, which require large amounts of training data, are currently state of the art in automatic speech recognition (ASR). Corporate giants such as Google and IBM train English ASR systems on more than 100k hours of annotated speech, while research on under-resourced languages such as Romanian must make do with as little as 300 hours. In this context, automatic annotation of speech corpora and unsupervised acoustic model training are promising directions for mitigating the lack of data. This study describes the progress made by the SpeeD laboratory in this direction: taking an already proven methodology, applying it at large scale (more than 700 hours of unlabeled speech), and analyzing the experimental results in depth to identify potential future directions. Moreover, we present novel results on Romanian ASR: the methodology yields a relative Word Error Rate (WER) improvement of up to almost 10%.
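The abstract does not detail the annotation methodology, but a common realization of "automatic annotation using complementary ASR systems" is agreement-based filtering: transcribe the unlabeled audio with two different ASR systems and keep only utterances whose hypotheses (nearly) agree, measured by WER. The sketch below is a minimal illustration of that general idea under this assumption, not the paper's exact algorithm; the function names `wer` and `select_agreed` are illustrative choices.

```python
# Hypothetical sketch: agreement-based selection of automatic annotations.
# Assumption: two complementary ASR systems have already produced hypothesis
# pairs (h1, h2) for each unlabeled utterance; we keep utterances where the
# hypotheses differ by at most a WER threshold.

def wer(ref, hyp):
    """Word Error Rate: word-level Levenshtein distance divided by len(ref)."""
    r, h = ref.split(), hyp.split()
    # dp[i][j] = edit distance between r[:i] and h[:j]
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(r)][len(h)] / max(len(r), 1)

def select_agreed(hyp_pairs, threshold=0.0):
    """Keep utterances whose two hypotheses are within `threshold` WER of
    each other; the agreed transcript becomes the automatic annotation."""
    return [h1 for h1, h2 in hyp_pairs if wer(h1, h2) <= threshold]
```

With `threshold=0.0` only exact agreements survive; relaxing the threshold trades annotation purity for more training data, which is the central tension in semi-supervised acoustic model training.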