Title: Speech recognition of multiple accented English data using acoustic model interpolation
Authors: Thiago Fraga-Silva, J. Gauvain, L. Lamel
Venue: 2014 22nd European Signal Processing Conference (EUSIPCO)
Publication date: 2014-11-13
DOI: 10.5281/ZENODO.44197 (https://doi.org/10.5281/ZENODO.44197)
Citation count: 7
Abstract
In previous work [1], we showed that model interpolation can be applied to adapt acoustic models to a specific show. Compared to other approaches, this method has the advantage of being highly flexible, allowing rapid adaptation by simply reassigning the interpolation coefficients. In this work, the approach is applied to the recognition of multi-accented English broadcast news data, a difficult task because accent variability degrades recognition performance. The work described in [1] is extended in two ways. First, to reduce the number of parameters of the interpolated model, a theoretically motivated EM-like mixture reduction algorithm is proposed. Second, beyond supervised adaptation, model interpolation is used as an unsupervised adaptation framework in which the interpolation coefficients are estimated on the fly for each test segment.
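The abstract does not spell out how the interpolation coefficients are estimated. As a minimal sketch, assuming the interpolated model is a weighted sum of component (for example, accent-specific) models and that the weights for one test segment are chosen to maximize the segment likelihood, the standard EM update for mixture weights could look as follows. The function name `estimate_interpolation_weights` and the frame-level likelihood input are hypothetical simplifications for illustration; they are not taken from the paper, which may apply the interpolation at a different level of the acoustic model.

```python
import math


def estimate_interpolation_weights(frame_likelihoods, n_iter=50, tol=1e-8):
    """EM re-estimation of interpolation weights for one segment.

    frame_likelihoods[t][k] is the likelihood of frame t under component
    model k (assumed strictly positive for at least one k per frame).
    Returns non-negative weights, one per component model, summing to 1.
    """
    n_frames = len(frame_likelihoods)
    n_models = len(frame_likelihoods[0])
    weights = [1.0 / n_models] * n_models  # start from uniform weights
    prev_ll = -math.inf

    for _ in range(n_iter):
        counts = [0.0] * n_models
        log_likelihood = 0.0
        for frame in frame_likelihoods:
            # Likelihood of this frame under the interpolated model.
            mix = sum(w * p for w, p in zip(weights, frame))
            log_likelihood += math.log(mix)
            # E-step: posterior responsibility of each component model.
            for k in range(n_models):
                counts[k] += weights[k] * frame[k] / mix
        # M-step: new weight is the average responsibility over frames.
        weights = [c / n_frames for c in counts]
        # Stop once the segment log-likelihood no longer improves.
        if log_likelihood - prev_ll < tol:
            break
        prev_ll = log_likelihood

    return weights


# Toy segment: every frame is better explained by component model 0.
segment = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]]
weights = estimate_interpolation_weights(segment)
```

Because only the weights change, adapting to a new segment in this sketch means re-running a few cheap EM iterations over precomputed likelihoods, which matches the abstract's point that adaptation amounts to reassigning the interpolation coefficients.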