A New TIMIT Benchmark for Context-Independent Phone Recognition Using Turbo Fusion
Timo Lohrenz, Wei Li, T. Fingscheidt
2018 IEEE Spoken Language Technology Workshop (SLT), December 2018
DOI: 10.1109/SLT.2018.8639670
In this work, we apply the recently proposed turbo fusion, in conjunction with state-of-the-art convolutional neural networks as acoustic models, to the standard phone recognition task on the TIMIT database. The turbo fusion operates on two posterior streams, one stemming from standard filterbank features and one from group delay (phase) features. Through the iterative exchange of posterior information between the streams, the phone error rate is reduced to 16.91%, which is, to our knowledge, the best result reported so far on the TIMIT core test set using context-independent acoustic models. This outperforms the previous benchmark for that setting by 4.4% relative.
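The abstract gives the new phone error rate as an absolute figure and the improvement as a relative one, which are easy to conflate. The short sketch below shows how the two relate. The helper names `implied_baseline` and `relative_reduction` are illustrative assumptions, and the abstract does not state the previous benchmark value; the figure computed here is only what the rounded 16.91% and 4.4% imply, so it is approximate.

```python
def implied_baseline(per_new: float, relative_gain: float) -> float:
    """Baseline error rate implied by a new rate and a relative reduction.

    Solves (per_old - per_new) / per_old = relative_gain for per_old.
    """
    return per_new / (1.0 - relative_gain)


def relative_reduction(per_old: float, per_new: float) -> float:
    """Relative error reduction of per_new with respect to per_old."""
    return (per_old - per_new) / per_old


# Figures quoted in the abstract (both already rounded by the authors).
per_new = 16.91   # phone error rate in percent
rel = 0.044       # 4.4% relative improvement

# Assumption: the previous benchmark is NOT given in the abstract;
# this value is only what the two rounded figures imply.
baseline = implied_baseline(per_new, rel)
absolute_gain = baseline - per_new

print(f"implied previous PER: {baseline:.2f}%")
print(f"absolute reduction:   {absolute_gain:.2f} percentage points")
print(f"relative reduction:   {relative_reduction(baseline, per_new):.1%}")
```

Under these assumptions the previous benchmark would sit near 17.7%, so a 4.4% relative gain corresponds to somewhat less than one percentage point in absolute terms.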