小词汇语音识别的大词汇工具探索

2009 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date : 2009-12-01 DOI:10.1109/ASRU.2009.5373263

Tara N. Sainath, B. Ramabhadran, M. Picheny

{"title":"小词汇语音识别的大词汇工具探索","authors":"Tara N. Sainath, B. Ramabhadran, M. Picheny","doi":"10.1109/ASRU.2009.5373263","DOIUrl":null,"url":null,"abstract":"While research in large vocabulary continuous speech recognition (LVCSR) has sparked the development of many state of the art research ideas, research in this domain suffers from two main drawbacks. First, because of the large number of parameters and poorly labeled transcriptions, gaining insight into further improvements based on error analysis is very difficult. Second, LVCSR systems often take a significantly longer time to train and test new research ideas compared to small vocabulary tasks. A small vocabulary task like TIMIT provides a phonetically rich and hand-labeled corpus and offers a good test bed to study algorithmic improvements. However, oftentimes research ideas explored for small vocabulary tasks do not always provide gains on LVCSR systems. In this paper, we address these issues by taking the standard Ȝrecipeȝ used in typical LVCSR systems and applying it to the TIMIT phonetic recognition corpus, which provides a standard benchmark to compare methods. We find that at the speaker-independent (SI) level, our results offer comparable performance to other SI HMM systems. By taking advantage of speaker adaptation and discriminative training techniques commonly used in LVCSR systems, we achieve an error rate of 20%, the best results reported on the TIMIT task to date, moving us closer to the human reported phonetic recognition error rate of 15%. We propose the use of this system as the baseline for future research and believe that it will serve as a good framework to explore ideas that will carry over to LVCSR systems.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"41","resultStr":"{\"title\":\"An exploration of large vocabulary tools for small vocabulary phonetic recognition\",\"authors\":\"Tara N. Sainath, B. Ramabhadran, M. Picheny\",\"doi\":\"10.1109/ASRU.2009.5373263\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"While research in large vocabulary continuous speech recognition (LVCSR) has sparked the development of many state of the art research ideas, research in this domain suffers from two main drawbacks. First, because of the large number of parameters and poorly labeled transcriptions, gaining insight into further improvements based on error analysis is very difficult. Second, LVCSR systems often take a significantly longer time to train and test new research ideas compared to small vocabulary tasks. A small vocabulary task like TIMIT provides a phonetically rich and hand-labeled corpus and offers a good test bed to study algorithmic improvements. However, oftentimes research ideas explored for small vocabulary tasks do not always provide gains on LVCSR systems. In this paper, we address these issues by taking the standard Ȝrecipeȝ used in typical LVCSR systems and applying it to the TIMIT phonetic recognition corpus, which provides a standard benchmark to compare methods. We find that at the speaker-independent (SI) level, our results offer comparable performance to other SI HMM systems. By taking advantage of speaker adaptation and discriminative training techniques commonly used in LVCSR systems, we achieve an error rate of 20%, the best results reported on the TIMIT task to date, moving us closer to the human reported phonetic recognition error rate of 15%. We propose the use of this system as the baseline for future research and believe that it will serve as a good framework to explore ideas that will carry over to LVCSR systems.\",\"PeriodicalId\":292194,\"journal\":{\"name\":\"2009 IEEE Workshop on Automatic Speech Recognition & Understanding\",\"volume\":\"13 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2009-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"41\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2009 IEEE Workshop on Automatic Speech Recognition & Understanding\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASRU.2009.5373263\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU.2009.5373263","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 41

摘要

虽然大词汇量连续语音识别(LVCSR)的研究引发了许多最新研究思路的发展，但该领域的研究存在两个主要缺陷。首先，由于大量的参数和标记不佳的转录，基于错误分析获得进一步改进的洞察力是非常困难的。其次，与小词汇任务相比，LVCSR系统通常需要更长的时间来训练和测试新的研究想法。像TIMIT这样的小型词汇任务提供了一个语音丰富且手工标记的语料库，并提供了一个很好的测试平台来研究算法改进。然而，通常为小词汇任务探索的研究思路并不总是为LVCSR系统提供收益。在本文中，我们通过将典型LVCSR系统中使用的标准Ȝrecipeȝ应用于TIMIT语音识别语料库来解决这些问题，该语料库为比较方法提供了一个标准基准。我们发现，在扬声器无关(SI)水平上，我们的结果提供了与其他SI HMM系统相当的性能。通过利用LVCSR系统中常用的说话人自适应和判别训练技术，我们实现了20%的错误率，这是迄今为止在TIMIT任务中报告的最佳结果，使我们更接近人类报告的语音识别错误率15%。我们建议使用该系统作为未来研究的基线，并相信它将作为一个很好的框架来探索将延续到LVCSR系统的想法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

An exploration of large vocabulary tools for small vocabulary phonetic recognition

While research in large vocabulary continuous speech recognition (LVCSR) has sparked the development of many state of the art research ideas, research in this domain suffers from two main drawbacks. First, because of the large number of parameters and poorly labeled transcriptions, gaining insight into further improvements based on error analysis is very difficult. Second, LVCSR systems often take a significantly longer time to train and test new research ideas compared to small vocabulary tasks. A small vocabulary task like TIMIT provides a phonetically rich and hand-labeled corpus and offers a good test bed to study algorithmic improvements. However, oftentimes research ideas explored for small vocabulary tasks do not always provide gains on LVCSR systems. In this paper, we address these issues by taking the standard Ȝrecipeȝ used in typical LVCSR systems and applying it to the TIMIT phonetic recognition corpus, which provides a standard benchmark to compare methods. We find that at the speaker-independent (SI) level, our results offer comparable performance to other SI HMM systems. By taking advantage of speaker adaptation and discriminative training techniques commonly used in LVCSR systems, we achieve an error rate of 20%, the best results reported on the TIMIT task to date, moving us closer to the human reported phonetic recognition error rate of 15%. We propose the use of this system as the baseline for future research and believe that it will serve as a good framework to explore ideas that will carry over to LVCSR systems.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2009 IEEE Workshop on Automatic Speech Recognition & Understanding

自引率

0.00%

发文量