Maximum Entropy Modeling for Speech Recognition

H. Kuo
DOI: 10.1109/CHINSL.2004.1409569
Published in: 2004 International Symposium on Chinese Spoken Language Processing, 2004-12-15
Citations: 3

Abstract

Summary form only given. Maximum entropy (maxent) models have become very popular in natural language processing. We begin with a basic introduction to the maximum entropy principle, cover the popular algorithms for training maxent models, and describe how maxent models have been used in language modeling and (more recently) acoustic modeling for speech recognition. Some comparisons with other discriminative modeling methods are made. A substantial amount of time is devoted to the details of a new framework for acoustic modeling using maximum entropy direct models, including practical issues of implementation and usage. Traditional statistical models for speech recognition have all been based on a Bayesian framework using generative models such as hidden Markov models (HMMs). The new framework is based on maximum entropy direct modeling, where the probability of a state or word sequence given an observation sequence is computed directly from the model. In contrast to HMMs, features can be asynchronous and overlapping, and need not be statistically independent. This model therefore allows for the potential combination of many different types of features. Results from a specific kind of direct model, the maximum entropy Markov model (MEMM), are presented. Even with conventional acoustic features, the approach already shows promising results for phone-level decoding. The MEMM significantly outperforms traditional HMMs in word error rate when used as a stand-alone acoustic model. Combining the MEMM scores with HMM and language model scores shows modest improvements over the best HMM speech recognizer. We give a sense of some exciting possibilities for future research in using maximum entropy models for acoustic modeling.
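To give a concrete feel for the "direct model" idea described above, the following is a minimal, self-contained sketch of the building block of an MEMM-style system: a maximum-entropy (multinomial logistic) classifier that computes p(state | observation) directly as a log-linear model over feature values, trained by gradient ascent on the conditional log-likelihood. The data, feature dimensionality, and learning-rate settings here are synthetic illustrations, not the paper's actual system or features.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_maxent(X, y, n_states, lr=0.5, steps=500):
    """Fit weights W by gradient ascent on the conditional log-likelihood.

    The model is p(state | x) = softmax(x @ W), i.e. a log-linear
    (maximum-entropy) model; the gradient of the average log-likelihood
    is X.T @ (Y_onehot - P) / n.
    """
    n, d = X.shape
    W = np.zeros((d, n_states))
    Y = np.eye(n_states)[y]                # one-hot targets
    for _ in range(steps):
        P = softmax(X @ W)                 # p(state | observation)
        W += lr * X.T @ (Y - P) / n        # ascend the log-likelihood
    return W

# Two well-separated synthetic "acoustic" feature clusters standing in
# for two phone states; a constant column serves as the bias feature.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 0.3, (50, 2)), rng.normal(1, 0.3, (50, 2))])
X = np.hstack([X, np.ones((100, 1))])
y = np.array([0] * 50 + [1] * 50)

W = train_maxent(X, y, n_states=2)
pred = softmax(X @ W).argmax(axis=1)
print((pred == y).mean())                  # training accuracy on the toy data
```

In a full MEMM, the same log-linear form is conditioned on the previous state as well (p(s_t | s_{t-1}, o_t)), and decoding chains these local distributions with a Viterbi-style search; this sketch shows only the per-observation maxent component.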