随机语音理解研究

IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01. Pub Date : 2001-12-09 DOI:10.1109/ASRU.2001.1034637

H. Bonneau-Maynard, F. Lefèvre

{"title":"随机语音理解研究","authors":"H. Bonneau-Maynard, F. Lefèvre","doi":"10.1109/ASRU.2001.1034637","DOIUrl":null,"url":null,"abstract":"The need for human expertise in the development of a speech understanding system can be greatly reduced by the use of stochastic techniques. However corpus-based techniques require the annotation of large amounts of training data. Manual semantic annotation of such corpora is tedious, expensive, and subject to inconsistencies. This work investigates the influence of the training corpus size on the performance of the understanding module. The use of automatically annotated data is also investigated as a means to increase the corpus size at a very low cost. First, a stochastic speech understanding model developed using data collected with the LIMSI ARISE dialog system is presented. Its performance is shown to be comparable to that of the rule-based caseframe grammar currently used in the system. In a second step, two ways of reducing the development cost are pursued: (1) reducing of the amount of manually annotated data used to train the stochastic models and (2) using automatically annotated data in the training process.","PeriodicalId":118671,"journal":{"name":"IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.","volume":"4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2001-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":"{\"title\":\"Investigating stochastic speech understanding\",\"authors\":\"H. Bonneau-Maynard, F. Lefèvre\",\"doi\":\"10.1109/ASRU.2001.1034637\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The need for human expertise in the development of a speech understanding system can be greatly reduced by the use of stochastic techniques. However corpus-based techniques require the annotation of large amounts of training data. Manual semantic annotation of such corpora is tedious, expensive, and subject to inconsistencies. This work investigates the influence of the training corpus size on the performance of the understanding module. The use of automatically annotated data is also investigated as a means to increase the corpus size at a very low cost. First, a stochastic speech understanding model developed using data collected with the LIMSI ARISE dialog system is presented. Its performance is shown to be comparable to that of the rule-based caseframe grammar currently used in the system. In a second step, two ways of reducing the development cost are pursued: (1) reducing of the amount of manually annotated data used to train the stochastic models and (2) using automatically annotated data in the training process.\",\"PeriodicalId\":118671,\"journal\":{\"name\":\"IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.\",\"volume\":\"4 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2001-12-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"20\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASRU.2001.1034637\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU.2001.1034637","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 20

摘要

在语音理解系统的开发中，对人类专业知识的需求可以通过使用随机技术大大减少。然而，基于语料库的技术需要对大量的训练数据进行标注。这种语料库的手动语义注释是乏味的、昂贵的，并且容易出现不一致。本文研究了训练语料库大小对理解模块性能的影响。本文还研究了自动标注数据作为一种以极低成本增加语料库大小的方法。首先，提出了一个随机语音理解模型，该模型是利用LIMSI ARISE对话系统收集的数据开发的。它的性能可以与系统中目前使用的基于规则的caseframe语法相媲美。在第二步中，寻求两种降低开发成本的方法:(1)减少用于训练随机模型的手动注释数据的数量;(2)在训练过程中使用自动注释数据。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Investigating stochastic speech understanding

The need for human expertise in the development of a speech understanding system can be greatly reduced by the use of stochastic techniques. However corpus-based techniques require the annotation of large amounts of training data. Manual semantic annotation of such corpora is tedious, expensive, and subject to inconsistencies. This work investigates the influence of the training corpus size on the performance of the understanding module. The use of automatically annotated data is also investigated as a means to increase the corpus size at a very low cost. First, a stochastic speech understanding model developed using data collected with the LIMSI ARISE dialog system is presented. Its performance is shown to be comparable to that of the rule-based caseframe grammar currently used in the system. In a second step, two ways of reducing the development cost are pursued: (1) reducing of the amount of manually annotated data used to train the stochastic models and (2) using automatically annotated data in the training process.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.

自引率

0.00%

发文量