V. Warnke, E. Nöth, J. Buckow, S. Harbeck, H. Niemann
{"title":"语言模型分类器的自举训练方法","authors":"V. Warnke, E. Nöth, J. Buckow, S. Harbeck, H. Niemann","doi":"10.21437/ICSLP.1998-770","DOIUrl":null,"url":null,"abstract":"In this paper, we present a bootstrap training approach for language model (LM) classifiers. Training class dependent LM and running them in parallel, LM can serve as classifiers with any kind of symbol sequence, e.g., word or phoneme sequences for tasks like topic spotting or language identification (LID). Irrespective of the special symbol sequence used for a LM classifier, the training of a LM is done with a manually labeled training set for each class obtained from not necessarily cooperative speakers. Therefore, we have to face some erroneous labels and deviations from the originally intended class specification. Both facts can worsen classification. It might therefore be better not to use all utterances for training but to automatically select those utterances that improve recognition accuracy; this can be done by a bootstrap procedure. We present the results achieved with our best approach on the VERBMOBIL corpus for the tasks dialog act classification and LID.","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"483 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A bootstrap training approach for language model classifiers\",\"authors\":\"V. Warnke, E. Nöth, J. Buckow, S. Harbeck, H. Niemann\",\"doi\":\"10.21437/ICSLP.1998-770\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we present a bootstrap training approach for language model (LM) classifiers. Training class dependent LM and running them in parallel, LM can serve as classifiers with any kind of symbol sequence, e.g., word or phoneme sequences for tasks like topic spotting or language identification (LID). Irrespective of the special symbol sequence used for a LM classifier, the training of a LM is done with a manually labeled training set for each class obtained from not necessarily cooperative speakers. Therefore, we have to face some erroneous labels and deviations from the originally intended class specification. Both facts can worsen classification. It might therefore be better not to use all utterances for training but to automatically select those utterances that improve recognition accuracy; this can be done by a bootstrap procedure. 
We present the results achieved with our best approach on the VERBMOBIL corpus for the tasks dialog act classification and LID.\",\"PeriodicalId\":117113,\"journal\":{\"name\":\"5th International Conference on Spoken Language Processing (ICSLP 1998)\",\"volume\":\"483 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1998-11-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"5th International Conference on Spoken Language Processing (ICSLP 1998)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.21437/ICSLP.1998-770\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"5th International Conference on Spoken Language Processing (ICSLP 1998)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.21437/ICSLP.1998-770","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A bootstrap training approach for language model classifiers
In this paper, we present a bootstrap training approach for language model (LM) classifiers. By training class-dependent LMs and running them in parallel, LMs can serve as classifiers for any kind of symbol sequence, e.g., word or phoneme sequences, in tasks like topic spotting or language identification (LID). Irrespective of the particular symbol sequence used, an LM classifier is trained on a manually labeled training set for each class, obtained from speakers who are not necessarily cooperative. We therefore have to face erroneous labels and deviations from the originally intended class specification, both of which can degrade classification. It might thus be better not to use all utterances for training, but to automatically select those utterances that improve recognition accuracy; this can be done with a bootstrap procedure. We present the results achieved with our best approach on the VERBMOBIL corpus for the tasks of dialog act classification and LID.
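To make the two ideas in the abstract concrete, here is a minimal Python sketch of (a) class-dependent LMs run in parallel as a maximum-likelihood classifier and (b) one plausible bootstrap selection loop. Everything here is an assumption for illustration: the unigram LM with add-one smoothing, the function names (train_lm, classify, bootstrap_select), and the selection criterion (keep a round only if held-out accuracy improves) are not specified by the abstract, which does not fix the LM order, smoothing, or bootstrap details.

```python
import math
from collections import Counter

def train_lm(utterances):
    """Train a unigram LM with add-one smoothing over a list of token
    sequences (a hypothetical stand-in for the paper's class LMs)."""
    counts = Counter(tok for utt in utterances for tok in utt)
    total = sum(counts.values())
    vocab_size = len(counts)
    def log_prob(utt):
        return sum(math.log((counts[tok] + 1) / (total + vocab_size + 1))
                   for tok in utt)
    return log_prob

def classify(lms, utt):
    """Run the class-dependent LMs in parallel; the class whose LM
    assigns the highest log-likelihood wins."""
    return max(lms, key=lambda c: lms[c](utt))

def bootstrap_select(train, dev, n_rounds=5):
    """One possible bootstrap procedure: start from all labeled
    utterances, iteratively drop training utterances the current
    classifier mislabels, and keep a round only if accuracy on a
    held-out dev set improves."""
    selected = {c: list(utts) for c, utts in train.items()}

    def accuracy(sel):
        lms = {c: train_lm(utts) for c, utts in sel.items()}
        correct = sum(classify(lms, u) == c
                      for c, us in dev.items() for u in us)
        return correct / sum(len(us) for us in dev.values())

    best = accuracy(selected)
    for _ in range(n_rounds):
        lms = {c: train_lm(utts) for c, utts in selected.items()}
        candidate = {c: [u for u in utts if classify(lms, u) == c]
                     for c, utts in selected.items()}
        if any(not utts for utts in candidate.values()):
            break  # never empty a class
        cand_acc = accuracy(candidate)
        if cand_acc <= best:
            break  # stop once selection no longer helps
        selected, best = candidate, cand_acc
    return selected, best
```

In this sketch an utterance is any token sequence, so for LID the train and dev dictionaries would map language labels to lists of phoneme strings, while for dialog act classification they would map act labels to word sequences; the classifier and selection loop are unchanged across tasks, mirroring the abstract's claim that the approach is independent of the particular symbol sequence.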