Robust understanding of spoken Chinese through character-based tagging and prior knowledge exploitation

2011 IEEE Workshop on Automatic Speech Recognition & Understanding. Pub Date: 2011-12-01. DOI: 10.1109/ASRU.2011.6163967

Weiqun Xu, C. Bao, Yali Li, Jielin Pan, Yonghong Yan



Abstract

Robustness is one of the most challenging issues in spoken language understanding (SLU). In this paper we studied the semantic understanding of spoken Chinese for a voice search dialogue system. We first reduced the problem of semantic understanding to a named entity recognition (NER) task, which we then formulated as sequential tagging. Based on experiments, we chose the character rather than the word as the tagging unit. We then proposed two approaches for incorporating prior knowledge, in the form of a domain lexicon, into the character-based tagging framework. The first enriched the tagger's features with more formal lexical features derived from the domain lexicon. The second made plain use of the domain entities by simply adding them to the training data. Experimental results show that both approaches are effective, and that the best performance is achieved by combining the two complementary approaches. By exploiting prior knowledge we improved NER performance from 75.27 to 90.24 in F1 score on a field test set using speech recognizer output.
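The two ways of using a domain lexicon described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the abstract does not state the tag scheme (BIO is assumed), the lexicon matching strategy (forward maximum matching is assumed), or the tagger itself (none is trained here). The function names, the `ORG` entity type, and the example utterance are hypothetical.

```python
from typing import Dict, List, Tuple


def char_bio_tags(text: str, entities: List[Tuple[int, int, str]]) -> List[str]:
    """Give each character a BIO tag from (start, end, type) spans; end is exclusive."""
    tags = ["O"] * len(text)
    for start, end, etype in entities:
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags


def lexicon_features(text: str, lexicon: Dict[str, str], max_len: int = 8) -> List[str]:
    """Assumed approach 1: per-character features from forward maximum matching."""
    feats = ["LEX-O"] * len(text)
    i = 0
    while i < len(text):
        matched = False
        # Try the longest candidate first, then shrink the window.
        for n in range(min(max_len, len(text) - i), 0, -1):
            etype = lexicon.get(text[i:i + n])
            if etype is not None:
                feats[i] = f"LEX-B-{etype}"
                for j in range(i + 1, i + n):
                    feats[j] = f"LEX-I-{etype}"
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return feats


def lexicon_as_training_data(lexicon: Dict[str, str]) -> List[Tuple[List[str], List[str]]]:
    """Assumed approach 2: each lexicon entry becomes its own tagged training sequence."""
    return [
        (list(name), char_bio_tags(name, [(0, len(name), etype)]))
        for name, etype in lexicon.items()
    ]


# Hypothetical example: "I want to find the phone number of Beijing Hotel".
utterance = "我想找北京饭店的电话"
domain_lexicon = {"北京饭店": "ORG"}  # entity type label is an assumption

gold_tags = char_bio_tags(utterance, [(3, 7, "ORG")])
lex_feats = lexicon_features(utterance, domain_lexicon)
extra_training = lexicon_as_training_data(domain_lexicon)
```

In a setup like this, `lex_feats` would be supplied to a sequence tagger alongside the characters themselves, while `extra_training` would be appended to the annotated utterances before training. Because both operate at the character level, neither depends on a word segmenter.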


