Robust understanding of spoken Chinese through character-based tagging and prior knowledge exploitation

Weiqun Xu, C. Bao, Yali Li, Jielin Pan, Yonghong Yan
{"title":"Robust understanding of spoken Chinese through character-based tagging and prior knowledge exploitation","authors":"Weiqun Xu, C. Bao, Yali Li, Jielin Pan, Yonghong Yan","doi":"10.1109/ASRU.2011.6163967","DOIUrl":null,"url":null,"abstract":"Robustness is one of the most challenging issues for spoken language understanding (SLU). In this paper we studied the semantic understanding of Chinese spoken language for a voice search dialogue system. We first simplified the problem of semantic understanding into a named entity recognition (NER) task, which was further formulated as sequential tagging. We carried out experiments to opt for character over word as the tagging unit. Then two approaches were proposed to exploit prior knowledge - in the form of a domain lexicon - into the character-based tagging framework. One enriched tagger features by incorporating more formal lexical features with a domain lexicon. The other made plain use of domain entities by simply adding them to the training data. Experiment results show that both approaches are effective. The best performance is achieved by combining the above two complimentary approaches. By exploiting prior knowledge we improved the NER performance from 75.27 to 90.24 in F1 score on a field test set using speech recognizer output.","PeriodicalId":338241,"journal":{"name":"2011 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 IEEE Workshop on Automatic Speech Recognition & Understanding","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU.2011.6163967","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Robustness is one of the most challenging issues for spoken language understanding (SLU). In this paper we studied the semantic understanding of Chinese spoken language for a voice search dialogue system. We first simplified the problem of semantic understanding into a named entity recognition (NER) task, which was further formulated as sequential tagging. We carried out experiments to opt for character over word as the tagging unit. Then two approaches were proposed to exploit prior knowledge - in the form of a domain lexicon - into the character-based tagging framework. One enriched tagger features by incorporating more formal lexical features with a domain lexicon. The other made plain use of domain entities by simply adding them to the training data. Experiment results show that both approaches are effective. The best performance is achieved by combining the above two complimentary approaches. By exploiting prior knowledge we improved the NER performance from 75.27 to 90.24 in F1 score on a field test set using speech recognizer output.
通过基于字符的标注和先验知识开发,对汉语口语有较强的理解
鲁棒性是口语理解(SLU)中最具挑战性的问题之一。本文研究了面向语音搜索对话系统的汉语口语语义理解问题。我们首先将语义理解问题简化为命名实体识别(NER)任务,该任务进一步表述为顺序标注。我们进行了选择字符而不是单词作为标注单位的实验。然后提出了两种利用领域词典形式的先验知识到基于字符的标注框架中的方法。其中一个通过将更正式的词汇特征与领域词汇相结合来丰富标注器功能。另一种方法通过简单地将域实体添加到训练数据中来明确地使用它们。实验结果表明,两种方法都是有效的。通过结合上述两种互补的方法,可以实现最佳性能。通过利用先验知识,我们在使用语音识别器输出的现场测试集上将NER的F1分数从75.27提高到90.24。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信