Part-of-speech tagger based on maximum entropy model

2009 2nd IEEE International Conference on Computer Science and Information Technology. Pub Date: 2009-09-11. DOI: 10.1109/ICCSIT.2009.5234787

Heyan Huang, Xiao-fei Zhang

Citations: 11

Abstract

Maximum entropy (ME) conditional models are not bound by the independence assumptions that generative Hidden Markov models impose. An ME-based part-of-speech (POS) tagger can therefore depend on arbitrary, non-independent features that benefit POS tagging, without having to model the distribution of those dependencies. Because ME models can flexibly use a wide variety of features, the sparseness of the training data is handled effectively. Experiments show that, relative to a Hidden-Markov-Model-based baseline, the POS tagging error rate is reduced by 54.25% in the closed test and by 40.56% in the open test, and that the tagger reaches an accuracy of 98.01% in the closed test and 95.56% in the open test.
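
As a rough illustration of how a conditional ME model can combine overlapping features, the sketch below scores each candidate tag with a weighted sum of active features and normalizes over the tag set. The tag set, the feature templates, and the weights are hypothetical, chosen only for this example; the abstract does not specify the paper's actual features or learned parameters.

```python
import math

# Hypothetical two-tag set; the paper's real tag set is not given in the abstract.
TAGS = ["NOUN", "VERB"]

# Hypothetical hand-set weights for (feature, tag) pairs. In a trained ME
# model these would be estimated from annotated data.
WEIGHTS = {
    ("word=walked", "VERB"): 0.5,
    ("suffix2=ed", "VERB"): 1.5,
    ("prev_tag=PRON", "VERB"): 1.0,
    ("prev_tag=DET", "NOUN"): 2.0,
}


def extract_features(words, i, prev_tag):
    """Return overlapping, non-independent features for the word at index i."""
    word = words[i].lower()
    suffix = word[-2:]
    return [
        f"word={word}",
        f"suffix2={suffix}",                    # correlated with the word itself
        f"prev_tag={prev_tag}",
        f"prev_tag+suffix2={prev_tag}+{suffix}",  # conjunction of two other features
    ]


def tag_distribution(words, i, prev_tag):
    """Compute p(tag | context) = exp(sum of active weights) / Z(context)."""
    feats = extract_features(words, i, prev_tag)
    scores = {t: sum(WEIGHTS.get((f, t), 0.0) for f in feats) for t in TAGS}
    z = sum(math.exp(s) for s in scores.values())  # normalizer over tags only
    return {t: math.exp(s) / z for t, s in scores.items()}


dist = tag_distribution(["She", "walked"], 1, "PRON")
best = max(dist, key=dist.get)
print(best, round(dist[best], 3))
```

Because the normalizer sums only over tags for a fixed context, the model never has to describe how the features themselves are distributed, which is why correlated features such as the word and its suffix can be used side by side.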

