A Study on Consistency Checking Method of Part-Of-Speech Tagging for Chinese Corpora

Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date: 2008-06-01. DOI: 10.30019/IJCLCLP.200806.0002

Hu Zhang, Jia-heng Zheng

Citations: 1

Abstract

Consistent Part-Of-Speech (POS) tagging plays an important role in the construction of high-quality Chinese corpora. After analyzing the POS tagging of multi-category words in large-scale corpora, we propose a novel classification-based method for checking the consistency of POS tagging. The method builds a vector model of the context of multi-category words and uses the k-NN algorithm to classify context vectors constructed from POS tagging sequences and to judge their consistency. The method is evaluated on our 1.5M-word corpus. The experimental results indicate that the proposed method is feasible and effective.
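The abstract does not give the details of the vector model, the distance measure, or the value of k, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes a context vector made of position-tagged POS features in a window of two tags on each side, cosine similarity, and majority voting over k = 3 neighbours. The function names, the toy training data, and the tag labels (`n`, `v`, `d`, `u`, and so on) are all hypothetical placeholders.

```python
import math
from collections import Counter

PAD = "<PAD>"  # assumed padding symbol for positions outside the sentence


def context_vector(tags, index, window=2):
    """Sparse vector of (offset, POS tag) features around tags[index].

    The centre tag itself is excluded: it is the label being checked.
    This encoding is an assumption, not the one used in the paper.
    """
    features = Counter()
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        pos = index + offset
        tag = tags[pos] if 0 <= pos < len(tags) else PAD
        features[(offset, tag)] += 1
    return features


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(query, training, k=3):
    """Majority label among the k training vectors most similar to query."""
    ranked = sorted(training, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def is_consistent(tags, index, training, k=3):
    """True if the tag at tags[index] agrees with the k-NN vote on its context."""
    return knn_predict(context_vector(tags, index), training, k) == tags[index]


# Toy, hand-made tag sequences for one hypothetical multi-category word at index 2.
examples = [
    (["r", "d", "v", "n", "w"], 2),
    (["n", "d", "v", "u", "n"], 2),
    (["d", "d", "v", "n", "w"], 2),
    (["v", "u", "n", "w"], 2),
    (["a", "u", "n", "v", "n"], 2),
    (["r", "u", "n", "d", "v"], 2),
]
training = [(context_vector(tags, i), tags[i]) for tags, i in examples]

if __name__ == "__main__":
    # The word is tagged "n" here, but its context resembles the "v" examples.
    print(is_consistent(["r", "d", "n", "n", "w"], 2, training))
    # The word is tagged "n" in a context that resembles the "n" examples.
    print(is_consistent(["a", "u", "n", "d", "v"], 2, training))
```

In this sketch an occurrence is flagged as potentially inconsistent when its assigned tag disagrees with the vote of its nearest neighbours; a real checker would presumably need a much larger set of verified training instances and a tuned window size and k.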

