Czech Named Entity Corpus and SVM-based Recognizer

NEWS@IJCNLP, Pub Date: 2009-08-07, DOI: 10.3115/1699705.1699748

Jana Kravalova, Z. Žabokrtský

Citations: 44

Abstract

This paper deals with the recognition of named entities in Czech texts. We present a recently released corpus of Czech sentences with manually annotated named entities, annotated using a rich two-level classification scheme. The corpus contains around 6000 sentences with roughly 33000 marked named entity instances. We use the data for training and evaluating a named entity recognizer based on the Support Vector Machine classification technique. The presented recognizer outperforms the results previously reported for NE recognition in Czech.



