Czech Named Entity Corpus and SVM-based Recognizer

Jana Kravalova, Z. Žabokrtský
NEWS@IJCNLP, 2009 (published 2009-08-07). DOI: 10.3115/1699705.1699748
Citations: 44

Abstract

This paper deals with the recognition of named entities (NEs) in Czech texts. We present a recently released corpus of Czech sentences with manually annotated named entities, annotated using a rich two-level classification scheme. The corpus contains around 6,000 sentences with roughly 33,000 marked named entity instances. We use the data to train and evaluate a named entity recognizer based on the Support Vector Machine (SVM) classification technique. The presented recognizer achieves better results than those previously reported for NE recognition in Czech.
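The abstract does not describe the recognizer's features or which SVM solver it uses. As a minimal sketch of the general idea only, the block below trains a linear SVM with Pegasos (stochastic sub-gradient descent on the primal hinge-loss objective; a technique swapped in here, not taken from the paper) to make one binary decision per token: is it part of a named entity or not. This deliberately collapses the corpus's two-level classification scheme into a single yes/no label. The feature set (`bias`, `is_capitalized`), the function names, and the tokens and labels in `TOY_DATA` are all hypothetical.

```python
"""Hypothetical sketch: a linear SVM trained with Pegasos (stochastic
sub-gradient descent on the primal hinge-loss objective), used as a binary
token classifier for "part of a named entity or not". Features and data are
toy assumptions, not those of the recognizer described in the paper."""

from typing import Dict, List, Tuple

Features = Dict[str, float]


def token_features(token: str) -> Features:
    # Hypothetical feature set: a constant bias term plus capitalisation.
    feats: Features = {"bias": 1.0}
    if token[:1].isupper():
        feats["is_capitalized"] = 1.0
    return feats


def score(w: Features, feats: Features) -> float:
    # Sparse dot product between the weight vector and a feature vector.
    return sum(w.get(k, 0.0) * v for k, v in feats.items())


def train_pegasos(data: List[Tuple[str, int]], lam: float = 0.1,
                  epochs: int = 50) -> Features:
    """Minimise lam/2 * ||w||^2 + mean hinge loss with step size 1/(lam*t).

    Examples are visited in a fixed cyclic order (the original Pegasos
    samples them at random) so that the result is reproducible.
    """
    w: Features = {}
    t = 0
    for _ in range(epochs):
        for token, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            feats = token_features(token)
            violated = y * score(w, feats) < 1.0  # is the hinge loss active?
            # Sub-gradient step for the L2 regulariser: shrink every weight.
            for k in w:
                w[k] *= 1.0 - eta * lam
            # Sub-gradient step for the hinge loss on this example.
            if violated:
                for k, v in feats.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


def predict(w: Features, token: str) -> int:
    # +1 = part of a named entity, -1 = not (binary simplification).
    return 1 if score(w, token_features(token)) > 0 else -1


# Toy training tokens with hypothetical labels, alternating NE / non-NE.
TOY_DATA = [("Praha", 1), ("leží", -1), ("Vltavě", 1), ("na", -1)]

if __name__ == "__main__":
    weights = train_pegasos(TOY_DATA)
    for tok in ["Olomouc", "leží", "na", "Moravě"]:
        print(tok, predict(weights, tok))
```

Capitalisation alone is a weak cue in real Czech text, because sentence-initial words are capitalised too. A practical recognizer would need much richer features and, to predict the fine-grained entity types, a multi-class setup such as one-vs-rest binary SVMs.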