Construction and Analysis of Japanese-English Broadcast News Corpus with Named Entity Tags

NER@ACL, Pub Date: 2003-07-12, DOI: 10.3115/1119384.1119387
T. Kumano, H. Kashioka, Hideki Tanaka, T. Fukusima
Citations: 5

Abstract

We aim to acquire named entity (NE) translation knowledge from nonparallel, content-aligned corpora by using NE extraction techniques. For this research, we are constructing a Japanese-English broadcast news corpus with NE tags. The tags represent not only NE class information but also coreference information, both within a single monolingual document and between corresponding Japanese-English document pairs. Analysis of about 1,100 annotated article pairs has shown that NE occurrence information for each language, such as NE classes, number of occurrences, and order of occurrence, may provide a good clue for finding corresponding NEs across languages.
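To make the last point concrete, here is a minimal sketch of how class, occurrence count, and occurrence order could be combined to pair NEs across a document pair. It is not the authors' method: the `NEProfile` type, the scoring function, the greedy matching strategy, and the toy entities are all assumptions introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NEProfile:
    """Occurrence information for one NE in one document (hypothetical type)."""
    label: str     # surface form, used only to report the result
    ne_class: str  # e.g. "PERSON", "LOCATION"
    count: int     # number of occurrences in the document
    order: int     # 0-based rank of first occurrence among the document's NEs


def mismatch(ja, en, n_ja, n_en):
    """Return a mismatch score (lower is better), or None if classes differ."""
    if ja.ne_class != en.ne_class:
        return None
    # Relative difference in occurrence counts, in [0, 1).
    count_gap = abs(ja.count - en.count) / max(ja.count, en.count)
    # Difference in relative position of first occurrence, in [0, 1].
    order_gap = abs(ja.order / max(n_ja - 1, 1) - en.order / max(n_en - 1, 1))
    return count_gap + order_gap


def align(ja_nes, en_nes):
    """Greedily pair NEs across languages, best-scoring candidates first."""
    candidates = []
    for i, ja in enumerate(ja_nes):
        for j, en in enumerate(en_nes):
            s = mismatch(ja, en, len(ja_nes), len(en_nes))
            if s is not None:
                candidates.append((s, i, j))
    candidates.sort()

    used_ja, used_en, pairs = set(), set(), []
    for _, i, j in candidates:
        if i not in used_ja and j not in used_en:
            used_ja.add(i)
            used_en.add(j)
            pairs.append((ja_nes[i].label, en_nes[j].label))
    return pairs


# Toy, made-up document pair (not taken from the corpus).
ja_doc = [
    NEProfile("小泉首相", "PERSON", 3, 0),
    NEProfile("東京", "LOCATION", 2, 1),
    NEProfile("ブッシュ大統領", "PERSON", 1, 2),
]
en_doc = [
    NEProfile("Koizumi", "PERSON", 2, 0),
    NEProfile("Bush", "PERSON", 1, 1),
    NEProfile("Tokyo", "LOCATION", 2, 2),
]
pairs = align(ja_doc, en_doc)
```

The class check acts as a hard filter, while count and order only rank the remaining candidates; a real system would likely need coreference chains and a more robust matching procedure than this greedy pass.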