Improved named entity translation and bilingual named entity extraction

Proceedings. Fourth IEEE International Conference on Multimodal Interfaces
Pub. date: 2002-10-14
DOI: 10.1109/ICMI.2002.1167002

Fei Huang, S. Vogel

Citations: 52

Abstract

Translation of named entities (NE), including proper names and temporal and numerical expressions, is very important in multilingual natural language processing applications such as cross-lingual information retrieval and statistical machine translation. We present an integrated approach that extracts a named entity translation dictionary from a bilingual corpus while at the same time improving the named entity annotation quality. Starting from a bilingual corpus in which the named entities are extracted independently for each language, we use a statistical alignment model to align the named entities. An iterative process is then applied to extract named entity pairs with higher alignment probability. This yields a smaller but cleaner named entity translation dictionary, and also significantly improves the monolingual named entity annotation quality for both languages. Experimental results show that the dictionary size is reduced by 51.8%, and the annotation quality, in terms of F-score, improves from 70.03 to 78.15 for Chinese and from 73.38 to 81.46 for English.
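The abstract does not specify the alignment model, so the sketch below swaps in a deliberately simple stand-in to show the shape of the iterative extraction. It works as follows:

- P(target NE | source NE) is estimated by relative frequency over all co-occurring NE pairs.
- Each source NE is aligned to its most probable target NE within its own sentence pair.
- Pairs scoring below a threshold are dropped.
- Probabilities are re-estimated from the surviving pairs until the kept set stops changing.

All function names, the `threshold` and `max_iter` parameters, and the toy corpus are hypothetical illustrations, not the authors' method. `clean_annotations` only removes NEs that end up unaligned, which is one assumed way such filtering could improve monolingual annotation. `f_score` is the standard F1 (harmonic mean of precision and recall) over sets of NE strings.

```python
from collections import Counter


def estimate_probs(pairs):
    """Relative-frequency estimate of P(tgt NE | src NE) from (src, tgt) pairs."""
    joint = Counter(pairs)
    src_totals = Counter(src for src, _ in pairs)
    return {(s, t): c / src_totals[s] for (s, t), c in joint.items()}


def extract_ne_dictionary(corpus, threshold=0.5, max_iter=10):
    """Hypothetical sketch: iteratively keep only high-probability NE pairs.

    corpus: list of (src_nes, tgt_nes), one entry per sentence pair, where each
    side's NEs were tagged independently and may contain tagging errors.
    """
    # Iteration 0: every co-occurring source/target NE combination is a candidate.
    candidates = [(s, t) for src_nes, tgt_nes in corpus for s in src_nes for t in tgt_nes]
    probs = estimate_probs(candidates)
    dictionary = set()
    for _ in range(max_iter):
        kept = []
        for src_nes, tgt_nes in corpus:
            if not tgt_nes:
                continue
            for s in src_nes:
                # Align s to its most probable target NE in this sentence pair.
                p, best = max((probs.get((s, t), 0.0), t) for t in tgt_nes)
                if p >= threshold:
                    kept.append((s, best))
        new_dictionary = set(kept)
        if new_dictionary == dictionary:  # converged: same pairs as last round
            break
        dictionary = new_dictionary
        # Re-estimate the probabilities from the surviving pairs only.
        probs = estimate_probs(kept)
    return dictionary


def clean_annotations(corpus, dictionary):
    """Keep only NEs that take part in a dictionary pair within their own sentence pair."""
    cleaned = []
    for src_nes, tgt_nes in corpus:
        pairs = {(s, t) for s in src_nes for t in tgt_nes} & dictionary
        aligned_src = {s for s, _ in pairs}
        aligned_tgt = {t for _, t in pairs}
        cleaned.append(([s for s in src_nes if s in aligned_src],
                        [t for t in tgt_nes if t in aligned_tgt]))
    return cleaned


def f_score(predicted, gold):
    """Standard F1: harmonic mean of precision and recall over sets of NE strings."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy Chinese-English corpus of independently tagged NEs.
EXAMPLE_CORPUS = [
    (["北京", "江泽民"], ["Beijing", "Jiang Zemin"]),
    (["北京"], ["Beijing", "Reporter"]),  # "Reporter" mimics a tagging error
    (["江泽民", "北京"], ["Jiang Zemin", "Beijing"]),
    (["江泽民"], ["Jiang Zemin"]),
    (["上海"], ["Shanghai"]),
]

if __name__ == "__main__":
    candidates = {(s, t) for src, tgt in EXAMPLE_CORPUS for s in src for t in tgt}
    dictionary = extract_ne_dictionary(EXAMPLE_CORPUS)
    print(len(candidates), "candidate pairs ->", len(dictionary), "dictionary entries")
    print(sorted(dictionary))
    print(clean_annotations(EXAMPLE_CORPUS, dictionary)[1])
```

On this toy corpus, the six distinct co-occurring candidate pairs shrink to three dictionary entries. The spurious target tag "Reporter" is dropped from the second sentence pair, which raises that sentence's target-side F1 against a gold set of {"Beijing"} from 2/3 to 1.0. A real system would need a much stronger alignment model: the relative-frequency stand-in here simply reinforces whatever pairs survive the first round.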

