Entity refinement using latent semantic indexing

2010 IEEE International Conference on Intelligence and Security Informatics. Pub Date: 2010-05-23. DOI: 10.1109/ISI.2010.5484765

R. Bradford

Citations: 0

Abstract

Automated extraction of named entities is an important text analysis task. In addition to recognizing the occurrence of entity names, it is important to be able to label those names by type. Most entity extraction techniques categorize extracted entities into a few basic types, such as PERSON, ORGANIZATION, and LOCATION. This paper presents an approach for generating more fine-grained subdivisions of entity type. The technique of latent semantic indexing (LSI) is used to provide semantic context as an indicator of likely entity subtype. Tests were carried out on a collection of 5.5 million English-language news articles. At modest levels of recall, the accuracy of sub-type assignment was comparable to the accuracy with which the gross type was assigned by a state-of-the-art commercial entity extraction software package.
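The abstract does not describe how LSI context is converted into a subtype label, so the following is only a minimal sketch of the general idea, not the paper's pipeline. It assumes NumPy, a toy four-document corpus, and two hypothetical subtypes of PERSON ("politician", "athlete"), each represented by a made-up seed context. All function names, the seed texts, the rank `k=2`, and the `0.7` cosine threshold are illustrative assumptions.

```python
import numpy as np

# Toy corpus (hypothetical): two political contexts, two sports contexts.
DOCS = [
    "senator voted bill congress",
    "senator campaign election congress",
    "striker scored goal match",
    "goalkeeper saved penalty match goal",
]

# Hypothetical seed contexts, one per fine-grained subtype of PERSON.
SEEDS = {
    "politician": "senator congress election",
    "athlete": "goal match striker",
}

def build_vocab(docs):
    """Map each distinct token to a row index."""
    vocab = sorted({w for d in docs for w in d.split()})
    return {w: i for i, w in enumerate(vocab)}

def term_doc_matrix(docs, vocab):
    """Raw term-frequency matrix: rows are terms, columns are documents."""
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            A[vocab[w], j] += 1.0
    return A

def fit_lsi(A, k):
    """Truncated SVD: keep the k largest singular triplets (term side only)."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k]

def fold_in(text, vocab, U_k, s_k):
    """Project a new context into the k-dimensional LSI space."""
    q = np.zeros(len(vocab))
    for w in text.split():
        if w in vocab:  # out-of-vocabulary tokens are ignored
            q[vocab[w]] += 1.0
    return (q @ U_k) / s_k

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def assign_subtype(context, seeds, vocab, U_k, s_k, threshold=0.7):
    """Return the best-matching subtype, or None when no seed is close enough."""
    v = fold_in(context, vocab, U_k, s_k)
    scores = {label: cosine(v, fold_in(text, vocab, U_k, s_k))
              for label, text in seeds.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None

vocab = build_vocab(DOCS)
U_k, s_k = fit_lsi(term_doc_matrix(DOCS, vocab), k=2)

print(assign_subtype("senator election congress", SEEDS, vocab, U_k, s_k))
print(assign_subtype("goalkeeper penalty match", SEEDS, vocab, U_k, s_k))
print(assign_subtype("weather rain", SEEDS, vocab, U_k, s_k))
```

In this sketch the threshold is what trades recall for accuracy: contexts that do not clearly resemble any seed are left unlabeled rather than forced into a subtype. A real system would likely use term weighting, a far larger corpus, and several hundred dimensions rather than two.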
