Identifying the Truth: Aggregation of Named Entity Extraction Results

International Conference on Information Integration and Web-based Applications & Services. Pub Date: 2013-12-02. DOI: 10.1145/2539150.2539160

Katja Pfeifer, J. Meinecke

Citations: 2

Abstract

Huge amounts of textual information relevant to market analysis, trend analysis, or product monitoring can be found on the Web. To exploit this knowledge, a number of extraction services have been proposed that extract and categorize entities in a given text. Prior work showed that combining individual extractors can increase extraction quality. However, no existing system can yet sensibly combine real-world extraction services that differ substantially in the entity types they extract and the schemata they use. In this paper, we propose an aggregation system and a corresponding aggregation process that can be applied to such services. We present a number of novel aggregation techniques that incorporate schema information as well as characteristics specific to entity extraction into the aggregation process. The aggregation system is evaluated extensively on six real-world named entity recognition services and compared with state-of-the-art approaches.
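The abstract does not spell out the aggregation techniques themselves. Purely as an illustration of the general idea of combining services with different schemata, the Python sketch below implements a simple schema-aware majority vote: each service's own entity types are first mapped to a shared target schema, and a service only counts as a voter for target types it can extract at all. This is not the authors' method. The function name `aggregate`, the service names, the type mappings, the character offsets, and the 0.5 threshold are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# An annotation: (start offset, end offset, entity type in the service's own schema).
Annotation = Tuple[int, int, str]


def aggregate(
    outputs: Dict[str, List[Annotation]],
    type_maps: Dict[str, Dict[str, str]],
    threshold: float = 0.5,
) -> List[Tuple[int, int, str]]:
    """Schema-aware majority vote (illustrative sketch, not the paper's algorithm).

    Assumption: each service's local types map onto one flat target schema.
    A service is an eligible voter for a target type only if its mapping can
    produce that type; services that cannot extract it abstain.
    """
    votes: Counter = Counter()
    for service, annotations in outputs.items():
        mapping = type_maps.get(service, {})
        seen = set()  # one vote per (span, target type) per service
        for start, end, local_type in annotations:
            target = mapping.get(local_type)
            if target is not None:  # skip types with no counterpart in the target schema
                seen.add((start, end, target))
        votes.update(seen)

    accepted = []
    for (start, end, target), count in votes.items():
        # Eligible voters: services whose schema covers this target type.
        # Always >= 1, since the voting service itself maps to `target`.
        eligible = sum(1 for s in outputs if target in type_maps.get(s, {}).values())
        if count / eligible > threshold:
            accepted.append((start, end, target))
    # Note: conflicting types for the same span are not resolved here.
    return sorted(accepted)


# Hypothetical example for the text "Apple opened a store in Berlin."
# ("Apple" = offsets 0-5, "Berlin" = offsets 24-30, end exclusive).
type_maps = {
    "svc_a": {"ORG": "Organization", "LOC": "Location"},
    "svc_b": {"Company": "Organization", "Person": "Person"},
    "svc_c": {"Organization": "Organization", "Person": "Person"},
}
outputs = {
    "svc_a": [(0, 5, "ORG"), (24, 30, "LOC")],
    "svc_b": [(0, 5, "Company")],
    "svc_c": [(0, 5, "Person")],
}

print(aggregate(outputs, type_maps))
# → [(0, 5, 'Organization'), (24, 30, 'Location')]
```

Treating services that cannot produce a type as abstaining rather than disagreeing is what keeps the "Berlin" span here: a plain vote over all three services would give it only one vote in three. The sketch ignores type hierarchies between schemata and partially overlapping spans, both of which a real aggregation system would have to handle.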



