Identifying the Truth: Aggregation of Named Entity Extraction Results
Katja Pfeifer, J. Meinecke
International Conference on Information Integration and Web-based Applications & Services, 2013-12-02
DOI: 10.1145/2539150.2539160 (https://doi.org/10.1145/2539150.2539160)
Citations: 2
Abstract
Huge amounts of textual information relevant to market analysis, trending, or product monitoring can be found on the Web. To exploit that knowledge, a number of extraction services have been proposed that extract and categorize entities from a given text. Prior work has shown that combining individual extractors can increase extraction quality. However, no existing system can reasonably combine real-world extraction services that differ substantially in the entity types they extract and in the schemata they use. In this paper, we propose an aggregation system and a corresponding aggregation process for such services. We present a number of novel aggregation techniques that incorporate schema information as well as characteristics specific to entity extraction into the aggregation process. The aggregation system is extensively evaluated on six real-world named entity recognition services and compared to state-of-the-art approaches.
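The abstract does not spell out the aggregation techniques themselves, so the following is only an illustrative sketch of the general idea of combining extractors whose schemata differ. It assumes a hand-written mapping from each service's type labels to a shared schema and then applies plain majority voting, which is a common baseline and not the authors' method. The service names, type labels, `TYPE_MAPPING`, `aggregate`, and `min_votes` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from each service's own type labels to a shared schema.
# Real services use different labels; these are illustrative assumptions.
TYPE_MAPPING = {
    "service_a": {"Person": "PERSON", "Company": "ORGANIZATION", "City": "LOCATION"},
    "service_b": {"PER": "PERSON", "ORG": "ORGANIZATION", "LOC": "LOCATION"},
    "service_c": {"people": "PERSON", "organisation": "ORGANIZATION"},
}


def aggregate(results, min_votes=2):
    """Combine annotations from several services by majority voting.

    results maps a service name to a list of (start, end, service_type)
    tuples, where start/end are character offsets into the same text.
    An annotation is kept if at least min_votes services agree on both
    the span and the unified type.
    """
    votes = defaultdict(set)
    for service, annotations in results.items():
        mapping = TYPE_MAPPING.get(service, {})
        for start, end, service_type in annotations:
            unified = mapping.get(service_type)
            if unified is None:
                continue  # type is not covered by the shared schema
            # A set ensures each service votes at most once per annotation.
            votes[(start, end, unified)].add(service)
    accepted = [key for key, voters in votes.items() if len(voters) >= min_votes]
    return sorted(accepted)


# Made-up example output of three services for the same input text.
example = {
    "service_a": [(0, 12, "Person"), (22, 25, "Company")],
    "service_b": [(0, 12, "PER"), (22, 25, "LOC")],
    "service_c": [(0, 12, "people"), (22, 25, "organisation")],
}
```

Calling `aggregate(example)` keeps the span at offsets 0 to 12 as `PERSON` (three votes) and the span at 22 to 25 as `ORGANIZATION` (two votes), while the single `LOCATION` vote is discarded. A service that does not support a given type simply casts no vote for it here; treating that silence differently from disagreement is one place where schema information could plausibly refine such a scheme.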