Entity refinement using latent semantic indexing
R. Bradford
2010 IEEE International Conference on Intelligence and Security Informatics, 2010-05-23
DOI: 10.1109/ISI.2010.5484765 (https://doi.org/10.1109/ISI.2010.5484765)
Citations: 0
Abstract
Automated extraction of named entities is an important text analysis task. In addition to recognizing the occurrence of entity names, it is important to be able to label those names by type. Most entity extraction techniques categorize extracted entities into a few basic types, such as PERSON, ORGANIZATION, and LOCATION. This paper presents an approach for generating more fine-grained subdivisions of entity type. The technique of latent semantic indexing (LSI) is used to provide semantic context as an indicator of likely entity subtype. Tests were carried out on a collection of 5.5 million English-language news articles. At modest levels of recall, the accuracy of subtype assignment was comparable to the accuracy with which the gross type was assigned by a state-of-the-art commercial entity extraction software package.
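The abstract does not spell out how LSI context is turned into a subtype label, so the following is only a minimal sketch of one plausible reading. It builds a term-document matrix, reduces it with a truncated SVD, and assigns each entity the subtype whose seed terms lie closest to it in the reduced space. The toy corpus, the entity tokens, the seed lists, the raw-count weighting, and all function names are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Hypothetical toy corpus: pre-tokenized documents with stop words removed.
# Entity names are kept as single tokens (e.g. "acme_bank").
DOCS = [
    ["acme_bank", "loan", "deposit", "interest", "branch"],
    ["first_trust", "loan", "deposit", "interest", "mortgage"],
    ["acme_bank", "mortgage", "interest", "branch", "loan"],
    ["skyjet", "flight", "airport", "pilot", "route"],
    ["aeronorth", "flight", "airport", "route", "passenger"],
    ["skyjet", "passenger", "pilot", "flight", "airport"],
]

# Hypothetical seeds: context terms assumed to signal each ORGANIZATION subtype.
SUBTYPE_SEEDS = {
    "BANK": ["loan", "deposit"],
    "AIRLINE": ["flight", "airport"],
}


def build_lsi_term_vectors(docs, k):
    """Return (term -> row index, term vectors) from a rank-k truncated SVD."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc:
            counts[index[tok], j] += 1.0
    # Raw counts keep the sketch short; a weighting scheme such as TF-IDF
    # or log-entropy is commonly applied before the SVD.
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    # Scale the top-k left singular vectors by their singular values.
    return index, u[:, :k] * s[:k]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def assign_subtype(entity, index, vectors, seeds):
    """Pick the subtype whose seed centroid is closest to the entity vector."""
    entity_vec = vectors[index[entity]]
    scores = {}
    for label, terms in seeds.items():
        centroid = np.mean([vectors[index[t]] for t in terms], axis=0)
        scores[label] = cosine(entity_vec, centroid)
    return max(scores, key=scores.get), scores


index, vectors = build_lsi_term_vectors(DOCS, k=2)
for name in ("acme_bank", "skyjet"):
    label, scores = assign_subtype(name, index, vectors, SUBTYPE_SEEDS)
    print(name, label, {lab: round(val, 3) for lab, val in scores.items()})
```

In this sketch, declining to label an entity when its best score falls below a chosen threshold would be one way to trade recall for accuracy. That threshold is also an assumption and is not described in the abstract.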