Represented indicator measurement and corpus distillation on focus species detection
Authors: Chih-Hsuan Wei, Hung-Yu Kao
Published in: 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2010-12-01
DOI: 10.1109/BIBM.2010.5706647 (https://doi.org/10.1109/BIBM.2010.5706647)
Citations: 2
Abstract
In extracting information from the biomedical literature, disambiguating the names of domain-specific entities, such as proteins, is one of the most important issues. The entity ambiguity with the highest dimension is the species with which an entity is associated. Furthermore, species disambiguation is one of the bottlenecks in inter-species gene name normalization. Improving species disambiguation depends on focus species detection, which remains a substantial challenge. This study presents a method that addresses this issue. The results cover evaluations of all articles from the BioCreative I and II GN (gene normalization) tasks. Our method is robust for all types of articles, particularly those without explicit species entity information. Since our method requires a training corpus to serve as the indicator vector, we developed an iterative corpus distillation method to extend the corpus. In the experiments, the proposed method achieved a high accuracy of 85.64%, and 84.32% without species entity information.