Chemical Entity Recognition and Resolution to ChEBI.

ISRN bioinformatics Pub Date : 2012-02-15 eCollection Date: 2012-01-01 DOI:10.5402/2012/619427
Tiago Grego, Catia Pesquita, Hugo P Bastos, Francisco M Couto
Journal: ISRN bioinformatics, volume 2012, article 619427. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4393067/pdf/
Citations: 28

Abstract


Chemical entities are ubiquitous throughout the biomedical literature, and text-mining systems that can efficiently identify those entities are required. Due to the lack of available corpora and data resources, the community has focused its efforts on developing gene and protein named entity recognition systems, but with the release of ChEBI and the availability of an annotated corpus, chemical entity recognition can now be addressed. We developed a machine-learning-based method for chemical entity recognition and a lexical-similarity-based method for chemical entity resolution and compared them with Whatizit, a popular dictionary-based method. Our methods outperformed the dictionary-based method in all tasks, yielding an improvement in F-measure of 20% for the entity recognition task, 2-5% for the entity resolution task, and 15% for the combined entity recognition and resolution task.
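
The abstract does not say which lexical similarity measure or evaluation script the authors used, so the following is only a minimal sketch of the general idea. It assumes a tiny made-up lexicon (`TOY_LEXICON`, whose identifiers are placeholders and not real ChEBI IDs), uses Python's `difflib.SequenceMatcher` ratio as a stand-in similarity measure, and picks an arbitrary threshold of 0.8. The helper names `resolve` and `f_measure` are hypothetical.

```python
from difflib import SequenceMatcher

# Toy lexicon for illustration only: the identifiers are placeholders,
# NOT real ChEBI identifiers.
TOY_LEXICON = {
    "water": "TOY:1",
    "ethanol": "TOY:2",
    "glucose": "TOY:3",
    "caffeine": "TOY:4",
}


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resolve(mention: str, lexicon=TOY_LEXICON, threshold: float = 0.8):
    """Map a recognized mention to the identifier of its most similar name.

    Returns None when no lexicon name reaches the (assumed) threshold.
    """
    best_name = max(lexicon, key=lambda name: similarity(mention, name))
    if similarity(mention, best_name) < threshold:
        return None
    return lexicon[best_name]


def f_measure(predicted: set, gold: set) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    for mention in ["Ethanol", "glucos", "benzene"]:
        print(mention, "->", resolve(mention))
    print(f_measure({"a", "b", "c"}, {"b", "c", "d"}))
```

In this sketch a misspelled mention such as "glucos" still resolves to the entry for "glucose", while a mention with no close lexicon name is left unresolved rather than forced onto a wrong identifier. The threshold controls that trade-off between precision and recall, which is what the F-measure summarizes.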
