通过文本分析为生命科学生成语义

2011 IEEE Fifth International Conference on Semantic Computing Pub Date : 2011-09-18 DOI:10.1109/ICSC.2011.75

E. Buyko, U. Hahn

{"title":"通过文本分析为生命科学生成语义","authors":"E. Buyko, U. Hahn","doi":"10.1109/ICSC.2011.75","DOIUrl":null,"url":null,"abstract":"The life sciences have a strong need for carefully curated, semantically rich fact repositories. Knowledge harvesting from unstructured textual sources is currently performed by highly skilled curators who manually feed semantics into such databases as a result of deep understanding of the documents chosen to populate such repositories. As this is a slow and costly process, we here advocate an automatic approach to the generation of database contents which is based on JREX, a high performance relation extraction system. As a real-life example, we target REGULONDB, the world's largest manually curated reference database for the transcriptional regulation network of E. coli. We investigate in our study the performance of automatic knowledge capture from various literature sources, such as PUBMED abstracts and associated full text articles. Our results show that we can, indeed, automatically re-create a considerable portion of the REGULONDB database by processing the relevant literature sources. Hence, this approach might help curators widen the knowledge acquisition bottleneck in this field.","PeriodicalId":408382,"journal":{"name":"2011 IEEE Fifth International Conference on Semantic Computing","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Generating Semantics for the Life Sciences via Text Analytics\",\"authors\":\"E. Buyko, U. Hahn\",\"doi\":\"10.1109/ICSC.2011.75\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The life sciences have a strong need for carefully curated, semantically rich fact repositories. Knowledge harvesting from unstructured textual sources is currently performed by highly skilled curators who manually feed semantics into such databases as a result of deep understanding of the documents chosen to populate such repositories. As this is a slow and costly process, we here advocate an automatic approach to the generation of database contents which is based on JREX, a high performance relation extraction system. As a real-life example, we target REGULONDB, the world's largest manually curated reference database for the transcriptional regulation network of E. coli. We investigate in our study the performance of automatic knowledge capture from various literature sources, such as PUBMED abstracts and associated full text articles. Our results show that we can, indeed, automatically re-create a considerable portion of the REGULONDB database by processing the relevant literature sources. Hence, this approach might help curators widen the knowledge acquisition bottleneck in this field.\",\"PeriodicalId\":408382,\"journal\":{\"name\":\"2011 IEEE Fifth International Conference on Semantic Computing\",\"volume\":\"18 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-09-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2011 IEEE Fifth International Conference on Semantic Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICSC.2011.75\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 IEEE Fifth International Conference on Semantic Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSC.2011.75","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

生命科学对精心策划的、语义丰富的事实存储库有着强烈的需求。从非结构化文本源获取知识目前是由高技能的管理员执行的，他们通过对选择填充此类存储库的文档的深入理解，将语义手动输入此类数据库。由于这是一个缓慢而昂贵的过程，我们在这里提倡一种基于高性能关系提取系统JREX的数据库内容自动生成方法。作为一个现实生活中的例子，我们的目标是REGULONDB，这是世界上最大的大肠杆菌转录调控网络的人工管理参考数据库。在我们的研究中，我们调查了从各种文献来源(如PUBMED摘要和相关全文文章)中自动获取知识的性能。我们的结果表明，通过处理相关文献来源，我们确实可以自动重新创建REGULONDB数据库的相当一部分。因此，这种方法可以帮助策展人拓宽这一领域的知识获取瓶颈。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Generating Semantics for the Life Sciences via Text Analytics

The life sciences have a strong need for carefully curated, semantically rich fact repositories. Knowledge harvesting from unstructured textual sources is currently performed by highly skilled curators who manually feed semantics into such databases as a result of deep understanding of the documents chosen to populate such repositories. As this is a slow and costly process, we here advocate an automatic approach to the generation of database contents which is based on JREX, a high performance relation extraction system. As a real-life example, we target REGULONDB, the world's largest manually curated reference database for the transcriptional regulation network of E. coli. We investigate in our study the performance of automatic knowledge capture from various literature sources, such as PUBMED abstracts and associated full text articles. Our results show that we can, indeed, automatically re-create a considerable portion of the REGULONDB database by processing the relevant literature sources. Hence, this approach might help curators widen the knowledge acquisition bottleneck in this field.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2011 IEEE Fifth International Conference on Semantic Computing

自引率

0.00%

发文量