Biomedical association mining and validation

Q2 Medicine

In Silico Biology, Pub Date: 2010-02-15, DOI: 10.1145/1722024.1722035

P. Gandra, M. Pradhan, M. Palakal


Citations: 2

Abstract

During the last decade, the volume of data published in the biomedical literature has increased exponentially. With this growth, it has become hard to read all the papers manually for the required information. Many text mining algorithms and approaches have been developed to extract information from the existing literature. One important task is finding associations between functional terms such as genes, proteins, drugs, and diseases. These associations can be causal, explicit, or implicit. One of the most common applications is mining protein-protein interactions from PubMed. The focus of the present study is to identify and validate implicit protein-protein associations, as these are hard to identify from the literature. These associations, when detected automatically, are noisy and need to be validated for their biological significance. During validation, the associations were passed through a series of filters and an algorithm to remove the noise present in the data. In this study, we used 16 gene IDs to retrieve 32,693 documents containing 193,738 sentences related to regenerative biology from the PubMed database. From these sentences, BioMap found 10,004 explicit and 30,000 implicit protein interaction pairs, which were validated using the proposed methodology. Finally, 308 implicit pairs were identified as the outcome of this methodology. These results indicate that the proposed methods can be effectively used for biological verification of implicit protein-protein interactions obtained through literature mining.
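The distinction between explicit and implicit pairs can be illustrated with a minimal Python sketch. It is not the authors' BioMap pipeline, whose filters and algorithm the abstract does not specify. It assumes that an explicit pair is two proteins named in the same sentence, and that an implicit pair is two proteins never named together but sharing at least one explicit partner. The function names, the `min_shared` threshold, and the placeholder protein names (`protA` to `protD`) are all hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations


def explicit_pairs(sentences, proteins):
    """Return pairs of protein names that co-occur in at least one sentence."""
    pairs = set()
    for sentence in sentences:
        tokens = set(re.findall(r"\w+", sentence.lower()))
        # Sort so that each pair has a single canonical ordering.
        found = sorted(p for p in proteins if p.lower() in tokens)
        pairs.update(combinations(found, 2))
    return pairs


def implicit_pairs(explicit, min_shared=1):
    """Return pairs that never co-occur but share >= min_shared partners.

    The result maps each implicit pair to the set of linking proteins.
    min_shared is a hypothetical noise filter, not a value from the paper.
    """
    partners = defaultdict(set)
    for a, b in explicit:
        partners[a].add(b)
        partners[b].add(a)

    result = {}
    for a, c in combinations(sorted(partners), 2):
        if c in partners[a]:
            continue  # already an explicit pair
        shared = partners[a] & partners[c]
        if len(shared) >= min_shared:
            result[(a, c)] = shared
    return result


# Toy input with placeholder names; not data from the study.
sentences = [
    "protA binds protB",
    "protB regulates protC",
    "protC interacts with protD",
]
proteins = ["protA", "protB", "protC", "protD"]

explicit = explicit_pairs(sentences, proteins)
implicit = implicit_pairs(explicit)
```

On this toy input, `protA` and `protC` never share a sentence but are both linked to `protB`, so they form an implicit pair. Raising `min_shared` discards pairs supported by too few linking proteins, which is one simple way such candidate lists could be thinned before biological validation.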




Source journal

In Silico Biology (Computer Science: Computational Theory and Mathematics)

CiteScore

2.20

Self-citation rate

0.00%

Number of publications

Journal description: The considerable "algorithmic complexity" of biological systems requires a huge amount of detailed information for their complete description. Although far from complete, the overwhelming quantity of small pieces of information gathered for all kinds of biological systems at the molecular and cellular level requires computational tools to be adequately stored and interpreted. Interpreting data means abstracting them as far as is permissible to provide a systematic, integrative view of biology. Most of the presently available scientific journals focus either on accumulating more data from elaborate experimental approaches, or on presenting new algorithms for the interpretation of these data. Both approaches are meritorious.