Shiva Kumar, Vijay H. Ghadage, I. Subramanian, A. Desai, Vivek Singh, A. Jere
BioGyan: A Tool to Identify Gene Functions from Literature. Journal of Data Mining in Genomics & Proteomics, 2015, pp. 1-8. DOI: 10.4172/2153-0602.1000164
Abstract
Background: The primary objective of life science research is to understand complex cellular mechanisms and the interplay of genes and proteins across multiple cellular processes. PubMed remains the primary source of biomedical information for this purpose, even though multiple other databases such as UniProt, the Protein Data Bank (PDB) and Reactome exist.

Objective: Given the large volume of data now available from high-throughput technologies and multiple databases, finding relevant information on gene-process-phenotype relationships has become extremely challenging and tedious. No tool currently searches PubMed and multiple other databases simultaneously to provide holistic information. Moreover, a typical PubMed search returns a large number of articles, which must be screened manually to identify the relevant literature. We therefore developed BioGyan, a literature mining tool that simplifies combinatorial searches for genes, cell types and cellular processes in PubMed and other relevant databases.

Methods: BioGyan uses a robust scoring method to rank articles by their relevance to the user's search terms. The score is a weighted sum of the co-occurrences of gene, process and interaction terms in an abstract.

Results: BioGyan retrieves PubMed articles supporting associations between the queried genes and processes, relevant pathways from pathway databases, and 3-dimensional structures from PDB. For easy viewing, all information is presented to the user in a single window. BioGyan predicted the relevance of articles to a gene-process association with an accuracy of 85.46% and performed better than PESCADOR.

Conclusion: BioGyan's key features include batch queries of genes and processes, offline reading of articles, export of article lists as bibliographies, and the flexibility for users to revise article relevance, making it a vital tool for literature search. BioGyan is thus a unique tool that offers holistic search across multiple databases while greatly automating the entire process.
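The abstract states only that BioGyan scores an abstract by a weighted sum of co-occurrences of gene, process and interaction terms. The following is a minimal sketch of how such a score might be computed and used to rank articles. Counting co-occurrence per sentence, the interaction-term list (`INTERACTION_TERMS`) and the weights (`WEIGHTS`) are all hypothetical choices made for illustration; they are not BioGyan's published parameters or implementation.

```python
import re
from collections.abc import Iterable

# Hypothetical weights: the source says only "weighted sum", so these
# values are illustrative, not BioGyan's.
WEIGHTS = {
    "gene_process": 1.0,        # gene and process term in the same sentence
    "interaction_bonus": 2.0,   # extra credit if an interaction term is also present
}

# Hypothetical list of interaction verbs; a real tool would use a curated vocabulary.
INTERACTION_TERMS = frozenset({"regulates", "activates", "inhibits", "binds", "induces"})


def _contains_any(sentence: str, terms: Iterable[str]) -> bool:
    """True if any term occurs as a whole word (case-insensitive) in the sentence."""
    return any(
        re.search(rf"\b{re.escape(term)}\b", sentence, re.IGNORECASE) for term in terms
    )


def score_abstract(
    abstract: str,
    genes: Iterable[str],
    processes: Iterable[str],
    interactions: Iterable[str] = INTERACTION_TERMS,
) -> float:
    """Weighted sum of sentence-level co-occurrences of gene, process and interaction terms."""
    genes, processes, interactions = list(genes), list(processes), list(interactions)
    score = 0.0
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", abstract):
        if _contains_any(sentence, genes) and _contains_any(sentence, processes):
            score += WEIGHTS["gene_process"]
            if _contains_any(sentence, interactions):
                score += WEIGHTS["interaction_bonus"]
    return score


def rank_articles(
    articles: dict[str, str], genes: Iterable[str], processes: Iterable[str]
) -> list[tuple[str, float]]:
    """Return (article id, score) pairs sorted from most to least relevant."""
    genes, processes = list(genes), list(processes)
    scored = [(pmid, score_abstract(text, genes, processes)) for pmid, text in articles.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy abstracts keyed by made-up IDs (not real PubMed records).
    articles = {
        "A": "TP53 induces apoptosis in tumour cells. Other text.",
        "B": "TP53 is mutated in many cancers. Apoptosis was not measured.",
        "C": "We observed TP53 expression during apoptosis.",
    }
    for pmid, score in rank_articles(articles, {"TP53"}, {"apoptosis"}):
        print(pmid, score)
```

Counting co-occurrence within a sentence, rather than anywhere in the abstract, is one way to favour abstracts in which the gene and the process are directly linked (article B mentions both, but in separate sentences, so it scores zero here). The sketch omits gene-synonym expansion and any relevance threshold, both of which a practical tool would likely need.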