In Silico Biology: Latest Articles

Combined classifier for unknown genome classification using chaos game representation features
In Silico Biology | Pub Date: 2010-02-15 | DOI: 10.1145/1722024.1722065
Vrinda V. Nair, A. Nair
Abstract: Classification of unknown genomes has wide application in areas such as evolutionary studies, biodiversity research and forensic studies, which have lately been viewed from a renewed 'genomic' perspective. Only a few attempts in the literature focus on unknown genome identification, and the reported accuracies do not exceed 85%. Most works report classification into the major kingdoms only, without venturing further into their subclasses. A novel combined technique of Chaos Game Representation (CGR) and machine learning is proposed, the former for feature extraction and the latter for subsequent sequence classification. Eight subcategories of eukaryotic mitochondrial genomes from NCBI are used for the study. The sequences are first mapped into their Chaos Game Representation. Genomic features are extracted by computing the Frequency Chaos Game Representation (FCGR) matrix. An order-3 FCGR matrix, which has 64 elements, is used here, and these 64 elements act as the feature descriptor for classification. The classifiers used are based on Difference Boosting Naïve Bayes (DBNB), Artificial Neural Networks (ANN) and Support Vector Machines (SVM). Accuracies of the individual methods are reported. Although the SVM-CGR combination gives the highest average accuracy, the other methods perform better for some categories. Hence a voting classifier combining all three methods is implemented. Accuracies of 100% were obtained for Vertebrata and Porifera, whereas Acoelomata, Cnidaria and Fungi were classified with accuracies above 90%. The accuracies obtained for Protostomia, Plant and Pseudocoelomata were 90%, 82% and 77%, respectively.
Citations: 9
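The CGR and FCGR feature extraction described above can be sketched in Python as follows. This is a minimal sketch, not the authors' code: the corner assignment (A, C, G, T at the four corners of the unit square), the skipping of ambiguous bases, and the names `cgr_points` and `fcgr` are assumptions for illustration. Because the grid cell of a CGR point depends only on the last k bases read, the order-3 FCGR is computed exactly from 3-mers, giving the 64-element feature vector the abstract feeds to the classifiers.

```python
import numpy as np

# Corner convention (assumed; papers differ): A=(0,0), C=(0,1), G=(1,1), T=(1,0),
# written as (x, y) bits of the unit square.
CORNER_BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def cgr_points(seq):
    """Classic CGR: start at the centre and move halfway towards each base's corner."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNER_BITS:  # ambiguous bases (e.g. N) are simply skipped here
            continue
        bx, by = CORNER_BITS[base]
        x, y = (x + bx) / 2, (y + by) / 2
        points.append((x, y))
    return points


def fcgr(seq, k=3):
    """Order-k FCGR: counts of CGR points falling in each cell of a 2^k x 2^k grid.

    A point's cell depends only on the last k bases, so the counts are computed
    exactly from k-mers instead of binning float coordinates (which loses
    precision on long genomes). Row 0 is the bottom of the square (y near 0).
    """
    n = 2 ** k
    mat = np.zeros((n, n), dtype=int)
    clean = [b for b in seq.upper() if b in CORNER_BITS]
    for end in range(k - 1, len(clean)):
        row = col = 0
        # The oldest base of the k-mer sets the lowest-order bit, the newest the highest.
        for bit, base in enumerate(clean[end - k + 1 : end + 1]):
            bx, by = CORNER_BITS[base]
            col |= bx << bit
            row |= by << bit
        mat[row, col] += 1
    return mat


features = fcgr("ACGTTGCAAGGCTTACGATCGA", k=3).ravel()  # 64-element feature vector
```

The flattened vector can be normalized (for example, divided by its sum) before it is passed to an SVM or other classifier; whether the authors normalized it is not stated.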
Comparative analysis of microsatellite detecting software: a significant variation in results and influence of parameters
In Silico Biology | Pub Date: 2010-02-15 | DOI: 10.1145/1722024.1722068
Suresh B. Mudunuri, A. A. Rao, S. Pallamsetty, H. Nagarajaram
Abstract: Microsatellites are a unique type of repeat pattern found in the genome sequences of all known organisms, including bacteria and viruses. These repeats play an important role in genome evolution, are associated with various diseases, and have been used as molecular markers in DNA fingerprinting, population genetics, etc. Various bioinformatics tools have been developed to extract microsatellites from DNA sequences. However, not all tools identify microsatellites with similar sensitivity, so studies of microsatellites can suffer from significant biases in results and interpretation depending on the tool used. To understand the inherent limitations and biases in microsatellite extraction, especially under varying threshold values of program parameters, we compared the performance of some widely used tools on test DNA sequences. We extracted imperfect microsatellites from three sequences (the E. coli bacterial genome, C. elegans chromosome I and Drosophila chromosome X) using the commonly used microsatellite extraction tools TRF, Sputnik, SciRoKoCo and IMEx with varying parameters, and analyzed the results. We observed a significant variation in the number of microsatellites extracted by these tools, even with default/suggested parameters. Relaxing parameter values increases the number of repeats detected, but the differences among the results persist. In TRF, Sputnik and SciRoKoCo, the number of mismatches increases with the tract length of the repeat, indicating that the level of imperfection is not uniform across repeats. The four tools investigated differ in their algorithms and parameters, and hence in the number of microsatellites detected. The score-based programs identify more divergent penta- and hexanucleotide repeats than IMEx. We therefore suggest altering parameters appropriately to detect as many microsatellites as possible, so as not to miss any genuine repeat tracts, or using more than one tool to obtain a good consensus. We also surveyed in detail the features of all available microsatellite extraction tools. Apart from differences in algorithm, efficiency and parameters, the tools also differ widely in features and flexibility.
Citations: 8
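To illustrate why parameter thresholds change the number of repeats reported, here is a minimal Python sketch of a perfect tandem-repeat scanner. It is not TRF, Sputnik, SciRoKoCo or IMEx, and it ignores the imperfect repeats those tools handle; `find_perfect_ssrs` and the per-unit minimum copy numbers are hypothetical.

```python
# Hypothetical minimum copy numbers per unit length (1-6 bp); real tools use
# different defaults, and relaxing them increases the number of hits.
DEFAULT_MIN_COPIES = {1: 12, 2: 6, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(unit):
    """True if unit is not itself a repeat of a shorter unit ('ATAT' is not)."""
    n = len(unit)
    return all(unit != unit[:d] * (n // d) for d in range(1, n) if n % d == 0)


def find_perfect_ssrs(seq, min_copies=DEFAULT_MIN_COPIES):
    """Greedy left-to-right scan for perfect tandem repeats; returns (start, unit, copies)."""
    seq = seq.upper()
    hits, i = [], 0
    while i < len(seq):
        best = None
        for unit_len in range(1, 7):
            unit = seq[i : i + unit_len]
            if len(unit) < unit_len or not is_primitive(unit):
                continue
            copies = 1
            while seq[i + copies * unit_len : i + (copies + 1) * unit_len] == unit:
                copies += 1
            # Keep the qualifying repeat that covers the longest tract at this position.
            if copies >= min_copies[unit_len] and (
                best is None or copies * unit_len > len(best[1]) * best[2]
            ):
                best = (i, unit, copies)
        if best:
            hits.append(best)
            i += len(best[1]) * best[2]  # skip past the reported tract
        else:
            i += 1
    return hits


seq = "GG" + "AT" * 6 + "C" + "CAG" * 4 + "T"
print(find_perfect_ssrs(seq))  # [(2, 'AT', 6), (15, 'CAG', 4)]
```

Raising the dinucleotide threshold from 6 to 7 drops the AT tract from the output, a tiny-scale version of how threshold choices alter what each tool reports.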
Predicting protein-protein interactions using first principle methods and statistical scoring
In Silico Biology | Pub Date: 2010-02-15 | DOI: 10.1145/1722024.1722038
M. Pradhan, P. Gandra, M. Palakal
Abstract: Proteins are combinations of different PDB structures. To understand protein interactions, we propose a methodology that integrates first-principle parameters of protein interaction with the number of PDB structures defining these proteins. Annotating possibly interacting protein pairs with their Pfam and GO domains increases the strength of each interaction and can identify the important link between the two proteins. We propose a novel technique that predicts protein interactions by integrating a protein's physicochemical properties with the number of PDB structures, using a sliding-window algorithm to compute the optimal interaction score. The proposed method achieved ~94% true predictions on a known set of interacting proteins and 100% on a non-interacting dataset. When the model was applied to an unknown protein dataset, we identified novel, highly relevant interacting protein pairs.
Citations: 1
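The abstract mentions a sliding-window algorithm over physicochemical properties but does not specify the property, window size, or how the window score is combined with PDB counts. The Python sketch below shows only the generic sliding-window step, assuming the Kyte-Doolittle hydropathy scale as the property and a window of 7 residues; both choices and the name `best_window` are hypothetical.

```python
# Kyte-Doolittle hydropathy scale, used here only as an example property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def best_window(seq, window=7, scale=KYTE_DOOLITTLE):
    """Return (max mean property value over any window, start index of that window).

    A rolling sum keeps the scan O(len(seq)); unknown residues raise KeyError.
    """
    values = [scale[aa] for aa in seq.upper()]
    if len(values) < window:
        raise ValueError("sequence shorter than window")
    current = sum(values[:window])
    best, best_start = current, 0
    for start in range(1, len(values) - window + 1):
        # Slide right by one: add the entering residue, drop the leaving one.
        current += values[start + window - 1] - values[start - 1]
        if current > best:
            best, best_start = current, start
    return best / window, best_start
```

How such per-protein window scores would be turned into a pairwise interaction score is left open here, since the abstract does not describe it.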
Identifying the nature of the interface in protein-protein complexes
In Silico Biology | Pub Date: 2010-02-15 | DOI: 10.1145/1722024.1722040
Pralay Mitra
Abstract: Molecular recognition is critical to the proper self-assembly of biological macromolecules and to their function. Shape complementarity of the mutual recognition interfaces is one of the important factors guiding this interaction. The lock-and-key mechanism of enzyme-substrate binding is a classic hallmark of shape complementarity at work in biochemical reactions. Recognition principles between macromolecular entities, however, have been difficult to formulate. Sensitive surface-complementarity recognition algorithms are computationally prohibitive, while the accuracy of heuristic methods is limited by the choice of appropriate biochemical information. This is a major drawback for understanding macromolecular recognition, which requires critical assessment of biochemical information involving large interacting interfaces. Here we mine data on a number of biochemical parameters to highlight their individual merits and demerits, and propose specific properties suitable for designing heuristic algorithms. The work is expected to be useful in bioinformatics algorithms for docking macromolecules and designing protein complex interfaces.
Citations: 2
DNA barcoding and microsatellite marker development for Nyctibatrachus major: the threatened amphibian species
In Silico Biology | Pub Date: 2010-02-15 | DOI: 10.1145/1722024.1722029
K. Meenakshi, R. Remya, G. Sanil
Abstract: Identifying species of organisms with molecular and bioinformatics tools has been at the center of ongoing discussions in the conservation genetics field. Resolving taxonomic uncertainties is a necessary step in distinguishing entities for conservation purposes. To help resolve this taxonomic uncertainty and to assess the genetic population structure of the taxon Nyctibatrachus major, we barcoded the COI gene for species identification and developed species-specific primers for microsatellite markers to assess the population dynamics among Nyctibatrachus major populations. The current work is part of ongoing programs on the conservation genetics of the endemic fauna of the Western Ghats.
Citations: 5
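DNA barcodes such as COI sequences are commonly compared with the Kimura two-parameter (K2P) distance, assigning a query to the closest reference. The abstract does not say which distance or assignment rule was used, so the Python sketch below only illustrates that common approach, assuming pre-aligned, equal-length sequences; `k2p_distance` and `nearest_reference` are hypothetical names.

```python
import math

PURINES = frozenset("AG")


def k2p_distance(a, b):
    """Kimura two-parameter distance between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x in "ACGT" and y in "ACGT"]
    n = len(pairs)
    # Transition: purine<->purine or pyrimidine<->pyrimidine; transversion: across classes.
    transitions = sum(1 for x, y in pairs if x != y and (x in PURINES) == (y in PURINES))
    transversions = sum(1 for x, y in pairs if (x in PURINES) != (y in PURINES))
    p, q = transitions / n, transversions / n
    # Undefined (log of a non-positive number) for saturated, very divergent pairs.
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def nearest_reference(query, references):
    """Return the name of the reference barcode with the smallest K2P distance."""
    return min(references, key=lambda name: k2p_distance(query, references[name]))
```

In practice the references would be curated barcodes of candidate species; the toy names in any example are placeholders.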
Random walk ranking guided by disease association networks for lung cancer biomarker discovery
In Silico Biology | Pub Date: 2010-02-15 | DOI: 10.1145/1722024.1722062
T. Huan, Xiaogang Wu, Zengliang Bai, J. Chen
Abstract: The identification of candidate molecular entities involved in a specific disease has been a primary focus of biomarker discovery in cancer research. Prioritizing proteins from a disease-specific protein-protein interaction (PPI) network has become an efficient computational strategy for cancer biomarker discovery. Although some successful methods, such as the random walk ranking (RWR) algorithm, can exploit global network topology to prioritize proteins, this network-based strategy still needs more comprehensive prior knowledge, such as genome-wide association studies (GWAS), to improve its discovery capability. In this paper, we first analyzed genome-wide association loci for human diseases and built disease association networks (DANs), in which two diseases are associated if they share common genetic variants. We then assigned each node in a human PPI network a disease-specific weight based on knowledge from the DANs and text mining. Finally, we present a seed-weighted random walk ranking (SW-RWR) method to prioritize biomarkers in the global human PPI network. A lung cancer case study shows that our ranking strategy has better accuracy and sensitivity in discovering potentially clinically useful biomarkers than a similar network-based ranking method. This result suggests that close associations among different diseases could play an important role in biomarker discovery.
Citations: 0
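The core of RWR-style ranking is the iteration p <- (1 - r) W p + r p0, where W is the column-normalized adjacency matrix and p0 is the restart distribution. The Python sketch below assumes that the seed-weighted variant simply uses the disease-specific node weights as p0; the restart probability of 0.7, the convergence tolerance and the name `seed_weighted_rwr` are assumptions, not values from the paper.

```python
import numpy as np


def seed_weighted_rwr(adj, seed_weights, restart=0.7, tol=1e-10, max_iter=1000):
    """Random walk with restart: p <- (1 - r) W p + r p0, with W column-normalised.

    seed_weights stands in for the disease-specific node weights; here it is
    any non-negative vector with at least one positive entry.
    """
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # isolated nodes: avoid division by zero
    W = adj / col_sums             # column j divided by the degree of node j
    p0 = np.asarray(seed_weights, dtype=float)
    p0 = p0 / p0.sum()             # restart distribution
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:  # L1 change below tolerance: converged
            return p_next
        p = p_next
    return p
```

Proteins are then ranked by their entries in the returned vector; nodes close to heavily weighted seeds score highest.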
A Gibbs sampling algorithm for motif discovery using a linear mixed model
In Silico Biology | Pub Date: 2010-02-15 | DOI: 10.1145/1722024.1722053
Daming Lu
Abstract: The identification of motifs in gene promoters is a critical step in delineating the genetic regulatory framework of an organism. In this paper, a new linear mixed model is introduced. This model combines the conventional Position Weight Matrix (PWM) model with a novel Mutual Information (MI) model: the PWM captures individual position frequencies, whereas MI reflects pairwise relations between positions. A training stage determines the weight of each model. The trained model is then embedded in a Gibbs sampling algorithm for motif discovery. Putative motifs obtained by analyzing a set of DNA sequences with this program are compared with experimentally verified motifs and with the output of other popular motif-finding software. The results show that the new mixed model can improve motif discovery accuracy to some extent.
Citations: 1
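A minimal Python sketch of the idea, under stated assumptions: the abstract does not give the exact form of the MI term, so here the pairwise part is the sum of pointwise mutual information over position pairs, mixed with PWM log-odds against a uniform background through a fixed weight `lam` (the paper instead learns the weights in a training stage). `build_model`, `mixed_score` and `gibbs_motif` are hypothetical names, and sequences are assumed to contain only uppercase A, C, G and T.

```python
import itertools
import math
import random

BASES = "ACGT"
BACKGROUND = 0.25  # assumed uniform background


def build_model(sites):
    """Column and pairwise probabilities; pseudocounts chosen so marginals agree."""
    w = len(sites[0])
    cols, pairs = [], {}
    for i in range(w):
        counts = {b: 0.5 for b in BASES}
        for s in sites:
            counts[s[i]] += 1
        total = sum(counts.values())
        cols.append({b: c / total for b, c in counts.items()})
    for i, j in itertools.combinations(range(w), 2):
        counts = {(a, b): 0.125 for a in BASES for b in BASES}
        for s in sites:
            counts[(s[i], s[j])] += 1
        total = sum(counts.values())
        pairs[(i, j)] = {key: c / total for key, c in counts.items()}
    return cols, pairs


def mixed_score(kmer, model, lam=0.7):
    """lam * PWM log-odds + (1 - lam) * sum of pointwise mutual information terms."""
    cols, pairs = model
    pwm = sum(math.log(cols[i][x] / BACKGROUND) for i, x in enumerate(kmer))
    pmi = sum(
        math.log(table[(kmer[i], kmer[j])] / (cols[i][kmer[i]] * cols[j][kmer[j]]))
        for (i, j), table in pairs.items()
    )
    return lam * pwm + (1 - lam) * pmi


def gibbs_motif(seqs, w, lam=0.7, iters=200, seed=0):
    """Site-sampling Gibbs sampler (one site per sequence, needs >= 2 sequences)."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - w + 1) for s in seqs]
    for _ in range(iters):
        k = rng.randrange(len(seqs))
        others = [s[st : st + w] for idx, (s, st) in enumerate(zip(seqs, starts)) if idx != k]
        model = build_model(others)
        scores = [mixed_score(seqs[k][p : p + w], model, lam) for p in range(len(seqs[k]) - w + 1)]
        top = max(scores)
        weights = [math.exp(x - top) for x in scores]  # shift to avoid overflow
        starts[k] = rng.choices(range(len(scores)), weights=weights)[0]
    return starts
```

The sampler follows the usual site-sampling scheme: hold one sequence out, build the model from the current sites in the others, and resample that sequence's site with probability proportional to exp(score).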
Algebraic approach to optimal clone selection applied in metagenomic projects
In Silico Biology | Pub Date: 2010-02-15 | DOI: 10.1145/1722024.1722066
M. Cantão, L. V. de Araújo, E. G. Lemos, J. E. Ferreira
Abstract: Because of the wide diversity of unknown organisms in the environment, 99% of them cannot be grown in traditional laboratory culture media. Metagenomics projects have therefore been proposed to study environmental microbial communities using molecular techniques, especially sequencing. Sequences produced by these projects are expected to accumulate over the coming years. The sequences produced by genomics and metagenomics projects present several challenges for processing, storage and analysis, such as searching for clones that contain genes of interest. This work presents OCI Metagenomics, which lets users dynamically define and manage clone-selection rules for metagenomic libraries through an algebraic approach based on process algebra. Furthermore, a web interface was developed so that researchers can easily create and execute their own rules to select clones in a genomic sequence database. The software was tested on a metagenomic cosmid library and was able to select clones containing genes of interest.
Citations: 4
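OCI Metagenomics composes clone-selection rules with process algebra; its actual operators and data model are not described in the abstract. The Python sketch below only mimics the idea with two hypothetical combinators, sequential composition (`then`) and alternative composition (`choice`), over a hypothetical `Clone` record holding (gene name, e-value) hits. It is plain function composition, not a process-algebra engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clone:
    clone_id: str
    hits: tuple  # hypothetical layout: ((gene_name, e_value), ...)


def has_gene(name, max_evalue=1e-5):
    """Atomic rule: keep clones with a hit to `name` at or below `max_evalue`."""
    def rule(clones):
        return [c for c in clones if any(g == name and e <= max_evalue for g, e in c.hits)]
    return rule


def then(*rules):
    """Sequential composition: each rule filters the output of the previous one."""
    def rule(clones):
        for r in rules:
            clones = r(clones)
        return clones
    return rule


def choice(*rules):
    """Alternative composition: clones selected by any rule, kept in library order."""
    def rule(clones):
        selected = set()
        for r in rules:
            selected.update(c.clone_id for c in r(clones))
        return [c for c in clones if c.clone_id in selected]
    return rule


library = [
    Clone("C1", (("lipase", 1e-20),)),
    Clone("C2", (("esterase", 1e-8), ("cellulase", 1e-3))),
    Clone("C3", (("lipase", 0.1),)),
]
rule = choice(has_gene("lipase"), then(has_gene("esterase"), has_gene("cellulase", max_evalue=1e-2)))
print([c.clone_id for c in rule(library)])  # ['C1', 'C2']
```

Because rules are ordinary values, a web front end could store and recombine them at run time, which is the kind of dynamic rule management the abstract describes; the gene names and thresholds above are illustrative only.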
Gene regulatory network from microarray data using dynamic neural fuzzy approach
In Silico Biology | Pub Date: 2010-02-15 | DOI: 10.1145/1722024.1722044
S. Vineetha, C. Chandra Shekara Bhat, S. M. Idicula
Abstract: The paper presents a multilayered dynamic neural fuzzy network (DNFN) to extract regulatory relationships among genes and reconstruct the gene regulatory network from circulating plasma RNA data of colon cancer patients. The method combines the merits of connectionist and fuzzy approaches: it encodes learned knowledge as fuzzy rules and processes data following fuzzy reasoning principles. The dynamic aspect of gene regulation is taken into account through on-line learning of fuzzy rules, while structural learning together with parameter learning forms a fast learning algorithm for building a small yet powerful dynamic neural fuzzy network. One of the main advantages of DNFN is that the hidden nodes need not be predetermined, since the network finds its optimal structure automatically and quickly. The knowledge inferred with this network may provide biological insights that can be used to design and interpret further experiments.
Citations: 4
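A DNFN encodes knowledge as fuzzy rules. The inference step for such rules can be sketched as zero-order Takagi-Sugeno-Kang (TSK) reasoning with Gaussian membership functions. This Python sketch assumes that rule form (product of memberships as firing strength, weighted mean of constant consequents); it omits the on-line structure and parameter learning the paper describes, and `fuzzy_infer` is a hypothetical name. For regulatory inference, the inputs could be regulator expression levels at one time point and the output a target gene's level at the next.

```python
import math


def gaussian(x, center, width):
    """Gaussian membership degree of x in a fuzzy set (center, width)."""
    return math.exp(-((x - center) ** 2) / (2 * width ** 2))


def fuzzy_infer(x, rules):
    """Zero-order TSK inference.

    Each rule is (centers, widths, consequent), read as
    IF x1 is A1 AND x2 is A2 ... THEN y = consequent.
    Firing strength = product of memberships; output = strength-weighted mean.
    """
    strengths = []
    for centers, widths, _ in rules:
        w = 1.0
        for xi, c, s in zip(x, centers, widths):
            w *= gaussian(xi, c, s)
        strengths.append(w)
    total = sum(strengths)
    if total == 0:
        raise ValueError("no rule fires for this input")
    return sum(w * rule[2] for w, rule in zip(strengths, rules)) / total


# Two hypothetical rules: "regulator low -> target 0", "regulator high -> target 1".
rules = [((0.0,), (0.5,), 0.0), ((1.0,), (0.5,), 1.0)]
print(fuzzy_infer([0.5], rules))  # 0.5
```

In a learning network, new rules would be added when no existing rule fires strongly enough and the centers, widths and consequents tuned from data; that part is not shown.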
Improving motif refinement using hybrid expectation maximization and random projection
In Silico Biology | Pub Date: 2010-02-15 | DOI: 10.1145/1722024.1722048
H. S. Shashidhara, Prince Joseph, K. Srinivasa
Abstract: The main goal of the motif-finding problem is to detect novel, over-represented, unknown signals in a set of sequences. Popular algorithms such as Expectation Maximization (EM) and Gibbs sampling are sensitive to the initial guesses and are known to converge very quickly to the nearest local maximum. A novel optimization framework systematically searches the neighborhood of the initial alignments to explore multiple locally optimal solutions. This search is achieved by transforming the original optimization problem into its corresponding dynamical system and estimating the practical stability boundary of the local maximum. The work implements the hybrid algorithm and tries to enhance it with different global methods and other techniques. Aggregation methods are then tried in place of projection methods.
Citations: 2
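The hybrid scheme starts EM from good initial guesses. Below is a minimal Python sketch of random-projection seeding (in the style of the PROJECTION algorithm: hash each l-mer by k randomly chosen positions and keep the large buckets) followed by EM refinement under a one-occurrence-per-sequence model. The uniform background, pseudocounts and function names are assumptions, and the stability-boundary search of the paper's framework is not implemented.

```python
import random
from collections import defaultdict

BASES = "ACGT"


def projection_buckets(seqs, l, k, seed=0, min_size=3):
    """Hash every l-mer by k randomly chosen positions; return the large buckets."""
    rng = random.Random(seed)
    positions = sorted(rng.sample(range(l), k))
    buckets = defaultdict(list)
    for si, s in enumerate(seqs):
        for start in range(len(s) - l + 1):
            key = "".join(s[start + p] for p in positions)
            buckets[key].append((si, start))
    return positions, {key: m for key, m in buckets.items() if len(m) >= min_size}


def seed_pwm(seqs, members, l, pseudo=0.5):
    """Initial PWM (column probabilities) from the l-mers in one bucket."""
    counts = [{b: pseudo for b in BASES} for _ in range(l)]
    for si, start in members:
        for j in range(l):
            counts[j][seqs[si][start + j]] += 1
    return [{b: c[b] / sum(c.values()) for b in BASES} for c in counts]


def em_refine(seqs, pwm, l, iters=20, pseudo=0.5):
    """EM under a one-occurrence-per-sequence model with a uniform background."""
    for _ in range(iters):
        counts = [{b: pseudo for b in BASES} for _ in range(l)]
        for s in seqs:
            # E-step: likelihood ratio (motif vs background) of every start.
            ratios = []
            for start in range(len(s) - l + 1):
                r = 1.0
                for j in range(l):
                    r *= pwm[j][s[start + j]] / 0.25
                ratios.append(r)
            z = sum(ratios)
            # Expected base counts, each start weighted by its posterior r / z.
            for start, r in enumerate(ratios):
                for j in range(l):
                    counts[j][s[start + j]] += r / z
        # M-step: re-estimate column probabilities from expected counts.
        pwm = [{b: c[b] / sum(c.values()) for b in BASES} for c in counts]
    return pwm
```

Identical planted l-mers always share a bucket whatever positions are projected, which is why large buckets make good EM starting points; in practice several projections are tried and the best refined motif kept.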