ISRN bioinformatics | Pub Date: 2014-05-04 | eCollection Date: 2014-01-01 | DOI: 10.1155/2014/901419
Giorgio Valentini
{"title":"Hierarchical ensemble methods for protein function prediction.","authors":"Giorgio Valentini","doi":"10.1155/2014/901419","DOIUrl":"10.1155/2014/901419","url":null,"abstract":"<p><p>Protein function prediction is a complex multiclass, multilabel classification problem, characterized by multiple issues such as the incompleteness of the available annotations, the integration of multiple sources of high-dimensional biomolecular data, the imbalance of several functional classes, and the difficulty of unambiguously determining negative examples. Moreover, the hierarchical relationships between functional classes that characterize both the Gene Ontology and FunCat taxonomies motivate the development of hierarchy-aware prediction methods, which have shown significantly better performance than hierarchy-unaware \"flat\" prediction methods. In this paper, we provide a comprehensive review of hierarchical methods for protein function prediction based on ensembles of learning machines. According to this general approach, a separate learning machine is trained for each functional term, and the resulting predictions are then assembled into a \"consensus\" ensemble decision that takes into account the hierarchical relationships between classes. The main hierarchical ensemble methods proposed in the literature are discussed in the context of existing computational methods for protein function prediction, highlighting their characteristics, advantages, and limitations. Finally, open problems in this exciting research area of computational biology are considered, outlining novel perspectives for future research. 
</p>","PeriodicalId":90877,"journal":{"name":"ISRN bioinformatics","volume":"2014 ","pages":"901419"},"PeriodicalIF":0.0,"publicationDate":"2014-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4393075/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33272132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
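The abstract above describes training one learner per functional term and then reconciling the flat predictions with the hierarchy. Below is a minimal sketch of one simple reconciliation rule. It assumes a tree-structured taxonomy (as in FunCat; the Gene Ontology is a DAG and needs more care) and scores in [0, 1]. Walking top-down, each term's score is capped at its parent's corrected score, so a child is never predicted more confidently than its ancestors. The term IDs, scores and the function name `top_down_correct` are hypothetical. This illustrates the general idea and is not a specific method from the review.

```python
from typing import Dict, Optional


def top_down_correct(
    scores: Dict[str, float], parent: Dict[str, Optional[str]]
) -> Dict[str, float]:
    """Cap every term's flat score at its parent's corrected score (tree case).

    scores: flat per-term scores from independently trained classifiers.
    parent: maps each term to its parent term, or None for top-level terms.
    """
    corrected: Dict[str, float] = {}

    def visit(term: str) -> float:
        if term not in corrected:
            p = parent.get(term)
            flat = scores[term]
            # Top-level terms keep their flat score; all others are bounded above
            # by the (already corrected) score of their parent.
            corrected[term] = flat if p is None else min(flat, visit(p))
        return corrected[term]

    for term in scores:
        visit(term)
    return corrected


# Hypothetical two-branch taxonomy and flat classifier scores.
flat_scores = {"A": 0.8, "A.1": 0.9, "A.1.1": 0.3, "B": 0.2, "B.1": 0.7}
parents = {"A": None, "A.1": "A", "A.1.1": "A.1", "B": None, "B.1": "B"}
print(top_down_correct(flat_scores, parents))
# {'A': 0.8, 'A.1': 0.8, 'A.1.1': 0.3, 'B': 0.2, 'B.1': 0.2}
```

Capping with `min` makes the output consistent with the rule that a positive annotation for a term implies positive annotations for all of its ancestors. Ensemble methods differ mainly in how much they trust the parent versus the child when the two disagree.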
ISRN bioinformatics | Pub Date: 2014-01-12 | eCollection Date: 2014-01-01 | DOI: 10.1155/2014/345106
Jonatan Taminau, Cosmin Lazar, Stijn Meganck, Ann Nowé
{"title":"Comparison of merging and meta-analysis as alternative approaches for integrative gene expression analysis.","authors":"Jonatan Taminau, Cosmin Lazar, Stijn Meganck, Ann Nowé","doi":"10.1155/2014/345106","DOIUrl":"https://doi.org/10.1155/2014/345106","url":null,"abstract":"<p><p>An increasing number of microarray gene expression data sets are available through public repositories. Their huge potential for new findings has yet to be unlocked by making them available for large-scale analysis. To do so, it is essential that independent studies designed for similar biological problems can be integrated, so that new insights can be obtained. These insights would remain undiscovered when analyzing the individual data sets, because it is well known that the small number of biological samples used per experiment is a bottleneck in genomic analysis. Increasing the number of samples increases the statistical power, so more general and reliable conclusions can be drawn. In this work, two different approaches for conducting large-scale analysis of microarray gene expression data, meta-analysis and data merging, are compared in the context of the identification of cancer-related biomarkers by analyzing six independent lung cancer studies. Within this study, we investigate the hypothesis that analyzing large cohorts of samples obtained by merging independent data sets designed to study the same biological problem results in lower false discovery rates than analyzing the same data sets within a more conservative meta-analysis approach. 
</p>","PeriodicalId":90877,"journal":{"name":"ISRN bioinformatics","volume":"2014 ","pages":"345106"},"PeriodicalIF":0.0,"publicationDate":"2014-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1155/2014/345106","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33272131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
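The following sketch makes the contrast between the two strategies concrete for a single gene. It assumes each study provides case and control expression values.

- **Merging** pools all samples before computing one Welch t-statistic.
- **Meta-analysis** computes one statistic per study and combines them Stouffer-style (sum divided by the square root of the number of studies), treating each t-statistic as an approximate z-score.

That combination rule, the helper names and the data are illustrative assumptions, not the pipeline used in the paper. In practice, merging is usually preceded by cross-study normalization or batch-effect removal, which this sketch omits.

```python
from math import sqrt
from statistics import mean, variance
from typing import List, Tuple

Study = Tuple[List[float], List[float]]  # (case values, control values) for one gene


def welch_t(a: List[float], b: List[float]) -> float:
    """Welch two-sample t-statistic of group a versus group b."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def merged_statistic(studies: List[Study]) -> float:
    """Data merging: pool all cases and all controls, then test once."""
    cases = [x for case, _ in studies for x in case]
    controls = [x for _, control in studies for x in control]
    return welch_t(cases, controls)


def meta_statistic(studies: List[Study]) -> float:
    """Meta-analysis: test each study separately, then combine (Stouffer-style)."""
    per_study = [welch_t(case, control) for case, control in studies]
    return sum(per_study) / sqrt(len(per_study))


# Hypothetical expression values for one gene in two studies with different baselines.
studies = [
    ([5.1, 5.3, 5.0, 5.4], [4.2, 4.0, 4.3, 4.1]),
    ([6.0, 6.2, 5.9, 6.1], [5.1, 5.0, 5.2, 4.9]),
]
print(merged_statistic(studies), meta_statistic(studies))
```

In this toy example the second study's higher baseline inflates the pooled within-group variance. As a result, the merged statistic comes out smaller than the combined per-study statistic, which is why cross-study normalization matters before merging.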
ISRN bioinformatics | Pub Date: 2013-11-07 | eCollection Date: 2013-01-01 | DOI: 10.1155/2013/174064
Boseon Byeon
{"title":"NucVoter: A Voting Algorithm for Reliable Nucleosome Prediction Using Next-Generation Sequencing Data.","authors":"Boseon Byeon","doi":"10.1155/2013/174064","DOIUrl":"https://doi.org/10.1155/2013/174064","url":null,"abstract":"<p><p>Nucleosomes, which consist of DNA wrapped around histone octamers, are dynamic, and their structure, including their location, size, and occupancy, can change. Nucleosomes can regulate gene expression by controlling the accessibility of DNA to proteins. Next-generation sequencing, combined with laboratory methods such as micrococcal nuclease digestion, makes it possible to predict the genomic locations of nucleosomes. However, the true locations of nucleosomes are unknown, and it is difficult to determine their exact locations from next-generation sequencing data. This paper proposes a novel voting algorithm, NucVoter, for the reliable prediction of nucleosome locations. Multiple models verify the consensus regions in which the model with the highest priority places nucleosomes. NucVoter significantly improves the performance of nucleosome prediction. </p>","PeriodicalId":90877,"journal":{"name":"ISRN bioinformatics","volume":"2013 ","pages":"174064"},"PeriodicalIF":0.0,"publicationDate":"2013-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1155/2013/174064","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33272168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
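The abstract says multiple models verify the regions where the highest-priority model places nucleosomes. Below is a minimal sketch of one way such a vote could work. A call from the top-priority model is kept only if at least `min_support` lower-priority models report an overlapping call. The interval representation, the overlap test, the threshold and the helper names `overlaps` and `vote` are all hypothetical assumptions, not NucVoter's published procedure.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open genomic interval [start, end)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def vote(models: List[List[Interval]], min_support: int) -> List[Interval]:
    """Keep the highest-priority model's calls that enough other models confirm.

    models: nucleosome calls per model, ordered from highest to lowest priority.
    min_support: how many lower-priority models must overlap a call to keep it.
    """
    primary, others = models[0], models[1:]
    kept = []
    for call in primary:
        # Count the other models that have at least one overlapping call.
        support = sum(any(overlaps(call, c) for c in model) for model in others)
        if support >= min_support:
            kept.append(call)
    return kept


# Hypothetical calls (about 147 bp each) from three models on one region.
models = [
    [(100, 247), (500, 647), (900, 1047)],  # highest priority
    [(110, 257), (905, 1052)],
    [(95, 242), (1200, 1347)],
]
print(vote(models, min_support=2))  # [(100, 247)]
```

Lowering `min_support` to 1 would also keep `(900, 1047)`, which only one other model confirms. The threshold trades sensitivity for reliability.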
ISRN bioinformatics | Pub Date: 2013-10-21 | eCollection Date: 2013-01-01 | DOI: 10.1155/2013/640518
Gizem Ozbuyukkaya, Elif Ozkirimli Olmez, Kutlu O Ulgen
{"title":"Discovery of YopE Inhibitors by Pharmacophore-Based Virtual Screening and Docking.","authors":"Gizem Ozbuyukkaya, Elif Ozkirimli Olmez, Kutlu O Ulgen","doi":"10.1155/2013/640518","DOIUrl":"https://doi.org/10.1155/2013/640518","url":null,"abstract":"<p><p>Gram-negative Yersinia bacteria secrete virulence factors that invade eukaryotic cells via the type III secretion system. One such virulence factor, Yersinia outer protein E (YopE), targets the Rho family of small GTPases by mimicking the activity of regulatory GAP proteins, and its secretion mainly induces cytoskeletal disruption and depolymerization of actin stress fibers within the host cell. In this work, potent drug-like inhibitors of YopE are investigated with virtual screening approaches. More than 500,000 unique small molecules from the ZINC database were screened with a five-point pharmacophore derived from different salicylidene acylhydrazides and comprising three hydrogen-bond acceptors, one hydrogen-bond donor, and one ring. Binding modes and features of these molecules were investigated with a multistep molecular docking approach using the Glide software. Virtual screening hits were further analyzed based on their docking score, chemical similarity, pharmacokinetic properties, and the key Arg144 interaction, along with other active-site residue interactions with the receptor. As a final outcome, a diverse set of ligands with inhibitory potential was proposed. 
</p>","PeriodicalId":90877,"journal":{"name":"ISRN bioinformatics","volume":"2013 ","pages":"640518"},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1155/2013/640518","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33272127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ISRN bioinformatics | Pub Date: 2013-09-11 | eCollection Date: 2013-01-01 | DOI: 10.1155/2013/481545
Shanrong Zhao, Kurt Prenger, Lance Smith
{"title":"Stormbow: A Cloud-Based Tool for Reads Mapping and Expression Quantification in Large-Scale RNA-Seq Studies.","authors":"Shanrong Zhao, Kurt Prenger, Lance Smith","doi":"10.1155/2013/481545","DOIUrl":"https://doi.org/10.1155/2013/481545","url":null,"abstract":"<p><p>RNA-Seq is becoming a promising replacement for microarrays in transcriptome profiling and differential gene expression studies. Technical improvements have decreased sequencing costs and, as a result, the size and number of RNA-Seq datasets have increased rapidly. However, the increasing volume of data from large-scale RNA-Seq studies poses a practical challenge for data analysis in a local environment. To meet this challenge, we developed Stormbow, a cloud-based software package, to process large volumes of RNA-Seq data in parallel. The performance of Stormbow was tested by applying it to analyse 178 RNA-Seq samples in the cloud. In our test, it took 6 to 8 hours to process an RNA-Seq sample with 100 million reads, and the average cost was $3.50 per sample. Utilizing Amazon Web Services as the infrastructure for Stormbow allows us to easily scale up to handle large datasets with on-demand computational resources. Stormbow is a scalable, cost-effective, open-source tool for large-scale RNA-Seq data analysis. Stormbow can be freely downloaded and used out of the box to process Illumina RNA-Seq datasets. 
</p>","PeriodicalId":90877,"journal":{"name":"ISRN bioinformatics","volume":"2013 ","pages":"481545"},"PeriodicalIF":0.0,"publicationDate":"2013-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1155/2013/481545","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33272173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
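The Stormbow figures translate directly into a total budget and a throughput range. This worked arithmetic assumes the $3.50 average was taken over all 178 test samples and that a 100-million-read sample is representative. Because the average is rounded, the total is approximate.

```latex
C_{\text{total}} \approx 178 \times \$3.50 = \$623,
\qquad
\frac{10^{8}\ \text{reads}}{8\ \text{h}} = 1.25 \times 10^{7}\ \text{reads/h}
\quad\text{to}\quad
\frac{10^{8}\ \text{reads}}{6\ \text{h}} \approx 1.67 \times 10^{7}\ \text{reads/h}
```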
ISRN bioinformatics | Pub Date: 2013-09-03 | eCollection Date: 2013-01-01 | DOI: 10.1155/2013/252183
Xiandong Meng, Yanqing Ji
{"title":"Modern Computational Techniques for the HMMER Sequence Analysis.","authors":"Xiandong Meng, Yanqing Ji","doi":"10.1155/2013/252183","DOIUrl":"https://doi.org/10.1155/2013/252183","url":null,"abstract":"<p><p>This paper focuses on the latest research on, and critical reviews of, modern computing architectures and software- and hardware-accelerated algorithms for bioinformatics data analysis, with an emphasis on one of the most important sequence analysis applications: hidden Markov models (HMMs). We present a detailed performance comparison of sequence analysis tools on various computing platforms recently developed in the bioinformatics community. The characteristics of sequence analysis, such as its data- and compute-intensive nature, make it very attractive to optimize and parallelize using both traditional software approaches and innovative hardware acceleration technologies. </p>","PeriodicalId":90877,"journal":{"name":"ISRN bioinformatics","volume":"2013 ","pages":"252183"},"PeriodicalIF":0.0,"publicationDate":"2013-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4393056/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33272169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ISRN bioinformatics | Pub Date: 2013-08-12 | eCollection Date: 2013-01-01 | DOI: 10.1155/2013/962760
Soudabeh Sabetian Fard Jahromi, Mohd Shahir Shamsir
{"title":"Construction and Analysis of the Cell Surface's Protein Network for Human Sperm-Egg Interaction.","authors":"Soudabeh Sabetian Fard Jahromi, Mohd Shahir Shamsir","doi":"10.1155/2013/962760","DOIUrl":"https://doi.org/10.1155/2013/962760","url":null,"abstract":"<p><p>Sperm-egg interaction is one of the most impressive processes in sexual reproduction, and understanding its molecular mechanism is crucial for solving problems in infertility and failed in vitro fertilization. The main purpose of this study is to map the sperm-egg interaction network between cell-surface proteins and to perform an interaction analysis on this new network. We built the first protein interaction network of human sperm-egg binding and fusion proteins, consisting of 84 protein nodes and 112 interactions. Gene Ontology analysis identified a number of functional clusters that may be involved in the sperm-egg interaction. These include the G-protein coupled receptor protein signaling pathway, cellular membrane fusion, and single fertilization. The protein-protein interaction (PPI) network was highly interconnected and revealed a set of candidate interactions (ADAM-ZP3, ZP3-CLGN, IZUMO1-CD9, and ADAM2-IZUMO1) that may have an important role in sperm-egg interaction. The results showed that ADAM2 may mediate the interaction between two essential factors, CD9 and IZUMO1. The KEGG analysis showed 12 statistically significant pathways with 10 proteins associated with cancer, suggesting a common pathway between tumor fusion and sperm-egg fusion. We believe that the availability of this map will assist future research on the fertilization mechanism and will also facilitate the biological interpretation of sperm-egg interaction. 
</p>","PeriodicalId":90877,"journal":{"name":"ISRN bioinformatics","volume":"2013 ","pages":"962760"},"PeriodicalIF":0.0,"publicationDate":"2013-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1155/2013/962760","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33272130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
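The reported network size gives a quick sense of its connectivity. The mean degree of an undirected graph is twice the number of edges divided by the number of nodes. This is plain arithmetic on the stated counts (treating the interactions as undirected edges), not a figure reported in the paper.

```latex
\bar{k} = \frac{2\,|E|}{|V|} = \frac{2 \times 112}{84} = \frac{224}{84} \approx 2.67
```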
ISRN bioinformatics | Pub Date: 2013-08-01 | eCollection Date: 2013-01-01 | DOI: 10.1155/2013/437168
Saumya K Patel, Linz-Buoy George, Sivakumar Prasanth Kumar, Hyacinth N Highland, Yogesh T Jasrai, Himanshu A Pandya, Ketaki R Desai
{"title":"A Computational Approach towards the Understanding of Plasmodium falciparum Multidrug Resistance Protein 1.","authors":"Saumya K Patel, Linz-Buoy George, Sivakumar Prasanth Kumar, Hyacinth N Highland, Yogesh T Jasrai, Himanshu A Pandya, Ketaki R Desai","doi":"10.1155/2013/437168","DOIUrl":"10.1155/2013/437168","url":null,"abstract":"<p><p>The emergence of drug resistance in Plasmodium falciparum has tremendously affected chemotherapy worldwide, while the widespread distribution of chloroquine-resistant strains in most endemic areas has further complicated the treatment of malaria. The situation is worsened by the lack of a molecular mechanism explaining the resistance conferred by Plasmodium species. Recent studies have suggested an association of antimalarial resistance with P. falciparum multidrug resistance protein 1 (PfMDR1), an ATP-binding cassette (ABC) transporter and a homologue of human P-glycoprotein 1 (P-gp1). The present study deals with the development of a computational model of PfMDR1 and a model of substrate transport across PfMDR1, with insights derived from conformations corresponding to the inward- and outward-facing topologies that switch the transport system on and off. The docked positions of ATP and its structural-motif binding properties were found to be similar to those of other ATPases, thereby contributing to dimerization of the nucleotide-binding domains (NBDs), a unique structural agreement noticed in Mus musculus Pgp and the Escherichia coli MDR transporter homolog (MsbA). The interaction of leading antimalarials and phytochemicals within the active pocket of both wild-type and mutant-type PfMDR1 demonstrated the mode of binding and provided insights into the lower binding affinity that contributes to the parasite's resistance mechanism. 
</p>","PeriodicalId":90877,"journal":{"name":"ISRN bioinformatics","volume":"2013 ","pages":"437168"},"PeriodicalIF":0.0,"publicationDate":"2013-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4393060/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33272172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ISRN bioinformatics | Pub Date: 2013-06-17 | eCollection Date: 2013-01-01 | DOI: 10.1155/2013/671269
Amna Ijaz
{"title":"SUMOhunt: Combining Spatial Staging between Lysine and SUMO with Random Forests to Predict SUMOylation.","authors":"Amna Ijaz","doi":"10.1155/2013/671269","DOIUrl":"10.1155/2013/671269","url":null,"abstract":"<p><p>Modification by SUMO protein has many key roles in eukaryotic systems, which makes the identification of its target proteins and sites of considerable importance. Information regarding the SUMOylation of a protein may tell us about its subcellular localization, function, and spatial orientation. This modification occurs only at particular lysine residues, not at all lysines, in a given protein. In competition with biochemical means of modified-site recognition, computational methods are strong contenders for predicting SUMOylated sites on proteins. In this research, physicochemical properties of amino acids retrieved from AAIndex, especially those involved in the docking of modifier and target proteins and in the optimal presentation of the target lysine, were combined with sequence information and the random forest classifier implemented in WEKA to develop a prediction model, SUMOhunt, whose statistics are significantly better than those of all previous predictors. The model achieved 97.56% accuracy, 100% sensitivity, 94% specificity, and a Matthews correlation coefficient (MCC) of 0.95, which shows that the proposed amino acid properties play a significant role in SUMO attachment. SUMOhunt will hence bring great reliability and efficiency to SUMOylation prediction. 
</p>","PeriodicalId":90877,"journal":{"name":"ISRN bioinformatics","volume":"2013 ","pages":"671269"},"PeriodicalIF":0.0,"publicationDate":"2013-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4393069/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33272128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
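SUMOhunt is evaluated with accuracy, sensitivity, specificity and MCC. The sketch below shows the standard definitions of these four metrics computed from a binary confusion matrix. The function name `binary_metrics` and the counts in the example are hypothetical; they are not SUMOhunt's actual confusion matrix.

```python
from math import sqrt


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard confusion-matrix metrics for a binary site predictor."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)  # fraction of true modified sites recovered
    specificity = tn / (tn + fp)  # fraction of unmodified sites correctly rejected
    # Matthews correlation coefficient; defined as 0 when a marginal is empty.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only.
print(binary_metrics(tp=50, fp=3, tn=47, fn=0))
```

MCC uses all four cells of the confusion matrix. For that reason it is generally less flattering than accuracy when positive and negative sites are imbalanced.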
ISRN bioinformatics | Pub Date: 2013-06-03 | eCollection Date: 2013-01-01 | DOI: 10.1155/2013/404717
J R Deller, Hayder Radha, J Justin McCormick
{"title":"Exploiting identifiability and intergene correlation for improved detection of differential expression.","authors":"J R Deller, Hayder Radha, J Justin McCormick","doi":"10.1155/2013/404717","DOIUrl":"https://doi.org/10.1155/2013/404717","url":null,"abstract":"<p><p>Accurate differential analysis of microarray data strongly depends on effective treatment of intergene correlation. Such dependence is ordinarily accounted for in terms of its effect on significance cutoffs. In this paper, it is shown that correlation can, in fact, be exploited to share information across tests and reorder expression differentials for increased statistical power, regardless of the threshold. Significantly improved differential analysis is the result of two simple measures: (i) adjusting test statistics to exploit information from identifiable genes (the large subset of genes represented on a microarray that can be classified a priori as nondifferential with very high confidence), but (ii) doing so in a way that accounts for linear dependencies among identifiable and nonidentifiable genes. A method is developed that builds upon the widely used two-sample t-statistic approach and uses analysis in Hilbert space to decompose the nonidentified gene vector into two components that are correlated and uncorrelated with the identified set. When applied to data derived from a widely studied prostate cancer database, the proposed method outperforms some of the most highly regarded approaches published to date. Algorithms in MATLAB and in R are available for public download. 
</p>","PeriodicalId":90877,"journal":{"name":"ISRN bioinformatics","volume":"2013 ","pages":"404717"},"PeriodicalIF":0.0,"publicationDate":"2013-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1155/2013/404717","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33272171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
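The key step in the abstract above splits the nonidentified gene vector into a component correlated with the identified set and a component uncorrelated with it. That split is an orthogonal projection. Below is a minimal pure-Python sketch. It assumes each gene is represented by its vector of mean-centered expression values across samples, so that orthogonality corresponds to zero sample correlation. The sketch uses modified Gram-Schmidt to build an orthonormal basis for the span of the identified genes. The helper names `dot` and `decompose` are hypothetical. This is not the authors' MATLAB or R implementation, and it omits how the components are subsequently used to adjust the t-statistics.

```python
from typing import List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


def decompose(y: Vector, basis: List[Vector]) -> Tuple[Vector, Vector]:
    """Split y into (projection onto span(basis), residual orthogonal to it)."""
    ortho: List[Vector] = []
    for v in basis:
        w = list(v)
        # Modified Gram-Schmidt: remove components along earlier directions.
        for q in ortho:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = dot(w, w) ** 0.5
        if norm > 1e-12:  # skip vectors that are linearly dependent on earlier ones
            ortho.append([wi / norm for wi in w])
    proj = [0.0] * len(y)
    for q in ortho:
        c = dot(y, q)
        proj = [pi + c * qi for pi, qi in zip(proj, q)]
    resid = [yi - pi for yi, pi in zip(y, proj)]
    return proj, resid


# Hypothetical example: one nonidentified gene against two identified genes.
proj, resid = decompose([1.0, 2.0, 3.0], [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])
print(proj, resid)
```

Up to floating-point rounding, the residual is orthogonal to every identified-gene vector. It is the part of the gene's profile that the identified set cannot explain linearly.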