{"title":"Gene expression data analysis using multiobjective clustering improved with SVM based ensemble.","authors":"Anirban Mukhopadhyay, Ujjwal Maulik, Sanghamitra Bandyopadhyay","doi":"10.3233/ISB-2012-0441","DOIUrl":"https://doi.org/10.3233/ISB-2012-0441","url":null,"abstract":"Microarray technology facilitates the monitoring of the expression levels of thousands of genes over different experimental conditions simultaneously. Clustering is a popular data mining tool which can be applied to microarray gene expression data to identify co-expressed genes. Most of the traditional clustering methods optimize a single clustering goodness criterion and thus may not be capable of performing well on all kinds of datasets. Motivated by this, in this article, a multiobjective clustering technique that optimizes cluster compactness and separation simultaneously has been improved through a novel support vector machine classification based cluster ensemble method. The superiority of MOCSVMEN (MultiObjective Clustering with Support Vector Machine based ENsemble) has been established by comparing its performance with that of several well known existing microarray data clustering algorithms. Two real-life benchmark gene expression datasets have been used for testing the comparative performances of different algorithms. A recently developed metric, called Biological Homogeneity Index (BHI), which computes the clustering goodness with respect to functional annotation, has been used for the comparison.","PeriodicalId":39379,"journal":{"name":"In Silico Biology","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.3233/ISB-2012-0441","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"30550461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Coordinated evolution of the hepatitis B virus polymerase.","authors":"D S Campo, Z Dimitrova, J Lara, M Purdy, H Thai, S Ramachandran, L Ganova-Raeva, X Zhai, J C Forbi, C G Teo, Y Khudyakov","doi":"10.3233/ISB-2012-0452","DOIUrl":"https://doi.org/10.3233/ISB-2012-0452","url":null,"abstract":"The detection of compensatory mutations that abrogate negative fitness effects of drug-resistance and vaccine-escape mutations indicates the important role of epistatic connectivity in the evolution of viruses, especially under strong selection pressures. Mapping of epistatic connectivity in the form of coordinated substitutions should help to characterize molecular mechanisms shaping viral evolution and provide a tool for the development of novel anti-viral drugs and vaccines. We analyzed coordinated variation among amino acid sites in 370 hepatitis B virus (HBV) polymerase sequences using Bayesian networks. Among the HBV polymerase domains, the spacer domain, which separates the terminal protein from the reverse-transcriptase domain, showed the highest network centrality. Coordinated substitutions preserve the hydrophobicity and charge of Spacer. Maximum likelihood estimates of codon selection showed that Spacer contains the highest number of positively selected sites. Identification of 67% of the domain lacking an ordered structure suggests that Spacer belongs to the class of intrinsically disordered domains and proteins whose crucial functional role in the regulation of transcription, translation and cellular signal transduction has only recently been recognized. Spacer plays a central role in the epistatic network associating substitutions across the HBV genome, including those conferring viral virulence, drug resistance and vaccine escape. The data suggest that Spacer is extensively involved in coordination of HBV evolution.","PeriodicalId":39379,"journal":{"name":"In Silico Biology","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.3233/ISB-2012-0452","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"31088152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved transcriptome quantification and reconstruction from RNA-Seq reads using partial annotations.","authors":"Serghei Mangul, Adrian Caciula, Olga Glebova, Ion Mandoiu, Alex Zelikovsky","doi":"10.3233/ISB-2012-0459","DOIUrl":"https://doi.org/10.3233/ISB-2012-0459","url":null,"abstract":"The paper addresses the problem of how to use RNA-Seq data for transcriptome reconstruction and quantification, as well as novel transcript discovery in partially annotated genomes. We present a novel annotation-guided general framework for transcriptome discovery, reconstruction and quantification in partially annotated genomes and compare it with existing annotation-guided and genome-guided transcriptome assembly methods. Our method, referred to as Discovery and Reconstruction of Unannotated Transcripts (DRUT), can be used to enhance existing transcriptome assemblers, such as Cufflinks, as well as to accurately estimate the transcript frequencies. Empirical analysis on synthetic datasets confirms that Cufflinks enhanced by DRUT has superior quality of reconstruction and frequency estimation of transcripts.","PeriodicalId":39379,"journal":{"name":"In Silico Biology","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.3233/ISB-2012-0459","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"31091782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ExpertDiscovery and UGENE integrated system for intelligent analysis of regulatory regions of genes.","authors":"Y Y Vaskin, I V Khomicheva, E V Ignatieva, E E Vityaev","doi":"10.3233/ISB-2012-0448","DOIUrl":"https://doi.org/10.3233/ISB-2012-0448","url":null,"abstract":"The task of automatic extraction of the hierarchical structure of eukaryotic gene regulatory regions lies at the junction of biology, mathematics and information technology. Solving the problem involves understanding the sophisticated mechanisms of eukaryotic gene regulation and applying advanced data mining technologies. This paper discusses an integrated system that implements a powerful method for relational mining of biological data. The system makes it possible to take into account prior information about the gene regulatory regions that is known to the biologist, to perform the analysis on each hierarchical level, and to search for a solution from a simple hypothesis to a complex one. The integration of the ExpertDiscovery system into the UGENE toolkit provides a convenient environment for conducting complex research and automating the work of a biologist. For demonstration, the system has been applied to the recognition of SF1, SREBP and HNF4 vertebrate binding sites and to the analysis of the human gene regulatory regions that promote liver-specific transcription.","PeriodicalId":39379,"journal":{"name":"In Silico Biology","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.3233/ISB-2012-0448","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"30870648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mixture model analysis reflecting dynamics of the population diversity of 2009 pandemic H1N1 influenza virus.","authors":"Li-Ping Long, Changhe Yuan, Zhipeng Cai, Huiping Xu, Xiu-Feng Wan","doi":"10.3233/ISB-2012-0457","DOIUrl":"10.3233/ISB-2012-0457","url":null,"abstract":"Influenza A viruses have been responsible for large losses of lives around the world and continue to present a great public health challenge. In April 2009, a novel swine-origin H1N1 virus emerged in North America and caused the first pandemic of the 21st century. Toward the end of 2009, two waves of outbreaks occurred, and then the disease moderated. It will be critical to understand how this novel pandemic virus invaded and adapted to a human population. To understand the molecular dynamics and evolution in this pandemic H1N1 virus, we applied an Expectation-Maximization algorithm to estimate the Gaussian mixture in the genetic population of the hemagglutinin (HA) gene of these H1N1 viruses from April of 2009 to January of 2010 and compared them with the viruses that cause seasonal H1N1 influenza. Our results show that, after it was introduced to the human population, the 2009 H1N1 viral HA gene changed its population structure from a single Gaussian distribution to two major Gaussian distributions. The breadth of HA genetic diversity of the 2009 H1N1 virus also increased from the first wave to the second wave of this pandemic. Phylogenetic analyses demonstrated that only certain HA sublineages of 2009 H1N1 viruses were able to circulate throughout the pandemic period. In contrast, the influenza HA population structure of seasonal H1N1 virus was relatively stable, and the breadth of HA genetic diversity within a single season population remained similar. This study revealed an evolutionary mechanism for a novel pandemic virus. After the virus is introduced to the human population, it expands its molecular diversity through both random mutations (genetic drift) and selection. Eventually, multiple levels of hierarchical Gaussian distributions will replace the earlier single distribution. An evolutionary model for pandemic H1N1 influenza A virus was proposed and demonstrated with a simulation.","PeriodicalId":39379,"journal":{"name":"In Silico Biology","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4710479/pdf/nihms749403.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"31091781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"miRNA-mRNA network detects hub mRNAs and cancer specific miRNAs in lung cancer.","authors":"Saranya Devaraj, Jeyakumar Natarajan","doi":"10.3233/ISB-2012-0444","DOIUrl":"https://doi.org/10.3233/ISB-2012-0444","url":null,"abstract":"MicroRNA expression profiles can improve classification, diagnosis, and prognostic information of malignancies, including lung cancer. In this paper, we undertook to develop a miRNA-mRNA network and uncover unique growth suppressive miRNAs in lung cancer using microarray data. The miRNA-mRNA network was developed based on a bipartite graph theory approach, and a number of miRNA-mRNA modules have been identified to mine associations between miRNAs and mRNAs. From the network, we identified a total of 29 protective miRNA-mRNA regulatory modules, since we restricted our search to protective miRNAs. Subsequently we analyzed the pathways for the target genes in the protective miRNA-mRNA modules using Pathway-Express. The miRNA-mRNA network efficiently detects hub mRNAs deregulated by the protective miRNAs and identifies cancer specific miRNAs in lung cancer. From the pathway analysis results, the ECM receptor pathway, focal adhesion pathway and cell adhesion molecules pathway appear to be the most interesting to investigate, since these pathways were related to all ten protective miRNAs. Furthermore, protective miRNA target analysis revealed that the genes VCAN, SIL, CD44 and MMP14 have an important role in these pathways. Hence, it was inferred that these genes can be important putative targets for those protective miRNAs. A greater understanding of the mechanisms regulating VCAN, SIL, CD44 and MMP14 expression and activity will assist in the development of specific inhibitors of cancer cell metastasis. Thus these observations are expected to have significant implications for cancer and may be useful for further research.","PeriodicalId":39379,"journal":{"name":"In Silico Biology","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.3233/ISB-2012-0444","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"31091786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"QColors: an algorithm for conservative viral quasispecies reconstruction from short and non-contiguous next generation sequencing reads.","authors":"Austin Huang, Rami Kantor, Allison DeLong, Leeann Schreier, Sorin Istrail","doi":"10.3233/ISB-2012-0454","DOIUrl":"10.3233/ISB-2012-0454","url":null,"abstract":"Next generation sequencing technologies have recently been applied to characterize mutational spectra of the heterogeneous population of viral genotypes (known as a quasispecies) within HIV-infected patients. Such information is clinically relevant because minority genetic subpopulations of HIV within patients enable viral escape from selection pressures such as the immune response and antiretroviral therapy. However, methods for quasispecies sequence reconstruction from next generation sequencing reads are not yet widely used and remain an emerging area of research. Furthermore, the majority of research methodology in HIV has focused on 454 sequencing, while many next-generation sequencing platforms used in practice are limited to shorter read lengths relative to 454 sequencing. Little work has been done in determining how best to address the read length limitations of other platforms. The approach described here incorporates graph representations of both read differences and read overlap to conservatively determine the regions of the sequence with sufficient variability to separate quasispecies sequences. Within these tractable regions of quasispecies inference, we use constraint programming to solve for an optimal quasispecies subsequence determination via vertex coloring of the conflict graph, a representation which also lends itself to data with non-contiguous reads such as paired-end sequencing. We demonstrate the utility of the method by applying it to simulations based on actual intra-patient clonal HIV-1 sequencing data.","PeriodicalId":39379,"journal":{"name":"In Silico Biology","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5530257/pdf/nihms879660.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"31088150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving usability and accessibility of cheminformatics tools for chemists through cyberinfrastructure and education.","authors":"Rajarshi Guha, Gary D Wiggins, David J Wild, Mu-Hyun Baik, Marlon E Pierce, Geoffrey C Fox","doi":"10.3233/CI-2008-0015","DOIUrl":"https://doi.org/10.3233/CI-2008-0015","url":null,"abstract":"Some of the latest trends in cheminformatics, computation, and the world wide web are reviewed with predictions of how these are likely to impact the field of cheminformatics in the next five years. The vision and some of the work of the Chemical Informatics and Cyberinfrastructure Collaboratory at Indiana University are described, which we base around the core concepts of e-Science and cyberinfrastructure that have proven successful in other fields. Our chemical informatics cyberinfrastructure is realized by building a flexible, generic infrastructure for cheminformatics tools and databases, exporting \"best of breed\" methods as easily-accessible web APIs for cheminformaticians, scientists, and researchers in other disciplines, and hosting a unique chemical informatics education program aimed at scientists and cheminformatics practitioners in academia and industry.","PeriodicalId":39379,"journal":{"name":"In Silico Biology","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.3233/CI-2008-0015","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"30550463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Differences in variability of hypervariable region 1 of hepatitis C virus (HCV) between acute and chronic stages of HCV infection.","authors":"I V Astrakhantseva, D S Campo, A Araujo, C-G Teo, Y Khudyakov, S Kamili","doi":"10.3233/ISB-2012-0451","DOIUrl":"https://doi.org/10.3233/ISB-2012-0451","url":null,"abstract":"Distinguishing between acute and chronic HCV infections is clinically important given that early treatment of infected patients leads to high rates of sustained virological response. Analysis of 2179 clonal sequences derived from hypervariable region 1 (HVR1) of the HCV genome in samples obtained from patients with acute (n = 49) and chronic (n = 102) HCV infection showed that intra-host HVR1 diversity was 1.8 times higher in patients with chronic than acute infection. Significant differences in frequencies of 5 amino acids (positions 5, 7, 12, 16 and 18) and the average genetic distances among intra-host HVR1 variants were found using analysis of molecular variance. Differences were also observed in the polarity, volume and hydrophobicity of 10 amino acids (at positions 1, 4, 5, 12, 14, 15, 16, 21, 22 and 29). Based on these properties, a classification model could be constructed, which permitted HVR1 variants from acute and chronic cases to be discriminated with an accuracy of 88%. Progression from acute to chronic stage of HCV infection is accompanied by characteristic changes in amino acid composition of HVR1. Identifying these changes may permit diagnosis of recent HCV infection.","PeriodicalId":39379,"journal":{"name":"In Silico Biology","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.3233/ISB-2012-0451","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"31088149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identification of common tumor signatures based on gene set enrichment analysis.","authors":"Xiaosheng Wang","doi":"10.3233/ISB-2012-0440","DOIUrl":"10.3233/ISB-2012-0440","url":null,"abstract":"The identification of common tumor signatures can reveal the shared molecular mechanisms underlying tumorigenesis, whereby we can prevent and treat tumors by a system intervention. We identified tumor-associated signatures including pathways, transcription factors, microRNAs and gene ontology categories by analyzing gene sets for differential expression between normal vs. tumor phenotype classes in various tumor gene expression datasets. We obtained the common tumor signatures based on their identified frequencies for different tumor types. Some shared signatures important for various tumor types were uncovered and discussed. We proposed that interventions aimed at both the shared tumor signatures and the tissue-specific tumor signatures might be a potential approach to overcoming cancer.","PeriodicalId":39379,"journal":{"name":"In Silico Biology","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3579559/pdf/nihms443974.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"30550460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}