{"title":"Calculation of protein-ligand binding free energy using smooth reaction path generation (SRPG) method: a comparison of the explicit water model, gb/sa model and docking score function.","authors":"D. Mitomo, Y. Fukunishi, J. Higo, Haruki Nakamura","doi":"10.1142/9781848165632_0008","DOIUrl":"https://doi.org/10.1142/9781848165632_0008","url":null,"abstract":"We compared the protein-ligand binding free energies (ΔG) obtained by the explicit water model, the MM-GB/SA (molecular-mechanics generalized Born surface area) model, and the docking scoring function. The free energies by the explicit water model and the MM-GB/SA model were calculated by the previously developed Smooth Reaction Path Generation (SRPG) method. In the SRPG method, a smooth reaction path was generated by linking two coordinates, one a bound state and the other an unbound state. The free energy surface along the path was calculated by a molecular dynamics (MD) simulation, and the binding free energy was estimated from the free energy surface. We applied these methods to the streptavidin-and-biotin system. The ΔG value by the explicit water model was close to the experimental value. The ΔG value by the MM-GB/SA model was overestimated and that by the scoring function was underestimated. The free energy surface by the explicit water model was close to that by the GB/SA model around the bound state (distances of < 6 Å), but the discrepancy appears at distances of > 6 Å. Thus, the difference in long-range Coulomb interaction should cause the error in ΔG. The scoring function cannot take into account the entropy change of the protein. Thus, the error of ΔG could depend on the target protein.","PeriodicalId":73143,"journal":{"name":"Genome informatics. 
International Conference on Genome Informatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2009-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84567597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tools for investigating mechanisms of antigenic variation: new extensions to varDB.","authors":"C. Hayes, Diego Diez, Nicolas Joannin, M. Kanehisa, M. Wahlgren, C. Wheelock, S. Goto","doi":"10.1142/9781848165632_0005","DOIUrl":"https://doi.org/10.1142/9781848165632_0005","url":null,"abstract":"The varDB project (http://www.vardb.org) aims to create and maintain a curated database of antigenic variation sequences as well as a platform for online sequence analysis. Along with the evolution of drug resistance, antigenic variation presents a moving target for public health endeavors and greatly complicates vaccination and eradication efforts. However, careful analysis of a large number of variant forms may reveal structural and functional constraints that can be exploited to identify stable and cross-reactive targets. VarDB attempts to facilitate this effort by providing streamlined interfaces to standard tools to help identify and prepare sequences for various forms of analysis. We have newly implemented such tools for codon usage, selection, recombination, secondary and tertiary structure, and sequence diversity analysis. Just as the adaptive immune system encodes a mechanism for dynamically generating diverse receptors instead of encoding a receptor for every possible epitope, many pathogens take advantage of heritable diversity generating mechanisms to produce progeny able to evade immune recognition. Instead of merely cataloging the observed variation, a major goal of varDB is to characterize and predict the potential range of antigenic variation within a pathogen by investigating the mechanisms by which it attempts to expand its implicit genome. We believe that the new sequence analysis tools will improve the usefulness and range of varDB.","PeriodicalId":73143,"journal":{"name":"Genome informatics. 
International Conference on Genome Informatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2009-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86762633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting protein-protein relationships from literature using latent topics.","authors":"T. Aso, K. Eguchi","doi":"10.1142/9781848165632_0001","DOIUrl":"https://doi.org/10.1142/9781848165632_0001","url":null,"abstract":"This paper investigates applying statistical topic models to extract and predict relationships between biological entities, especially protein mentions. A statistical topic model, Latent Dirichlet Allocation (LDA) is promising; however, it has not been investigated for such a task. In this paper, we apply the state-of-the-art Collapsed Variational Bayesian Inference and Gibbs Sampling inference to estimating the LDA model. We also apply probabilistic Latent Semantic Analysis (pLSA) as a baseline for comparison, and compare them from the viewpoints of log-likelihood, classification accuracy and retrieval effectiveness. We demonstrate through experiments that the Collapsed Variational LDA gives better results than the others, especially in terms of classification accuracy and retrieval effectiveness in the task of the protein-protein relationship prediction.","PeriodicalId":73143,"journal":{"name":"Genome informatics. International Conference on Genome Informatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2009-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79097085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A new generation of homology search tools based on probabilistic inference.","authors":"Sean R Eddy","doi":"","DOIUrl":"","url":null,"abstract":"Many theoretical advances have been made in applying probabilistic inference methods to improve the power of sequence homology searches, yet the BLAST suite of programs is still the workhorse for most of the field. The main reason for this is practical: BLAST's programs are about 100-fold faster than the fastest competing implementations of probabilistic inference methods. I describe recent work on the HMMER software suite for protein sequence analysis, which implements probabilistic inference using profile hidden Markov models. Our aim in HMMER3 is to achieve BLAST's speed while further improving the power of probabilistic inference based methods. HMMER3 implements a new probabilistic model of local sequence alignment and a new heuristic acceleration algorithm. Combined with efficient vector-parallel implementations on modern processors, these improvements synergize. HMMER3 uses more powerful log-odds likelihood scores (scores summed over alignment uncertainty, rather than scoring a single optimal alignment); it calculates accurate expectation values (E-values) for those scores without simulation using a generalization of Karlin/Altschul theory; it computes posterior distributions over the ensemble of possible alignments and returns posterior probabilities (confidences) in each aligned residue; and it does all this at an overall speed comparable to BLAST. The HMMER project aims to usher in a new generation of more powerful homology search tools based on probabilistic inference methods.","PeriodicalId":73143,"journal":{"name":"Genome informatics. 
International Conference on Genome Informatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2009-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"28733802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparative analysis of topological patterns in different mammalian networks.","authors":"Bjoern Goemann, Anatolij P Potapov, Michael Ante, Edgar Wingender","doi":"","DOIUrl":"","url":null,"abstract":"We have systematically analyzed various topological patterns comprising 1, 2 or 3 nodes in the mammalian metabolic, signal transduction and transcription networks. These patterns were analyzed with regard to their frequency and statistical over-representation in each network, as well as to their topological significance for the coherence of the networks. The latter property was evaluated using the pairwise disconnectivity index, which we have recently introduced to quantify how critical network components are for the internal connectedness of a network. The 1-node pattern made up by a vertex with a self-loop has been found to exhibit particular properties in all three networks. In general, vertices with a self-loop tend to be topologically more important than other vertices. Moreover, self-loops have been found to be attached to most 2-node and 3-node patterns, thereby emphasizing a particular role of self-loop components in the architectural organization of the networks. In none of the networks could a positive correlation between the mean topological significance and the Z-score of a pattern be observed. That is, in general, motifs are not per se more important for the overall network coherence than patterns that are not over-represented. All 2- and 3-node patterns that are over-represented and thus qualified as motifs in all three networks exhibit a loop structure. This intriguing observation can be viewed as an advantage of loop-like structures in building up the regulatory circuits of the whole cell. The transcription network has been found to differ from the other networks in that (i) self-loops play an even greater role, (ii) its binary loops are highly enriched with self-loops attached, and (iii) feed-back loops are not over-represented. 
Metabolic networks reveal some particular topological properties which may reflect the fact that metabolic paths are, to a large extent, reversible. Interestingly, some of the most important 3-node patterns of both the transcription and the signaling network can be concatenated to subnetworks comprising many genes that play a particular role in the regulation of cell proliferation.","PeriodicalId":73143,"journal":{"name":"Genome informatics. International Conference on Genome Informatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2009-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"28734940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comprehensive analysis of sequence-structure relationships in the loop regions of proteins.","authors":"Shugo Nakamura, Kentaro Shimizu","doi":"","DOIUrl":"","url":null,"abstract":"Local sequence-structure relationships in the loop regions of proteins were comprehensively estimated using simple prediction tools based on support vector regression (SVR). End-to-end distance was selected as a rough structural property of fragments, and the end-to-end distances of an enormous number of loop fragments from a wide variety of protein folds were directly predicted from sequence information by using SVR. We found that our method was more accurate than random prediction for predicting the structure of fragments comprising 5, 9, and 17 amino acids; moreover, the extended loop fragments could be successfully distinguished from turn structures on the basis of their sequences, which implies that the sequence-structure relationships were significant for loop fragments with a wide range of end-to-end distances. These results suggest that many loop regions as well as helices and strands restrict the conformational space of the entire tertiary structure of proteins to some extent; moreover, our findings throw light on the mechanism of protein folding and prediction of the tertiary structure of proteins without using structural templates.","PeriodicalId":73143,"journal":{"name":"Genome informatics. International Conference on Genome Informatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2009-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"28735856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparative analysis of aerobic and anaerobic prokaryotes to identify correlation between oxygen requirement and gene-gene functional association patterns.","authors":"Yaming Lin, Hongwei Wu","doi":"10.1142/9781848165632_0007","DOIUrl":"https://doi.org/10.1142/9781848165632_0007","url":null,"abstract":"Activities of prokaryotes are pivotal in shaping the environment, and are also greatly influenced by the environment. With the substantial progress in genome and metagenome sequencing and the about-to-be-standardized ecological context information, environment-centric comparative genomics will complement species-centric comparative genomics, illuminating how environments have shaped and maintained prokaryotic diversities. In this paper we report our preliminary studies on the association analysis of a particular duo of genomic and ecological traits of prokaryotes: gene-gene functional association patterns vs. oxygen requirement conditions. We first establish a stochastic model to describe gene arrangements on chromosomes, based on which the functional association between genes is quantified. The gene-gene functional association measures are validated using biological process ontology and KEGG pathway annotations. Student's t-tests are then performed on the aerobic and anaerobic organisms to identify those gene pairs that exhibit different functional association patterns in the two different oxygen requirement conditions. As it is difficult to design and conduct biological experiments to validate those genome-environment association relationships that have resulted from long-term accumulative genome-environment interactions, we finally conduct computational validations to determine whether the oxygen requirement condition of an organism is predictable based on gene-gene functional association patterns. 
The reported study demonstrates the existence and significance of the association relationships between certain gene-gene functional association patterns and oxygen requirement conditions of prokaryotes, as well as the effectiveness of the adopted methodology for such association analysis.","PeriodicalId":73143,"journal":{"name":"Genome informatics. International Conference on Genome Informatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2009-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73376822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Strategies toward CNS-regeneration using induced pluripotent stem cells.","authors":"H. Okano","doi":"10.1142/9781848165632_0022","DOIUrl":"https://doi.org/10.1142/9781848165632_0022","url":null,"abstract":"Induced pluripotent stem (iPS) cells are pluripotent stem cells directly reprogrammed from cultured mouse fibroblasts by introducing Oct3/4, Sox2, c-Myc, and Klf4. Cells obtained using this technology, which allows the ethical issues and immunological rejection associated with embryonic stem (ES) cells to be avoided, might be a clinically useful source for cell replacement therapies. Here we demonstrate that murine iPS cells formed neurospheres that produced electrophysiologically functional neurons, astrocytes, and oligodendrocytes. Secondary neurospheres (SNSs) generated from various mouse iPS cells showed their neural differentiation capacity and teratoma formation after transplantation into the brain of immunodeficient NOD/SCID mice. We found that the origin (source of somatic cells) of the iPS cells is the crucial determinant for the potential tumorigenicity of iPS-derived neural stem/progenitor cells and that their tumorigenicity results from the persistent presence of undifferentiated cells within the SNSs. Furthermore, transplantation of non-tumorigenic Nanog-iPS-derived SNSs into a mouse spinal cord injury (SCI) model promoted locomotor function recovery. Surprisingly, SNSs derived from c-Myc minus iPS cells generated without drug selection showed robust tumorigenesis, in spite of their potential to contribute to adult chimeric mice without tumor formation.","PeriodicalId":73143,"journal":{"name":"Genome informatics. 
International Conference on Genome Informatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2009-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88972986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recount: expectation maximization based error correction tool for next generation sequencing data.","authors":"Edward Wijaya, Martin C Frith, Yutaka Suzuki, Paul Horton","doi":"","DOIUrl":"","url":null,"abstract":"Next generation sequencing technologies enable rapid, large-scale production of sequence data sets. Unfortunately these technologies also have a non-negligible sequencing error rate, which biases their outputs by introducing false reads and reducing the quantity of the real reads. Although methods developed for SAGE data can reduce these false counts to a considerable degree, until now they have not been implemented in a scalable way. Recently, a program named FREC has been developed to address this problem for next generation sequencing data. In this paper, we introduce RECOUNT, our implementation of an Expectation Maximization algorithm for tag count correction and compare it to FREC. Using both the reference genome and simulated data, we find that RECOUNT performs as well as or better than FREC, while using much less memory (e.g. 5GB vs. 75GB). Furthermore, we report the first analysis of tag count correction with real data in the context of gene expression analysis. Our results show that tag count correction not only increases the number of mappable tags, but can make a real difference in the biological interpretation of next generation sequencing data. RECOUNT is an open-source C++ program available at http://seq.cbrc.jp/recount.","PeriodicalId":73143,"journal":{"name":"Genome informatics. 
International Conference on Genome Informatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2009-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"28733801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting protein-protein relationships from literature using latent topics.","authors":"Tatsuya Aso, Koji Eguchi","doi":"","DOIUrl":"","url":null,"abstract":"This paper investigates applying statistical topic models to extract and predict relationships between biological entities, especially protein mentions. A statistical topic model, Latent Dirichlet Allocation (LDA) is promising; however, it has not been investigated for such a task. In this paper, we apply the state-of-the-art Collapsed Variational Bayesian Inference and Gibbs Sampling inference to estimating the LDA model. We also apply probabilistic Latent Semantic Analysis (pLSA) as a baseline for comparison, and compare them from the viewpoints of log-likelihood, classification accuracy and retrieval effectiveness. We demonstrate through experiments that the Collapsed Variational LDA gives better results than the others, especially in terms of classification accuracy and retrieval effectiveness in the task of the protein-protein relationship prediction.","PeriodicalId":73143,"journal":{"name":"Genome informatics. International Conference on Genome Informatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2009-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"28734937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}