{"title":"Mycobacterium tuberculosis and Clostridium difficille interactomes: demonstration of rapid development of computational system for bacterial interactome prediction.","authors":"Seshan Ananthasubramanian, Rahul Metri, Ankur Khetan, Aman Gupta, Adam Handen, Nagasuma Chandra, Madhavi Ganapathiraju","doi":"10.1186/2042-5783-2-4","DOIUrl":"https://doi.org/10.1186/2042-5783-2-4","url":null,"abstract":"<p><strong>Background: </strong>Protein-protein interaction (PPI) networks (interactomes) of most organisms, except for some model organisms, are largely unknown. Experimental methods including high-throughput techniques are highly resource intensive. Therefore, computational discovery of PPIs can accelerate biological discovery by presenting \"most-promising\" pairs of proteins that are likely to interact. For many bacteria, genome sequence, and thereby genomic context of proteomes, is readily available; additionally, for some of these proteomes, localization and functional annotations are also available, but interactomes are not available. We present here a method for rapid development of computational system to predict interactome of bacterial proteomes. While other studies have presented methods to transfer interologs across species, here, we propose transfer of computational models to benefit from cross-species annotations, thereby predicting many more novel interactions even in the absence of interologs. Mycobacterium tuberculosis (Mtb) and Clostridium difficile (CD) have been used to demonstrate the work.</p><p><strong>Results: </strong>We developed a random forest classifier over features derived from Gene Ontology annotations and genetic context scores provided by STRING database for predicting Mtb and CD interactions independently. The Mtb classifier gave a precision of 94% and a recall of 23% on a held out test set. 
The Mtb model was then run on all 8 million protein pairs of the Mtb proteome, resulting in 708 new interactions (at 94% expected precision) or 1,595 new interactions (at 80% expected precision). The CD classifier gave a precision of 90% and a recall of 16% on a held-out test set. The CD model was run on all 8 million protein pairs of the CD proteome, resulting in 143 new interactions (at 90% expected precision) or 580 new interactions (at 80% expected precision). We also examined the overlap of our predictions with STRING database interactions for CD and Mtb, and with interactions recently identified by a bacterial 2-hybrid system for Mtb. To demonstrate the utility of transferring computational models, we used the developed Mtb model to predict CD protein pairs. The cross-species model thus developed yielded a precision of 88% at a recall of 8%. To demonstrate transfer of features from other organisms in the absence of feature-based and interaction-based information, we transferred missing feature values from Mtb orthologs into the CD data. By transferring these data from orthologs (not interologs), we showed that a large number of interactions can be predicted.</p><p><strong>Conclusions: </strong>Rapid discovery of a (partial) bacterial interactome can be achieved using the existing set of GO and STRING features associated with the organisms. 
We can make use of cross-species interactome development, when there are not even sufficient known interactions to develop a computational prediction sys","PeriodicalId":18538,"journal":{"name":"Microbial Informatics and Experimentation","volume":"2 ","pages":"4"},"PeriodicalIF":0.0,"publicationDate":"2012-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/2042-5783-2-4","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"30619562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Metagenomics - a guide from sampling to data analysis.","authors":"Torsten Thomas, Jack Gilbert, Folker Meyer","doi":"10.1186/2042-5783-2-3","DOIUrl":"10.1186/2042-5783-2-3","url":null,"abstract":"<p><p> Metagenomics applies a suite of genomic technologies and bioinformatics tools to directly access the genetic content of entire communities of organisms. The field of metagenomics has been responsible for substantial advances in microbial ecology, evolution, and diversity over the past 5 to 10 years, and many research laboratories are actively engaged in it now. With the growing numbers of activities also comes a plethora of methodological knowledge and expertise that should guide future developments in the field. This review summarizes the current opinions in metagenomics, and provides practical guidance and advice on sample processing, sequencing technology, assembly, binning, annotation, experimental design, statistical analysis, data storage, and data sharing. As more metagenomic datasets are generated, the availability of standardized procedures and shared data storage and analysis becomes increasingly important to ensure that output of individual projects can be assessed and compared.</p>","PeriodicalId":18538,"journal":{"name":"Microbial Informatics and Experimentation","volume":"2 1","pages":"3"},"PeriodicalIF":0.0,"publicationDate":"2012-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3351745/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"30620216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Linear normalised hash function for clustering gene sequences and identifying reference sequences from multiple sequence alignments.","authors":"Manal Helal, Fanrong Kong, Sharon Ca Chen, Fei Zhou, Dominic E Dwyer, John Potter, Vitali Sintchenko","doi":"10.1186/2042-5783-2-2","DOIUrl":"https://doi.org/10.1186/2042-5783-2-2","url":null,"abstract":"<p><strong>Background: </strong>Comparative genomics has put additional demands on the assessment of similarity between sequences and their clustering as means for classification. However, defining the optimal number of clusters, cluster density and boundaries for sets of potentially related sequences of genes with variable degrees of polymorphism remains a significant challenge. The aim of this study was to develop a method that would identify the cluster centroids and the optimal number of clusters for a given sensitivity level and could work equally well for the different sequence datasets.</p><p><strong>Results: </strong>A novel method that combines the linear mapping hash function and multiple sequence alignment (MSA) was developed. This method takes advantage of the already sorted by similarity sequences from the MSA output, and identifies the optimal number of clusters, clusters cut-offs, and clusters centroids that can represent reference gene vouchers for the different species. The linear mapping hash function can map an already ordered by similarity distance matrix to indices to reveal gaps in the values around which the optimal cut-offs of the different clusters can be identified. The method was evaluated using sets of closely related (16S rRNA gene sequences of Nocardia species) and highly variable (VP1 genomic region of Enterovirus 71) sequences and outperformed existing unsupervised machine learning clustering methods and dimensionality reduction methods. 
This method does not require prior knowledge of the number of clusters or the distance between clusters, handles clusters of different sizes and shapes, and scales linearly with the dataset.</p><p><strong>Conclusions: </strong>The combination of MSA with the linear mapping hash function is a computationally efficient way of gene sequence clustering and can be a valuable tool for the assessment of similarity, clustering of different microbial genomes, identifying reference sequences, and for the study of evolution of bacteria and viruses.</p>","PeriodicalId":18538,"journal":{"name":"Microbial Informatics and Experimentation","volume":"2 1","pages":"2"},"PeriodicalIF":0.0,"publicationDate":"2012-01-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/2042-5783-2-2","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"30618713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Use of the University of Minnesota Biocatalysis/Biodegradation Database for study of microbial degradation.","authors":"Lynda Bm Ellis, Lawrence P Wackett","doi":"10.1186/2042-5783-2-1","DOIUrl":"https://doi.org/10.1186/2042-5783-2-1","url":null,"abstract":"<p><p> Microorganisms are ubiquitous on earth and have diverse metabolic transformative capabilities important for environmental biodegradation of chemicals that helps maintain ecosystem and human health. Microbial biodegradative metabolism is the main focus of the University of Minnesota Biocatalysis/Biodegradation Database (UM-BBD). UM-BBD data has also been used to develop a computational metabolic pathway prediction system that can be applied to chemicals for which biodegradation data is currently lacking. The UM-Pathway Prediction System (UM-PPS) relies on metabolic rules that are based on organic functional groups and predicts plausible biodegradative metabolism. The predictions are useful to environmental chemists that look for metabolic intermediates, for regulators looking for potential toxic products, for microbiologists seeking to understand microbial biodegradation, and others with a wide-range of interests.</p>","PeriodicalId":18538,"journal":{"name":"Microbial Informatics and Experimentation","volume":"2 1","pages":"1"},"PeriodicalIF":0.0,"publicationDate":"2012-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/2042-5783-2-1","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"30619415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stringent response of Escherichia coli: revisiting the bibliome using literature mining.","authors":"Sónia Carneiro, Anália Lourenço, Eugénio C Ferreira, Isabel Rocha","doi":"10.1186/2042-5783-1-14","DOIUrl":"https://doi.org/10.1186/2042-5783-1-14","url":null,"abstract":"<p><strong>Background: </strong>Understanding the mechanisms responsible for cellular responses depends on the systematic collection and analysis of information on the main biological concepts involved. Indeed, the identification of biologically relevant concepts in free text, namely genes, tRNAs, mRNAs, gene products and small molecules, is crucial to capture the structure and functioning of different responses.</p><p><strong>Results: </strong>In this work, we review literature reports on the study of the stringent response in Escherichia coli. Rather than undertaking the development of a highly specialised literature mining approach, we investigate the suitability of concept recognition and statistical analysis of concept occurrence as means to highlight the concepts that are most likely to be biologically engaged during this response. The co-occurrence analysis of core concepts in this stringent response, i.e. the (p)ppGpp nucleotides with gene products was also inspected and suggest that besides the enzymes RelA and SpoT that control the basal levels of (p)ppGpp nucleotides, many other proteins have a key role in this response. Functional enrichment analysis revealed that basic cellular processes such as metabolism, transcriptional and translational regulation are central, but other stress-associated responses might be elicited during the stringent response. In addition, the identification of less annotated concepts revealed that some (p)ppGpp-induced functional activities are still overlooked in most reviews.</p><p><strong>Conclusions: </strong>In this paper we applied a literature mining approach that offers a more comprehensive analysis of the stringent response in E. coli. 
The compilation of biological entities relevant to this stress response and the assessment of their functional roles provided a more systematic understanding of this cellular response. Overlooked regulatory entities, such as transcriptional regulators, were found to play a role in this stress response. Moreover, the involvement of other stress-associated concepts demonstrates the complexity of this cellular response.</p>","PeriodicalId":18538,"journal":{"name":"Microbial Informatics and Experimentation","volume":"1 1","pages":"14"},"PeriodicalIF":0.0,"publicationDate":"2011-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/2042-5783-1-14","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"30618686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PerPlot & PerScan: tools for analysis of DNA curvature-related periodicity in genomic nucleotide sequences.","authors":"Jan Mrázek, Tejas Chaudhari, Aryabrata Basu","doi":"10.1186/2042-5783-1-13","DOIUrl":"https://doi.org/10.1186/2042-5783-1-13","url":null,"abstract":"<p><strong>Background: </strong>Periodic spacing of short adenine or thymine runs phased with DNA helical period of ~10.5 bp is associated with intrinsic DNA curvature and deformability, which play important roles in DNA-protein interactions and in the organization of chromosomes in both eukaryotes and prokaryotes. Local differences in DNA sequence periodicity have been linked to differences in gene expression in some organisms. Despite the significance of these periodic patterns, there are virtually no publicly accessible tools for their analysis.</p><p><strong>Results: </strong>We present novel tools suitable for assessments of DNA curvature-related sequence periodicity in nucleotide sequences at the genome scale. Utility of the present software is demonstrated on a comparison of sequence periodicities in the genomes of Haemophilus influenzae, Methanocaldococcus jannaschii, Saccharomyces cerevisiae, and Arabidopsis thaliana. The software can be accessed through a web interface and the programs are also available for download.</p><p><strong>Conclusions: </strong>The present software is suitable for comparing DNA curvature-related sequence periodicity among different genomes as well as for analysis of intrachromosomal heterogeneity of the sequence periodicity. 
It provides a quick and convenient way to detect anomalous regions of chromosomes that could have unusual structural and functional properties and/or distinct evolutionary history.</p>","PeriodicalId":18538,"journal":{"name":"Microbial Informatics and Experimentation","volume":"1 1","pages":"13"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/2042-5783-1-13","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"30619064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prediction of a novel RNA binding domain in crocodilepox Zimbabwe Gene 157.","authors":"Nicole S Little, Taylor Quon, Chris Upton","doi":"10.1186/2042-5783-1-12","DOIUrl":"https://doi.org/10.1186/2042-5783-1-12","url":null,"abstract":"<p><strong>Background: </strong>Although the crocodilepox virus (CRV) is currently unclassified, phylogenetic analyses suggest that its closest known relatives are molluscum contagiosum virus (MCV) and the avipox viruses. The CRV genome is approximately 190 kb and contains a large number of unique genes in addition to the set of conserved Chordopoxvirus genes found in all such viruses. Upon sequencing the viral genome, others noted that this virus was also unusual because of the lack of a series of common immuno-suppressive genes. However, the genome contains multiple genes of unknown function that are likely to function in reducing the anti-viral response of the host.</p><p><strong>Results: </strong>By using sensitive database searches for similarity, we observed that gene 157 of CRV-strain Zimbabwe (CRV-ZWE) encodes a protein with a domain that is predicted to bind dsRNA. Domain characterization supported this prediction, therefore, we tested the ability of the Robetta protein structure prediction server to model the amino acid sequence of this protein on a well-characterized RNA binding domain. The model generated by Robetta suggests that CRV-ZWE-157 does indeed contain an RNA binding domain; the model could be overlaid on the template protein structure with high confidence.</p><p><strong>Conclusion: </strong>We hypothesize that CRV-ZWE-157 encodes a novel poxvirus RNA binding protein and suggest that as a non-core gene it may play a role in host-range determination or function to dampen host anti-viral responses. 
Potential targets for this CRV protein include the host interferon response and miRNA pathways.</p>","PeriodicalId":18538,"journal":{"name":"Microbial Informatics and Experimentation","volume":"1 1","pages":"12"},"PeriodicalIF":0.0,"publicationDate":"2011-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/2042-5783-1-12","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"30619325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Translational web robots for pathogen genome analysis.","authors":"Vitali Sintchenko, Enrico W Coiera","doi":"10.1186/2042-5783-1-10","DOIUrl":"https://doi.org/10.1186/2042-5783-1-10","url":null,"abstract":"","PeriodicalId":18538,"journal":{"name":"Microbial Informatics and Experimentation","volume":"1 1","pages":"10"},"PeriodicalIF":0.0,"publicationDate":"2011-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/2042-5783-1-10","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"30620556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantifying the effect of environment stability on the transcription factor repertoire of marine microbes.","authors":"Ivaylo Kostadinov, Renzo Kottmann, Alban Ramette, Jost Waldmann, Pier Luigi Buttigieg, Frank Oliver Glöckner","doi":"10.1186/2042-5783-1-9","DOIUrl":"https://doi.org/10.1186/2042-5783-1-9","url":null,"abstract":"<p><strong>Background: </strong>DNA-binding transcription factors (TFs) regulate cellular functions in prokaryotes, often in response to environmental stimuli. Thus, the environment exerts constant selective pressure on the TF gene content of microbial communities. Recently a study on marine Synechococcus strains detected differences in their genomic TF content related to environmental adaptation, but so far the effect of environmental parameters on the content of TFs in bacterial communities has not been systematically investigated.</p><p><strong>Results: </strong>We quantified the effect of environment stability on the transcription factor repertoire of marine pelagic microbes from the Global Ocean Sampling (GOS) metagenome using interpolated physico-chemical parameters and multivariate statistics. Thirty-five percent of the difference in relative TF abundances between samples could be explained by environment stability. Six percent was attributable to spatial distance but none to a combination of both spatial distance and stability. Some individual TFs showed a stronger relationship to environment stability and space than the total TF pool.</p><p><strong>Conclusions: </strong>Environmental stability appears to have a clearly detectable effect on TF gene content in bacterioplanktonic communities described by the GOS metagenome. Interpolated environmental parameters were shown to compare well to in situ measurements and were essential for quantifying the effect of the environment on the TF content. 
It is demonstrated that comprehensive and well-structured contextual data will strongly enhance our ability to interpret the functional potential of microbes from metagenomic data.</p>","PeriodicalId":18538,"journal":{"name":"Microbial Informatics and Experimentation","volume":"1 1","pages":"9"},"PeriodicalIF":0.0,"publicationDate":"2011-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/2042-5783-1-9","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"30620575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}