{"title":"RNA: An Expanding View of Function and Evolution","authors":"Xinwei Han, Yuan Chen, Liuyang Wang, Wenwen Fang, Ning Zhang, Qiyun Zhu","doi":"10.4137/EBO.S38105","DOIUrl":"https://doi.org/10.4137/EBO.S38105","url":null,"abstract":"Supplement Aims and Scope: … have suggested that structural conservation in these lincRNAs may have been retained, despite the apparent lack of sequence conservation. These recently identified non-coding RNAs represent an evolutionary history different from that of the protein coding genes, which remains to be explored. The function and evolution of alternative splicing: Alternative splicing tremendously diversifies transcriptomes among organisms, even when their repertoires of protein coding genes are similar. The rapid improvement in the read length of NGS technologies will enable a more thorough and unambiguous identification of alternative splice forms. Such resources could serve as the basis for exploring conservation and divergence of splicing events among organisms. They could also facilitate studies on the function of splicing events specific to certain organisms. RNA-seq of non-model organisms for phylogenomic studies: RNA-seq approaches have enabled de novo identification of genes from organisms without assembled genome sequences. Analyzing such extensive lists of genes will result in better resolution of organism phylogeny. In addition, by comparing gene sequences among a wide range of organisms, many intriguing evolutionary questions may be addressed. For instance, how frequently has genome duplication happened in a certain taxon? How frequently has horizontal gene transfer happened between symbiotic organisms or between parasites and hosts? Next generation sequencing (NGS) technologies have enabled unprecedentedly deep characterization of transcriptomes. 
Compared to the microarray technology, NGS has been a much more favorable method for transcriptome profiling, as it doesn't require any pre-existing knowledge of the transcriptome of any given species. By sequencing transcriptomes to enough depth, several studies have reported a remarkably large number of novel RNA species and/or previously undetected splice forms. An inclusive identification of genes for organisms without genome information is also within reach. With intricate designs, it is even possible to specifically sequence the double-stranded portion of RNAs, such that the secondary structure of these RNAs could be inferred. Such comprehensive elucidation of transcriptomes has opened many new avenues for studies on the function and evolution of the diverse RNA repertoire. The current supplement aims to serve as a portal to report original studies and summarize progress in this fast-moving area. Novel RNA species discovered by NGS and their unique evolutionary history: Genome-wide studies utilizing NGS technologies have uncovered novel types of RNAs, such as lncRNA, circular RNA and spliRNA. Many of these novel RNA species have been reported to play important regulatory roles. With …","PeriodicalId":136690,"journal":{"name":"Evolutionary Bioinformatics Online","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125445269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SUMAC: Constructing Phylogenetic Supermatrices and Assessing Partially Decisive Taxon Coverage","authors":"William A. Freyman","doi":"10.4137/EBO.S35384","DOIUrl":"https://doi.org/10.4137/EBO.S35384","url":null,"abstract":"The amount of phylogenetically informative sequence data in GenBank is growing at an exponential rate, and large phylogenetic trees are increasingly used in research. Tools are needed to construct phylogenetic sequence matrices from GenBank data and evaluate the effect of missing data. Supermatrix Constructor (SUMAC) is a tool to data-mine GenBank, construct phylogenetic supermatrices, and assess the phylogenetic decisiveness of a matrix given the pattern of missing sequence data. SUMAC calculates a novel metric, Missing Sequence Decisiveness Scores (MSDS), which measures how much each individual missing sequence contributes to the decisiveness of the matrix. MSDS can be used to compare supermatrices and prioritize the acquisition of new sequence data. SUMAC constructs supermatrices either through an exploratory clustering of all GenBank sequences within a taxonomic group or by using guide sequences to build homologous clusters in a more targeted manner. SUMAC assembles supermatrices for any taxonomic group recognized in GenBank and is optimized to run on multicore computer systems by parallelizing multiple stages of operation. SUMAC is implemented as a Python package that can run as a stand-alone command-line program, or its modules and objects can be incorporated within other programs. 
SUMAC is released under the open source GPLv3 license and is available at https://github.com/wf8/sumac.","PeriodicalId":136690,"journal":{"name":"Evolutionary Bioinformatics Online","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130637524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gene Tree Affects Inference of Sites Under Selection by the Branch-Site Test of Positive Selection","authors":"Y. Diekmann, J. Pereira-Leal","doi":"10.4137/EBO.S30902","DOIUrl":"https://doi.org/10.4137/EBO.S30902","url":null,"abstract":"The branch-site test of positive selection is a standard approach to detect past episodic positive selection in a priori-specified branches of a gene phylogeny. Here, we ask if differences in the topology of the gene tree have any influence on the ability to infer positively selected sites. Using simulated sequences, we compare the results obtained for true and rearranged topologies. We find a strong relationship between “conflicting branch length,” which occurs when the set of sequences that experiences selection for a given topology and foreground is changed, and the ability to predict positively selected sites. Moreover, by reanalyzing a previously published data set, we show that the choice of a gene tree also affects the results obtained for real-world sequences. This is the first study to demonstrate that tree topology has a clear effect on the inference of positive selection. We conclude that the choice of a gene tree is an important factor for the branch-site analysis of positive selection.","PeriodicalId":136690,"journal":{"name":"Evolutionary Bioinformatics Online","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130483105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Phylotranscriptomic Analysis Based on Coalescence was Less Influenced by the Evolving Rates and the Number of Genes: A Case Study in Ericales","authors":"Luchong Zhang, Wei Wu, Haifei Yan, X. Ge","doi":"10.4137/EBO.S22448","DOIUrl":"https://doi.org/10.4137/EBO.S22448","url":null,"abstract":"Advances in high-throughput sequencing have generated a vast amount of transcriptomic data that are being increasingly used in phylogenetic reconstruction. However, processing the vast datasets for a huge number of genes and even identifying optimal analytical methodology are challenging. Through de novo sequenced and retrieved data from public databases, we identified 221 orthologous protein-coding genes to reconstruct the phylogeny of Ericales, an order characterized by rapid ancient radiation. Seven species representing different families in Ericales were used as in-groups. Both concatenation and coalescence methods yielded the same well-supported topology as previous studies, with only two nodes conflicting with previously reported relationships. The results revealed that a partitioning strategy could improve the traditional concatenation methodology. Rapidly evolving genes negatively affected the concatenation analysis, while slowly evolving genes slightly affected the coalescence analysis. 
The coalescence methods usually accommodated rate heterogeneity better and required fewer genes to yield well-supported topologies than the concatenation methods with both real and simulated data.","PeriodicalId":136690,"journal":{"name":"Evolutionary Bioinformatics Online","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132557257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparative Genomics of Amphibian-like Ranaviruses, Nucleocytoplasmic Large DNA Viruses of Poikilotherms","authors":"S. J. Price","doi":"10.4137/EBO.S33490","DOIUrl":"https://doi.org/10.4137/EBO.S33490","url":null,"abstract":"Recent research on genome evolution of large DNA viruses has highlighted a number of incredibly dynamic processes that can facilitate rapid adaptation. The genomes of amphibian-like ranaviruses - double-stranded DNA viruses infecting amphibians, reptiles, and fish (family Iridoviridae) - were examined to assess variation in genome content and evolutionary processes. The viruses studied were closely related, but their genome content varied considerably, with 29 genes identified that were not present in all of the major clades. Twenty-one genes had evidence of recombination, while a virus isolated from a captive reptile appeared to be a mosaic of two divergent parents. Positive selection was also found to be acting on more than a quarter of Ranavirus genes and was found most frequently in the Spanish common midwife toad virus, which has had a severe impact on amphibian host communities. Efforts to resolve the root of this group by inclusion of an outgroup were inconclusive, but a set of core genes were identified, which recovered a well-supported species tree.","PeriodicalId":136690,"journal":{"name":"Evolutionary Bioinformatics Online","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129515023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Forward-in-Time, Spatially Explicit Modeling Software to Simulate Genetic Lineages Under Selection","authors":"M. Currat, P. Gerbault, D. Di, J. M. Nunes, A. Sanchez‐Mazas","doi":"10.4137/EBO.S33488","DOIUrl":"https://doi.org/10.4137/EBO.S33488","url":null,"abstract":"SELECTOR is a software package for studying the evolution of multiallelic genes under balancing or positive selection while simulating complex evolutionary scenarios that integrate demographic growth and migration in a spatially explicit population framework. Parameters can be varied both in space and time to account for geographical, environmental, and cultural heterogeneity. SELECTOR can be used within an approximate Bayesian computation estimation framework. We first describe the principles of SELECTOR and validate the algorithms by comparing its outputs for simple models with theoretical expectations. Then, we show how it can be used to investigate genetic differentiation of loci under balancing selection in interconnected demes with spatially heterogeneous gene flow. We identify situations in which balancing selection reduces genetic differentiation between population groups compared with neutrality and explain conflicting outcomes observed for human leukocyte antigen loci. 
These results and three previously published applications demonstrate that SELECTOR is efficient and robust for building insight into human settlement history and evolution.","PeriodicalId":136690,"journal":{"name":"Evolutionary Bioinformatics Online","volume":"122 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124457580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Disease-Associated Coding Sequence Variation to Investigate Functional Compensation by Human Paralogous Proteins","authors":"Sayaka Miura, Stephanie Tate, Sudhir Kumar","doi":"10.4137/EBO.S30594","DOIUrl":"https://doi.org/10.4137/EBO.S30594","url":null,"abstract":"Gene duplication enables functional diversification in species. It is thought that duplicated genes may be able to compensate if the function of one of the gene copies is disrupted. This possibility is extensively debated, with some studies reporting proteome-wide compensation, whereas others suggest functional compensation among only recent gene duplicates or no compensation at all. We report results from a systematic molecular evolutionary analysis to test the predictions of the functional compensation hypothesis. We contrasted the density of Mendelian disease-associated single nucleotide variants (dSNVs) in proteins with no discernible paralogs (singletons) with the dSNV density in proteins found in multigene families. Under the functional compensation hypothesis, we expected to find greater numbers of dSNVs in singletons due to the lack of any compensating partners. Our analyses produced an opposite pattern; paralogs have over 35% higher dSNV density than singletons. We found that these patterns are concordant with similar differences in the rates of amino acid evolution (i.e., functional constraints), as the proteins with paralogs have evolved 33% slower than singletons. Our evolutionary constraint explanation is robust to differences in family sizes, ages (young vs. old duplicates), and degrees of amino acid sequence similarities among paralogs. 
Therefore, disease-associated human variation does not exhibit significant signals of functional compensation among paralogous proteins, but rather an evolutionary constraint hypothesis provides a better explanation for the observed patterns of disease-associated and neutral polymorphisms in the human genome.","PeriodicalId":136690,"journal":{"name":"Evolutionary Bioinformatics Online","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123832305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using uniformat and gene[rate] to Analyze Data with Ambiguities in Population Genetics","authors":"J. M. Nunes","doi":"10.4137/EBO.S32415","DOIUrl":"https://doi.org/10.4137/EBO.S32415","url":null,"abstract":"Some genetic systems frequently present ambiguous data that cannot be straightforwardly analyzed with common methods of population genetics. Two possibilities arise to analyze such data: one is the arbitrary simplification of the data and the other is the development of methods adapted to such ambiguous data. In this article, we present an attempt at such a development, the UNIFORMAT grammar and the GENE[RATE] tools, highlighting the specific aspects and the adaptations required to analyze ambiguous nominal data in population genetics.","PeriodicalId":136690,"journal":{"name":"Evolutionary Bioinformatics Online","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125673889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discovering All Transcriptome Single-Nucleotide Polymorphisms and Scanning for Selection Signatures in Ducks (Anas platyrhynchos)","authors":"R. Lin, Xiaoyong Du, Sixue Peng, Liubin Yang, Yunlong Ma, Y. Gong, Shijun Li","doi":"10.4137/EBO.S21545","DOIUrl":"https://doi.org/10.4137/EBO.S21545","url":null,"abstract":"The duck is one of the most economically important waterfowl as a source of meat, eggs, and feathers. Characterizing the genetic variation in duck species is an important step toward linking genes or genomic regions with phenotypes. Human-driven selection during duck domestication and subsequent breed formation has likely left detectable signatures in the duck genome. In this study, we employed a panel of >1.4 million single-nucleotide polymorphisms (SNPs) identified from the RNA sequencing (RNA-seq) data of 15 duck individuals. The density of the resulting SNPs is significantly positively correlated with the density of genes across the duck genome, which demonstrates that the use of RNA-seq data allowed us to enrich variant functional categories, such as coding exons, untranslated regions (UTRs), introns, and downstream/upstream. We performed a complete scan of selection signatures in the ducks using the composite likelihood ratio (CLR) and found 76 candidate regions of selection, many of which harbor genes related to phenotypes relevant to the function of the digestive system and fat metabolism, including TCF7L2, EIF2AK3, ELOVL2, and the fatty acid-binding protein family. 
This study illustrates the potential of population genetic approaches for identifying genomic regions affecting domestication-related phenotypes and further helps to increase the known genetic information about this economically important animal.","PeriodicalId":136690,"journal":{"name":"Evolutionary Bioinformatics Online","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115150925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"In Silico Gene Regulatory Network of the Maurer’s Cleft Pathway in Plasmodium falciparum","authors":"Itunuoluwa Isewon, Jelili Oyelade, B. Brors, E. Adebiyi","doi":"10.4137/EBO.S25585","DOIUrl":"https://doi.org/10.4137/EBO.S25585","url":null,"abstract":"The Maurer's clefts (MCs) are very important for the survival of Plasmodium falciparum within an infected cell as they are induced by the parasite itself in the erythrocyte for protein trafficking. The MCs form an interesting part of the parasite's biology as they shed more light on how the parasite remodels the erythrocyte leading to host pathogenesis and death. Here, we predicted and analyzed the genetic regulatory network of genes identified to belong to the MCs using a regularized graphical Gaussian model. Our network shows four major activators, their corresponding target genes, and predicted binding sites. One of these master activators is the serine repeat antigen 5 (SERA5), predominantly expressed among the SERA multigene family of P. falciparum, which is one of the blood-stage malaria vaccine candidates. Our results provide more details about functional interactions and the regulation of the genes in the MCs’ pathway of P. falciparum.","PeriodicalId":136690,"journal":{"name":"Evolutionary Bioinformatics Online","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123809222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}