{"title":"An Efficient Feature Extraction Technique Based on Local Coding PSSM and Multifeatures Fusion for Predicting Protein-Protein Interactions","authors":"Ji-Yong An, Yong Zhou, Yu-Jun Zhao, Zi-Ji Yan","doi":"10.1177/1176934319879920","DOIUrl":"https://doi.org/10.1177/1176934319879920","url":null,"abstract":"Background: Increasing evidence has indicated that protein-protein interactions (PPIs) play important roles in various aspects of the structural and functional organization of a cell. Thus, continuing to uncover potential PPIs is an important topic in the biomedical domain. Although various feature extraction methods combined with machine learning approaches have enhanced the prediction of PPIs, there remains room for improvement by developing novel and effective feature extraction methods and classifier approaches to identify PPIs. Method: In this study, we proposed a sequence-based feature extraction method called LCPSSMMF, which combined a local coding position-specific scoring matrix (PSSM) with multifeatures fusion. First, we used a novel local coding method based on PSSM to build a new PSSM (CPSSM); the advantage of this method is that it incorporated global and local feature extraction, which can account for the interactions between residues in both continuous and discontinuous regions of amino acid sequences. Second, we adopted 2 different feature extraction methods (Local Average Group [LAG] and Bigram Probability [BP]) to capture multiple kinds of key feature information by employing the evolutionary information embedded in the CPSSM matrix. Finally, feature vectors were acquired by using a multifeatures fusion method. Result: To evaluate the performance of the proposed feature extraction approach, we employed a support vector machine (SVM) as a prediction classifier and applied this method to yeast and human PPI datasets. The prediction accuracies of LCPSSMMF were 93.43% and 90.41% on the yeast and human datasets, respectively. 
We also compared the proposed method with previous sequence-based approaches on the yeast datasets by using the same SVM classifier. The experimental results indicated that the performance of LCPSSMMF significantly exceeded that of several other state-of-the-art methods. It is proven that the LCPSSMMF approach can capture more local and global discriminatory information than almost all previous methods and can function remarkably well in identifying PPIs. To facilitate extensive research in future proteomics studies, we developed an LCPSSMMFSVM server, which is freely available for academic use at http://219.219.62.123:8888/LCPSSMMFSVM.","PeriodicalId":136690,"journal":{"name":"Evolutionary Bioinformatics Online","volume":"663 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115123459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparing Ease of Programming in C++, Go, and Java for Implementing a Next-Generation Sequencing Tool","authors":"Pascal Costanza, Charlotte Herzeel, W. Verachtert","doi":"10.1177/1176934319869015","DOIUrl":"https://doi.org/10.1177/1176934319869015","url":null,"abstract":"elPrep is an extensible multithreaded software framework for efficiently processing Sequence Alignment/Map (SAM)/Binary Alignment/Map (BAM) files in next-generation sequencing pipelines. Similar to other SAM/BAM tools, a key challenge in elPrep is memory management, as such programs need to manipulate large amounts of data. We therefore investigated 3 programming languages with support for assisted or automated memory management for implementing elPrep, namely C++, Go, and Java. We implemented a nontrivial subset of elPrep in all 3 programming languages and compared them by benchmarking their runtime performance and memory use to determine the best language in terms of computational performance. In a previous article, we motivated why, based on these results, we eventually selected Go as our implementation language. In this article, we discuss the difficulty of achieving the best performance in each language in terms of programming language constructs and standard library support. While benchmarks are easy to objectively measure and evaluate, this is less obvious for assessing ease of programming. However, because we expect elPrep to be regularly modified and extended, this is an equally important aspect. 
We illustrate representative examples of challenges in all 3 languages, and give our opinion why we think that Go is a reasonable choice also in this light.","PeriodicalId":136690,"journal":{"name":"Evolutionary Bioinformatics Online","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130806457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Addressing the Missing Heritability Problem With the Help of Regulatory Features","authors":"Shanshan Dong, Yan Guo, Tie-Lin Yang","doi":"10.1177/1176934319860861","DOIUrl":"https://doi.org/10.1177/1176934319860861","url":null,"abstract":"Genome-wide association studies (GWASs) have successfully identified thousands of susceptibility loci for human complex diseases. However, missing heritability is still a challenging problem. Considering most GWAS loci are located in regulatory elements, we recently developed a pipeline named functional disease-associated single-nucleotide polymorphisms (SNPs) prediction (FDSP), to predict novel susceptibility loci for complex diseases based on the interpretation of regulatory features and published GWAS results with machine learning. When applied to type 2 diabetes and hypertension, the predicted susceptibility loci by FDSP were proved to be capable of explaining additional heritability. In addition, potential target genes of the predicted positive SNPs were significantly enriched in disease-related pathways. Our results suggested that taking regulatory features into consideration might be a useful way to address the missing heritability problem. We hope FDSP could offer help for the identification of novel susceptibility loci for complex diseases.","PeriodicalId":136690,"journal":{"name":"Evolutionary Bioinformatics Online","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125095249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Properties of Samples With Segregating Polymerase Chain Reaction (PCR) Dropout Mutations Within a Species","authors":"C. Griswold","doi":"10.1177/1176934319883612","DOIUrl":"https://doi.org/10.1177/1176934319883612","url":null,"abstract":"In polymerase chain reaction (PCR)-based DNA sequencing studies, there is the possibility that mutations at the binding sites of primers result in no primer binding and therefore no amplification. In this article, we call such mutations PCR dropouts and present a coalescent-based theory of the distribution of segregating PCR dropout mutations within a species. We show that dropout mutations typically occur along branch sections that are at or near the base of a coalescent tree, if at all. Given that a dropout mutation occurs along a branch section near the base of a tree, there is a good chance that it causes the alleles of a large fraction of a species to go unamplified, which distorts the tree shape. Expected coalescence times and distributions of pairwise sequence differences in the presence of PCR dropout mutations are derived under the assumptions of both neutrality and background selection. 
These expectations differ from when PCR dropout mutations are absent and may form the basis of inferential approaches to detect the presence of dropout mutations, as well as the development of unbiased estimators of statistics associated with population-level genetic variation.","PeriodicalId":136690,"journal":{"name":"Evolutionary Bioinformatics Online","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134428657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Applications and Considerations of GToTree: A User-Friendly Workflow for Phylogenomics","authors":"Michael D. Lee","doi":"10.1177/1176934319862245","DOIUrl":"https://doi.org/10.1177/1176934319862245","url":null,"abstract":"Phylogenomics is the practice of attempting to infer evolutionary relationships at the genome level. This is becoming a standard step in the characterization of newly recovered genomes and in directing or constraining further research; yet the process from start to finish of building a de novo phylogenomic tree that is specific to the organisms of interest can still be computationally intractable for many biologists. GToTree is a recently published user-friendly workflow for phylogenomics intended to give more researchers the capability to generate phylogenomic trees to help guide their work. This commentary describes two common applications where GToTree can be helpful and then discusses some things to consider when using the program.","PeriodicalId":136690,"journal":{"name":"Evolutionary Bioinformatics Online","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124399338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Genome-Wide Comprehensive Analysis of the SABATH Gene Family in Arabidopsis and Rice","authors":"Bin Wang, Min Li, Yijun Yuan, Shaofang Liu","doi":"10.1177/1176934319860864","DOIUrl":"https://doi.org/10.1177/1176934319860864","url":null,"abstract":"Low molecular weight metabolites are important plant hormones and signaling molecules, and play an important part in the processes of plant development. Their activities may also be affected by the chemical modifications of methylation performed by SABATH. In this study, a total of 24 and 21 SABATH genes in Arabidopsis and rice, respectively, were identified and comprehensively studied. Phylogenetic analysis showed that AtSABATH and OsSABATH genes could be classified into 4 major groups and 6 subgroups. Gene expansion analysis showed that the main expansion mechanisms of the SABATH gene family in Arabidopsis and rice were tandem duplication and segmental duplication. The ratios of nonsynonymous (Ka) and synonymous (Ks) substitution rates of 12 paralogous pairs of AtSABATH and OsSABATH genes indicated that the SABATH gene family in Arabidopsis and rice had gone through purifying selection. Positive selection analysis with site models and branch-site models revealed that AtSABATH and OsSABATH genes had undergone selective pressure for adaptive evolution. Motif analysis showed that certain motifs only existed in specific subgroups or species, which indicated that the SABATH proteins of Arabidopsis and rice have diverged across species and subgroups. Functional divergence analysis also suggested that the AtSABATH and OsSABATH subgroup genes had functional differences, and the positive selection sites which contributed to functional divergence among subgroups were detected. 
These results provide insights into functional conservation and diversification of SABATH gene family, and are useful information for further elucidating SABATH gene family functions.","PeriodicalId":136690,"journal":{"name":"Evolutionary Bioinformatics Online","volume":"213 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128831406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Phylotastic: Improving Access to Tree-of-Life Knowledge With Flexible, on-the-Fly Delivery of Trees","authors":"Van Nguyen, T. Nguyen, Abu Saleh Md. Tayeen, H. D. Laughinghouse, L. Sánchez-Reyes, Enrico Pontelli, D. Mozzherin, B. O’Meara, A. Stoltzfus","doi":"10.1101/419143","DOIUrl":"https://doi.org/10.1101/419143","url":null,"abstract":"A comprehensive phylogeny of species, i.e., a tree of life, has potential uses in a variety of contexts, including research, education, and public policy. Yet, accessing the tree of life typically requires special knowledge, complex software, or long periods of training. The Phylotastic project aims to make it as easy to get a phylogeny of species as it is to get driving directions from mapping software. In prior work, we presented a design for an open system to validate and manage taxon names, find phylogeny resources, extract subtrees matching a user’s taxon list, scale trees to time, and integrate related resources such as species images. Here, we report the implementation of a set of tools that together represent a robust, accessible system for on-the-fly delivery of phylogenetic knowledge. This set of tools includes a web portal to execute several customizable workflows to obtain species phylogenies (scaled by geologic time and decorated with thumbnail images); more than 30 underlying web services (accessible via a common registry); and code toolkits in R and Python (allowing others to develop custom applications using Phylotastic services). 
The Phylotastic system, accessible via http://www.phylotastic.org, provides a unique resource to access the current state of phylogenetic knowledge, useful for a variety of cases in which a tree extracted quickly from online resources (as distinct from a tree custom-made from character data) is sufficient, as it is for many casual uses of trees identified here.","PeriodicalId":136690,"journal":{"name":"Evolutionary Bioinformatics Online","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127510310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integrating Phylogenetic and Network Approaches to Study Gene Family Evolution: The Case of the AGAMOUS Family of Floral Genes","authors":"D. Carvalho, James C. Schnable, Ana Almeida","doi":"10.1101/195669","DOIUrl":"https://doi.org/10.1101/195669","url":null,"abstract":"The study of gene family evolution has benefited from the use of phylogenetic tools, which can greatly inform studies of both relationships within gene families and functional divergence. Here, we propose the use of a network-based approach that, in combination with phylogenetic methods, can provide additional support for models of gene family evolution. We dissect the contributions of each method to the improved understanding of relationships and functions within the well-characterized family of AGAMOUS floral development genes. The results obtained with the two methods largely agreed with one another. In particular, we show how network approaches can provide improved interpretations of branches with low support in a conventional gene tree. The network approach used here may also better reflect known and suspected patterns of functional divergence relative to phylogenetic methods. Overall, we believe that the combined use of phylogenetic and network tools provides a more robust assessment of gene family evolution.","PeriodicalId":136690,"journal":{"name":"Evolutionary Bioinformatics Online","volume":"467 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124503027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Virus Discovery Using Tick Cell Lines","authors":"L. Bell-Sakyi, H. Attoui","doi":"10.4137/EBO.S39675","DOIUrl":"https://doi.org/10.4137/EBO.S39675","url":null,"abstract":"While ticks have been known to harbor and transmit pathogenic arboviruses for over 80 years, the application of high-throughput sequencing technologies has revealed that ticks also appear to harbor a diverse range of endogenous tick-only viruses belonging to many different families. Almost nothing is known about these viruses; indeed, it is unclear in most cases whether the identified viral sequences are derived from actual replication-competent viruses or from endogenous virus elements incorporated into the ticks’ genomes. Tick cell lines play an important role in virus discovery and isolation through the identification of novel viruses chronically infecting such cell lines and by acting as host cells to aid in determining whether or not an entire replication-competent, infective virus is present in a sample. Here, we review recent progress in tick-borne virus discovery and comment on the actual and potential applications for tick cell lines in this emerging research area.","PeriodicalId":136690,"journal":{"name":"Evolutionary Bioinformatics Online","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127056045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Archival Collections are Important in the Study of the Biology, Diversity, and Evolution of Arboviruses","authors":"A. Pyke, D. Warrilow","doi":"10.4137/EBO.S40569","DOIUrl":"https://doi.org/10.4137/EBO.S40569","url":null,"abstract":"Historically, classifications of arboviruses were based on serological techniques. Hence, collections of arbovirus isolates have been central to this process by providing the antigenic reagents for these methods. However, with increasing concern about biosafety and security, the introduction of molecular biology techniques has led to greater emphasis on the storage of nucleic acid sequence data over the maintenance of archival material. In this commentary, we provide examples of where archival collections provide an important source of genetic material to assist in confirming the authenticity of reference strains and vaccine stocks, to clarify taxonomic relationships particularly when isolates of the same virus species have been collected across a wide expanse of time and space, for future phenotypic analysis, to determine the historical diversity of strains, and to understand the mechanisms leading to changes in genome structure and virus evolution.","PeriodicalId":136690,"journal":{"name":"Evolutionary Bioinformatics Online","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116135238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}