{"title":"EdClust: A heuristic sequence clustering method with higher sensitivity.","authors":"Ming Cao, Qinke Peng, Ze-Gang Wei, Fei Liu, Yi-Fan Hou","doi":"10.1142/S0219720021500360","DOIUrl":"https://doi.org/10.1142/S0219720021500360","url":null,"abstract":"The development of high-throughput technologies has produced increasing amounts of sequence data and an increasing need for efficient clustering algorithms that can process massive volumes of sequencing data for downstream analysis. Heuristic clustering methods are widely applied for sequence clustering because of their low computational complexity. Although numerous heuristic clustering methods have been developed, they suffer from two limitations: overestimation of inferred clusters and low clustering sensitivity. To address these issues, we present a new sequence clustering method (edClust) that groups similar sequences using Edlib, a C/C++ library for fast, exact semi-global sequence alignment. The new method edClust was tested on three large-scale sequence databases, and we compared edClust to several classic heuristic clustering methods, such as UCLUST, CD-HIT, and VSEARCH. Evaluations based on the metrics of cluster number and seed sensitivity (SS) demonstrate that edClust can produce fewer clusters than other methods and that its SS is higher than that of other methods. The source code of edClust is available from https://github.com/zhang134/EdClust.git under the GNU GPL license.","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"20 1","pages":"2150036"},"PeriodicalIF":1.0,"publicationDate":"2022-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39751492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bioinformatics and Computational Biology: A Primer for Biologists","authors":"B. Tiwary","doi":"10.1007/978-981-16-4241-8","DOIUrl":"https://doi.org/10.1007/978-981-16-4241-8","url":null,"abstract":"","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"61 1","pages":""},"PeriodicalIF":1.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83849236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Method for Predicting DNA N4-Methylcytosine Sites Based on Deep Forest Algorithm","authors":"Yonglin Zhang, Mei Hu, Qi Mo, Wenli Gan, Jiesi Luo","doi":"10.2139/ssrn.4062895","DOIUrl":"https://doi.org/10.2139/ssrn.4062895","url":null,"abstract":"","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"1 1","pages":""},"PeriodicalIF":1.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68686715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Introduction to the Special Issue of the 18th Annual International RECOMB Satellite Workshop on Comparative Genomics.","authors":"Rohan B H Williams, Louxin Zhang","doi":"10.1142/S0219720021020030","DOIUrl":"https://doi.org/10.1142/S0219720021020030","url":null,"abstract":"","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"19 6","pages":"2102003"},"PeriodicalIF":1.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39805625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Small parsimony for natural genomes in the DCJ-indel model.","authors":"Daniel Doerr, Cedric Chauve","doi":"10.1142/S0219720021400096","DOIUrl":"https://doi.org/10.1142/S0219720021400096","url":null,"abstract":"The Small Parsimony Problem (SPP) aims at finding the gene orders at internal nodes of a given phylogenetic tree such that the overall genome rearrangement distance along the tree branches is minimized. This problem is intractable in most genome rearrangement models, especially when gene duplication and loss are considered. In this work, we describe an Integer Linear Programming algorithm to solve the SPP for natural genomes, i.e. genomes that contain conserved, unique, and duplicated markers. The evolutionary model that we consider is the DCJ-indel model, which includes the Double-Cut and Join rearrangement operation and the insertion and deletion of genome segments. We evaluate our algorithm on simulated data and show that it is able to reconstruct ancestral gene orders very efficiently and accurately in a very comprehensive evolutionary model.","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"19 6","pages":"2140009"},"PeriodicalIF":1.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39734889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A symmetry-inclusive algebraic approach to genome rearrangement.","authors":"Venta Terauds, Joshua Stevenson, Jeremy Sumner","doi":"10.1142/S0219720021400151","DOIUrl":"https://doi.org/10.1142/S0219720021400151","url":null,"abstract":"Of the many modern approaches to calculating evolutionary distance via models of genome rearrangement, most are tied to a particular set of genomic modeling assumptions and to a restricted class of allowed rearrangements. The \"position paradigm\", in which genomes are represented as permutations signifying the position (and orientation) of each region, enables a refined model-based approach, where one can select biologically plausible rearrangements and assign them relative probabilities or costs. Here, one must further incorporate any underlying structural symmetry of the genomes into the calculations and ensure that this symmetry is reflected in the model. In our recently introduced framework of genome algebras, each genome corresponds to an element that simultaneously incorporates all of its inherent physical symmetries. The representation theory of these algebras then provides a natural model of evolution via rearrangement as a Markov chain. Whilst the implementation of this framework to calculate distances for genomes with \"practical\" numbers of regions is currently computationally infeasible, we consider it to be a significant theoretical advance: one can incorporate different genomic modeling assumptions, calculate various genomic distances, and compare the results under different rearrangement models. The aim of this paper is to demonstrate some of these features.","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"19 6","pages":"2140015"},"PeriodicalIF":1.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39734890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The monoploid chromosome complement of reconstructed ancestral genomes in a phylogeny.","authors":"Qiaoji Xu, Xiaomeng Zhang, Yue Zhang, Chunfang Zheng, James H Leebens-Mack, Lingling Jin, David Sankoff","doi":"10.1142/S0219720021400084","DOIUrl":"https://doi.org/10.1142/S0219720021400084","url":null,"abstract":"Using RACCROCHE, a method for reconstructing the gene content and order of ancestral chromosomes from a phylogeny of extant genomes represented by the gene orders on their chromosomes, we study the evolution of three orders of woody plants. The method retrieves the monoploid complement of each ancestor in a phylogeny, consisting of a complete set of distinct chromosomes, even though some of the extant genomes have been recently or historically polyploidized. The three orders are the Sapindales, the Fagales and the Malvales. All of these are independently estimated to have ancestral monoploid number [Formula: see text].","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"19 6","pages":"2140008"},"PeriodicalIF":1.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39734891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Incorporating intergenic regions into reversal and transposition distances with indels.","authors":"Alexsandro Oliveira Alexandrino, Andre Rodrigues Oliveira, Ulisses Dias, Zanoni Dias","doi":"10.1142/S0219720021400114","DOIUrl":"https://doi.org/10.1142/S0219720021400114","url":null,"abstract":"Problems in the genome rearrangement field are often formulated in terms of pairwise genome comparison: given two genomes [Formula: see text] and [Formula: see text], find the minimum number of genome rearrangements that may have occurred during the evolutionary process. This broad definition lacks at least two important considerations: the first being which features are extracted from genomes to create a useful mathematical model, and the second being which types of genome rearrangement events should be represented. Regarding the first consideration, seminal works in the genome rearrangement field solely used gene order to represent genomes as permutations of integers, neglecting many important aspects like gene duplication, intergenic regions, and complex interactions between genes. Regarding the second consideration, some rearrangement events, such as reversals and transpositions, are widely studied. In this paper, we shed light on the first consideration and create a model that takes into account gene order and the number of nucleotides in intergenic regions. In addition, we consider events of reversals, transpositions, and indels (insertions and deletions) of genomic material. We present a 4-approximation algorithm for reversals and indels, a [Formula: see text]-approximation algorithm for transpositions and indels, and a 6-approximation algorithm for reversals, transpositions, and indels.","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"19 6","pages":"2140011"},"PeriodicalIF":1.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39889211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DNN-Boost: Somatic mutation identification of tumor-only whole-exome sequencing data using deep neural network and XGBoost.","authors":"Firda Aminy Maruf, Rian Pratama, Giltae Song","doi":"10.1142/S0219720021400175","DOIUrl":"https://doi.org/10.1142/S0219720021400175","url":null,"abstract":"Detection of somatic mutations in whole-exome sequencing data can help elucidate the mechanism of tumor progression. Most computational approaches require exome sequencing for both tumor and normal samples. However, it is more common to sequence exomes for tumor samples only, without paired normal samples. To include these types of data in extensive studies of tumorigenesis, it is necessary to develop an approach for identifying somatic mutations using tumor exome sequencing data only. In this study, we designed a machine learning approach using a Deep Neural Network (DNN) and XGBoost to identify somatic mutations in tumor-only exome sequencing data, and we integrated it into a pipeline called DNN-Boost. The XGBoost algorithm is used to extract features from the results of variant callers, and these features are then fed into the DNN model as input. The XGBoost algorithm resolves issues of missing values and overfitting. We evaluated our proposed model and compared its performance with other existing benchmark methods. We noted that the DNN-Boost classification model outperformed the benchmark method in classifying somatic mutations from paired tumor-normal exome data and tumor-only exome data.","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"19 6","pages":"2140017"},"PeriodicalIF":1.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39716735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparing the topology of phylogenetic network generators.","authors":"Remie Janssen, Pengyu Liu","doi":"10.1142/S0219720021400126","DOIUrl":"https://doi.org/10.1142/S0219720021400126","url":null,"abstract":"Phylogenetic networks represent the evolutionary history of species and can record natural reticulate evolutionary processes such as horizontal gene transfer and gene recombination. This makes phylogenetic networks a more comprehensive representation of evolutionary history than phylogenetic trees. Stochastic processes for generating random trees or networks are important tools in evolutionary analysis, especially in phylogeny reconstruction, where they can be used for validation or serve as priors for Bayesian methods. However, although more network generators are being developed, there has been little discussion or comparison of the different generators. To bridge this gap, we compare a set of phylogenetic network generators by profiling topological summary statistics of the generated networks over the number of reticulations and comparing the topological profiles.","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"19 6","pages":"2140012"},"PeriodicalIF":1.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39805628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}