{"title":"Mining sponge phenomena in RNA expression data.","authors":"Fabrizio Angiulli, Teresa Colombo, Fabio Fassetti, Angelo Furfaro, Paola Paci","doi":"10.1142/S0219720021500220","DOIUrl":"https://doi.org/10.1142/S0219720021500220","url":null,"abstract":"<p><p>In the last few years, the interactions among competing endogenous RNAs (ceRNAs) have been recognized as a key post-transcriptional regulatory mechanism in cell differentiation, tissue development, and disease. Notably, such sponge phenomena, which subtract active microRNAs from their silencing targets, have been recognized as having a potential oncosuppressive, or oncogenic, role in several cancer types. Hence, the ability to predict sponges from the analysis of large expression data sets (e.g. from international cancer projects) has become an important data mining task in bioinformatics. We present a technique designed to mine sponge phenomena whose presence or absence may discriminate between healthy and unhealthy populations of samples in tumoral or normal expression data sets, thus providing lists of candidates potentially relevant in the pathology. To this end, we search for pairs of elements acting as ceRNAs for a given miRNA; namely, we aim to discover miRNA-RNA pairs involved in phenomena that are clearly present in one population and almost absent in the other. The results on tumoral expression data, concerning five different cancer types, confirmed the effectiveness of the approach in mining interesting knowledge. Indeed, 32 out of 33 miRNAs and 22 out of 25 protein-coding genes identified as top scoring in our analysis are corroborated by having been similarly associated with cancer processes in independent studies. In fact, the subset of miRNAs selected by the sponge analysis results in a significant enrichment of annotation for the KEGG32 pathway \"microRNAs in cancer\" when tested with the commonly used bioinformatic resource DAVID. Moreover, the cancer datasets in which our sponge analysis identified a miRNA as top scoring often match those already reported in the pertinent literature.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39636898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"O-glycosylation site prediction for <i>Homo sapiens</i> by combining properties and sequence features with support vector machine.","authors":"Yan Zhu, Shuwan Yin, Jia Zheng, Yixia Shi, Cangzhi Jia","doi":"10.1142/S0219720021500293","DOIUrl":"https://doi.org/10.1142/S0219720021500293","url":null,"abstract":"<p><p>O-glycosylation is a protein posttranslational modification that plays an important regulatory role in almost all cells. It is related to a large number of physiological and pathological phenomena. Recognizing O-glycosylation sites is key to further investigating the molecular mechanism of protein posttranslational modification. This study aimed to collect a reliable dataset on <i>Homo sapiens</i> and develop an O-glycosylation predictor for <i>Homo sapiens</i>, named <b>Captor</b>, through multiple features. A random undersampling method and a synthetic minority oversampling technique were employed to handle the imbalanced data. In addition, the Kruskal-Wallis (K-W) test was adopted to optimize feature vectors and improve the performance of the model. After a comprehensive comparison of various classifiers from traditional machine learning and deep learning, a support vector machine, which performed best, was used to train and optimize the final prediction model. On the independent test set, <b>Captor</b> outperformed the existing O-glycosylation tool, suggesting that <b>Captor</b> could provide more instructive guidance for further experimental research on O-glycosylation. The source code and datasets are available at https://github.com/YanZhu06/Captor/.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39645905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Amino acid environment affinity model based on graph attention network.","authors":"Xueheng Tong, Shuqi Liu, Jiawei Gu, Chunguo Wu, Yanchun Liang, Xiaohu Shi","doi":"10.1142/S0219720021500323","DOIUrl":"https://doi.org/10.1142/S0219720021500323","url":null,"abstract":"<p><p>Proteins are engines involved in almost all functions of life. They have specific spatial structures formed by the twisting and folding of one or more polypeptide chains composed of amino acids. Protein sites are protein structure microenvironments that can be identified by three-dimensional locations and local neighborhoods in which the structure or function exists. Understanding the amino acid environment affinity is essential for additional protein structural or functional studies, such as mutation analysis and functional site detection. In this study, an amino acid environment affinity model based on the graph attention network was developed. Initially, we constructed a protein graph according to the distance between amino acid pairs. Then, we extracted a set of structural features for each node. Finally, the protein graph and the associated node feature set were fed into the graph attention network model to obtain the amino acid affinities. Numerical results show that our proposed method significantly outperforms a recent 3DCNN-based method by almost 30%.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39875262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EdClust: A heuristic sequence clustering method with higher sensitivity.","authors":"Ming Cao, Qinke Peng, Ze-Gang Wei, Fei Liu, Yi-Fan Hou","doi":"10.1142/S0219720021500360","DOIUrl":"https://doi.org/10.1142/S0219720021500360","url":null,"abstract":"<p><p>The development of high-throughput technologies has produced increasing amounts of sequence data and an increasing need for efficient clustering algorithms that can process massive volumes of sequencing data for downstream analysis. Heuristic clustering methods are widely applied for sequence clustering because of their low computational complexity. Although numerous heuristic clustering methods have been developed, they suffer from two limitations: overestimation of inferred clusters and low clustering sensitivity. To address these issues, we present a new sequence clustering method (edClust) based on Edlib, a C/C++ library for fast, exact semi-global sequence alignment, to group similar sequences. The new method edClust was tested on three large-scale sequence databases, and we compared edClust to several classic heuristic clustering methods, such as UCLUST, CD-HIT, and VSEARCH. Evaluations based on the metrics of cluster number and seed sensitivity (SS) demonstrate that edClust produces fewer clusters than the other methods and achieves higher SS. The source code of edClust is available from https://github.com/zhang134/EdClust.git under the GNU GPL license.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39751492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bioinformatics and Computational Biology: A Primer for Biologists","authors":"B. Tiwary","doi":"10.1007/978-981-16-4241-8","DOIUrl":"https://doi.org/10.1007/978-981-16-4241-8","url":null,"abstract":"","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83849236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Method for Predicting DNA N4-Methylcytosine Sites Based on Deep Forest Algorithm","authors":"Yonglin Zhang, Mei Hu, Qi Mo, Wenli Gan, Jiesi Luo","doi":"10.2139/ssrn.4062895","DOIUrl":"https://doi.org/10.2139/ssrn.4062895","url":null,"abstract":"","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68686715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Introduction to the Special Issue of the 18th Annual International RECOMB Satellite Workshop on Comparative Genomics.","authors":"Rohan B H Williams, Louxin Zhang","doi":"10.1142/S0219720021020030","DOIUrl":"https://doi.org/10.1142/S0219720021020030","url":null,"abstract":"","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39805625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Small parsimony for natural genomes in the DCJ-indel model.","authors":"Daniel Doerr, Cedric Chauve","doi":"10.1142/S0219720021400096","DOIUrl":"https://doi.org/10.1142/S0219720021400096","url":null,"abstract":"<p><p>The Small Parsimony Problem (SPP) aims at finding the gene orders at internal nodes of a given phylogenetic tree such that the overall genome rearrangement distance along the tree branches is minimized. This problem is intractable in most genome rearrangement models, especially when gene duplication and loss are considered. In this work, we describe an Integer Linear Program algorithm to solve the SPP for natural genomes, i.e. genomes that contain conserved, unique, and duplicated markers. The evolutionary model that we consider is the DCJ-indel model, which includes the Double-Cut and Join rearrangement operation and the insertion and deletion of genome segments. We evaluate our algorithm on simulated data and show that it reconstructs ancestral gene orders very efficiently and accurately in a very comprehensive evolutionary model.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39734889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A symmetry-inclusive algebraic approach to genome rearrangement.","authors":"Venta Terauds, Joshua Stevenson, Jeremy Sumner","doi":"10.1142/S0219720021400151","DOIUrl":"https://doi.org/10.1142/S0219720021400151","url":null,"abstract":"<p><p>Of the many modern approaches to calculating evolutionary distance via models of genome rearrangement, most are tied to a particular set of genomic modeling assumptions and to a restricted class of allowed rearrangements. The \"position paradigm\", in which genomes are represented as permutations signifying the position (and orientation) of each region, enables a refined model-based approach, where one can select biologically plausible rearrangements and assign to them relative probabilities/costs. Here, one must further incorporate any underlying structural symmetry of the genomes into the calculations and ensure that this symmetry is reflected in the model. In our recently introduced framework of <i>genome algebras</i>, each genome corresponds to an element that simultaneously incorporates all of its inherent physical symmetries. The representation theory of these algebras then provides a natural model of evolution via rearrangement as a Markov chain. Whilst the implementation of this framework to calculate distances for genomes with \"practical\" numbers of regions is currently computationally infeasible, we consider it to be a significant theoretical advance: one can incorporate different genomic modeling assumptions, calculate various genomic distances, and compare the results under different rearrangement models. The aim of this paper is to demonstrate some of these features.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39734890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The monoploid chromosome complement of reconstructed ancestral genomes in a phylogeny.","authors":"Qiaoji Xu, Xiaomeng Zhang, Yue Zhang, Chunfang Zheng, James H Leebens-Mack, Lingling Jin, David Sankoff","doi":"10.1142/S0219720021400084","DOIUrl":"https://doi.org/10.1142/S0219720021400084","url":null,"abstract":"<p><p>Using RACCROCHE, a method for reconstructing gene content and order of ancestral chromosomes from a phylogeny of extant genomes represented by the gene orders on their chromosomes, we study the evolution of three orders of woody plants. The method retrieves the monoploid complement of each Ancestor in a phylogeny, consisting of a complete set of distinct chromosomes, despite some of the extant genomes being recently or historically polyploidized. The three orders are the Sapindales, the Fagales and the Malvales. All of these are independently estimated to have ancestral monoploid number [Formula: see text].</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39734891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}