{"title":"Quantitative Modeling of Stemness in Single-Cell RNA Sequencing Data: A Nonlinear One-Class Support Vector Machine Method.","authors":"Hao Jiang, Jingxin Liu, You Song, Jinzhi Lei","doi":"10.1089/cmb.2022.0484","DOIUrl":"10.1089/cmb.2022.0484","url":null,"abstract":"<p><p>Intratumoral heterogeneity and the presence of cancer stem cells are challenging issues in cancer therapy. An appropriate quantification of the stemness of individual cells for assessing the potential for self-renewal and differentiation from the cell of origin can define a measurement for quantifying different cell states, which is important in understanding the dynamics of cancer evolution, and might further provide possible targeted therapies aimed at tumor stem cells. Nevertheless, it is usually difficult to quantify the stemness of a cell based on molecular information associated with the cell. In this study, we proposed a stemness definition method with one-class Hadamard kernel support vector machine (OCHSVM) based on single-cell RNA sequencing (scRNA-seq) data. Applications of the proposed OCHSVM stemness are assessed by various data sets, including preimplantation embryo cells, induced pluripotent stem cells, or tumor cells. We further compared the OCHSVM model with state-of-the-art methods CytoTRACE, one-class logistic regression, or one-class SVM methods with different kernels. The computational results demonstrate that the OCHSVM method is more suitable for stemness identification using scRNA-seq data.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"41-57"},"PeriodicalIF":1.7,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138444866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Highly Accurate Model for Screening Prostate Cancer Using Propensity Index Panel of Ten Genes.","authors":"Shipra Jain, Kawal Preet Kaur Malhotra, Sumeet Patiyal, Gajendra Pal Singh Raghava","doi":"10.1089/cmb.2023.0040","DOIUrl":"10.1089/cmb.2023.0040","url":null,"abstract":"","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"1305-1314"},"PeriodicalIF":1.7,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71424082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Creating and Using Minimizer Sketches in Computational Genomics.","authors":"Hongyu Zheng, Guillaume Marçais, Carl Kingsford","doi":"10.1089/cmb.2023.0094","DOIUrl":"10.1089/cmb.2023.0094","url":null,"abstract":"<p><p>Processing large data sets has become an essential part of computational genomics. Greatly increased availability of sequence data from multiple sources has fueled breakthroughs in genomics and related fields but has led to computational challenges processing large sequencing experiments. The minimizer sketch is a popular method for sequence sketching that underlies core steps in computational genomics such as read mapping, sequence assembling, k-mer counting, and more. In most applications, minimizer sketches are constructed using one of few classical approaches. More recently, efforts have been put into building minimizer sketches with desirable properties compared with the classical constructions. In this survey, we review the history of the minimizer sketch, the theories developed around the concept, and the plethora of applications taking advantage of such sketches. We aim to provide the readers a comprehensive picture of the research landscape involving minimizer sketches, in anticipation of better fusion of theory and application in the future.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"1251-1276"},"PeriodicalIF":1.7,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11082048/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10113000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EnILs: A General Ensemble Computational Approach for Predicting Inducing Peptides of Multiple Interleukins.","authors":"Rui Su, Jujuan Zhuang, Shuhan Liu, Di Liu, Kexin Feng","doi":"10.1089/cmb.2023.0002","DOIUrl":"10.1089/cmb.2023.0002","url":null,"abstract":"<p><p>Interleukins (ILs) are a group of multifunctional cytokines, which play important roles in immune regulations and inflammatory responses. Recently, IL-6 has been found to affect the development of COVID-19, and significantly elevated levels of IL-6 cytokines have been reported in patients with severe COVID-19. IL-10 and IL-17 are anti-inflammatory and proinflammatory cytokines, respectively, which play multiple protective roles in host defense against pathogens. At present, a number of machine learning methods have been proposed to predict ILs inducing peptides, but their predictive performance needs to be further improved, and the inducing peptides of different ILs are predicted separately, rather than using a general approach. In our work, we combine the statistical features of peptide sequence with word embedding to design a general ensemble model named EnILs to predict inducing peptides of different ILs, in which the predictive probabilities of random forest, eXtreme Gradient Boosting and neural network are integrated in an average way. Compared with the state-of-the-art machine learning methods, EnILs shows considerable performance in the prediction of IL-6, IL-10, and IL-17 inducing peptides. In addition, we predict the most promising IL-6 inducing peptides in Severe Acute Respiratory Syndrome Coronavirus 2 spike protein in the case study for further experimental verification.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"1289-1304"},"PeriodicalIF":1.7,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138444864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Barrier for Further Approximating Sorting by Transpositions.","authors":"Luiz A G Silva, Luis A B Kowada, Maria E M T Walter","doi":"10.1089/cmb.2023.0138","DOIUrl":"10.1089/cmb.2023.0138","url":null,"abstract":"<p><p>The transposition distance problem is a classical problem in genome rearrangements, which seeks to determine the minimum number of transpositions needed to transform a linear chromosome into another represented by the permutations <math><mstyle><mi>π</mi></mstyle></math> and <math><mstyle><mi>σ</mi></mstyle></math>, respectively. This article focuses on the equivalent problem of sorting by transpositions (SBT), where <math><mstyle><mi>σ</mi></mstyle></math> is the identity permutation <math><mstyle><mi>ι</mi></mstyle></math>. Specifically, we investigate palisades, a family of permutations that are \"hard\" to sort, as they require numerous transpositions above the celebrated lower bound devised by Bafna and Pevzner. By determining the transposition distance of palisades, we were able to provide the exact transposition diameter for 3-permutations (TD3), a special subset of the symmetric group <i>S<sub>n</sub></i>, essential for the study of approximate solutions for SBT using the simplification technique. The exact value for TD3 has remained unknown since Elias and Hartman showed an upper bound for it. Another consequence of determining the transposition distance of palisades is that, using as lower bound the one by Bafna and Pevzner, it is impossible to guarantee approximation ratios lower than 1.375 when approximating SBT. This finding has significant implications for the study of SBT, as this problem has been the subject of intense research efforts for the past 25 years.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"1277-1288"},"PeriodicalIF":1.7,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"54229271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"lncRna: The R Package for Optimizing lncRNA Identification Processes.","authors":"Jan Pawel Jastrzebski, Stefano Pascarella, Aleksandra Lipka, Slawomir Dorocki","doi":"10.1089/cmb.2023.0091","DOIUrl":"10.1089/cmb.2023.0091","url":null,"abstract":"<p><p>In silico identification of long noncoding RNAs (lncRNAs) is a multistage process including filtering of transcripts according to their physical characteristics (e.g., length, exon-intron structure) and determination of the coding potential of the sequence. A common issue within this process is the choice of the most suitable method of coding potential analysis for the conducted research. Selection of tools on the sole basis of their single performance may not provide the most effective choice for a specific problem. To overcome these limitations, we developed the R library lncRna, which provides functions to easily carry out the entire lncRNA identification process. For example, the package prepares the data files for coding potential analysis to perform error analysis. Moreover, the package gives the opportunity to analyze the effectiveness of various combinations of the lncRNA prediction methods to select the optimal configuration of the entire process.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"1322-1326"},"PeriodicalIF":1.7,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50158081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"pyBrick-DNA: A Python-Based Environment for Automated Genetic Component Assembly.","authors":"Gladys M Cavero Rozas, Jose M Cisneros Mandujano, Yomali A Ferreyra Chombo, Daniela V Moreno Rencoret, Yerko M Ortiz Mora, Martín E Gutiérrez Pescarmona, Alberto J Donayre Torres","doi":"10.1089/cmb.2023.0008","DOIUrl":"10.1089/cmb.2023.0008","url":null,"abstract":"<p><p>Genetic component assembly is key in the simulation and implementation of genetic circuits. Automating this process, thus accelerating prototyping, is a necessity. We present pyBrick-DNA, a software written in Python, that assembles components for the construction of genetic circuits. pyBrick-DNA (colab.pyBrick.com) is a user-friendly environment where scientists can select genetic sequences or input custom sequences to build genetic assemblies. All components are modularly fused to generate a ready-to-go single DNA fragment. It includes Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) and plant gene-editing components. Hence, pyBrick-DNA can generate a functional CRISPR construct composed of a single-guided RNA integrated with Cas9, promoters, and terminator elements. The outcome is a DNA sequence, along with a graphical representation, composed of user-selected genetic parts, ready to be synthesized and cloned in vivo. Moreover, the sequence can be exported as a GenBank file allowing its use with other synthetic biology tools.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"1315-1321"},"PeriodicalIF":1.7,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138444865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Computational Software for Training Robust Drug-Target Affinity Prediction Models: pydebiaseddta.","authors":"Melih Barsbey, Riza Özçelik, Alperen Bağ, Berk Atil, Arzucan Özgür, Elif Ozkirimli","doi":"10.1089/cmb.2023.0194","DOIUrl":"10.1089/cmb.2023.0194","url":null,"abstract":"<p><p><b>Robust generalization of drug-target affinity (DTA) prediction models is a notoriously difficult problem in computational drug discovery. In this article, we present pydebiaseddta: a computational software for improving the generalizability of DTA prediction models to novel ligands and/or proteins. pydebiaseddta serves as the practical implementation of the DebiasedDTA training framework, which advocates modifying the training distribution to mitigate the effect of spurious correlations in the training data set that leads to substantially degraded performance for novel ligands and proteins. Written in Python programming language, pydebiaseddta combines a user-friendly streamlined interface with a feature-rich and highly modifiable architecture. With this article we introduce our software, showcase its main functionalities, and describe practical ways for new users to engage with it.</b></p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":"30 11","pages":"1240-1245"},"PeriodicalIF":1.7,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138291125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Framework for Improving the Generalizability of Drug-Target Affinity Prediction Models.","authors":"Riza Özçelik, Alperen Bağ, Berk Atil, Melih Barsbey, Arzucan Özgür, Elif Ozkirimli","doi":"10.1089/cmb.2023.0208","DOIUrl":"10.1089/cmb.2023.0208","url":null,"abstract":"<p><p><b>Statistical models that accurately predict the binding affinity of an input ligand-protein pair can greatly accelerate drug discovery. Such models are trained on available ligand-protein interaction data sets, which may contain biases that lead the predictor models to learn data set-specific, spurious patterns instead of generalizable relationships. This leads the prediction performances of these models to drop dramatically for previously unseen biomolecules. Various approaches that aim to improve model generalizability either have limited applicability or introduce the risk of degrading overall prediction performance. In this article, we present DebiasedDTA, a novel training framework for drug-target affinity (DTA) prediction models that addresses data set biases to improve the generalizability of such models. DebiasedDTA relies on reweighting the training samples to achieve robust generalization, and is thus applicable to most DTA prediction models. Extensive experiments with different biomolecule representations, model architectures, and data sets demonstrate that DebiasedDTA achieves improved generalizability in predicting drug-target affinities.</b></p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":"30 11","pages":"1226-1239"},"PeriodicalIF":1.7,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138291126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"QR-STAR: A Polynomial-Time Statistically Consistent Method for Rooting Species Trees Under the Coalescent.","authors":"Yasamin Tabatabaee, Sebastien Roch, Tandy Warnow","doi":"10.1089/cmb.2023.0185","DOIUrl":"10.1089/cmb.2023.0185","url":null,"abstract":"<p><p><b>We address the problem of rooting an unrooted species tree given a set of unrooted gene trees, under the assumption that gene trees evolve within the model species tree under the multispecies coalescent (MSC) model. Quintet Rooting (QR) is a polynomial time algorithm that was recently proposed for this problem, which is based on the theory developed by Allman, Degnan, and Rhodes that proves the identifiability of rooted 5-taxon trees from unrooted gene trees under the MSC. However, although QR had good accuracy in simulations, its statistical consistency was left as an open problem. We present QR-STAR, a variant of QR with an additional step and a different cost function, and prove that it is statistically consistent under the MSC. Moreover, we derive sample complexity bounds for QR-STAR and show that a particular variant of it based on \"short quintets\" has polynomial sample complexity. Finally, our simulation study under a variety of model conditions shows that QR-STAR matches or improves on the accuracy of QR. QR-STAR is available in open-source form on github.</b></p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"1146-1181"},"PeriodicalIF":1.7,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71412464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}