Jūratė Šaltytė Benth, Fred Espen Benth, Espen Rostrup Nakstad
{"title":"Rebuttal to Flaws in the Paper 'Nearly Instantaneous Time-Varying Reproduction Number for Contagious Diseases-a Direct Approach Based on Nonlinear Regression'.","authors":"Jūratė Šaltytė Benth, Fred Espen Benth, Espen Rostrup Nakstad","doi":"10.1089/cmb.2025.0024","DOIUrl":"https://doi.org/10.1089/cmb.2025.0024","url":null,"abstract":"","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":"32 8","pages":"819-823"},"PeriodicalIF":1.6,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144753527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mukul S Bansal, Wei Chen, Yury Khudyakov, Ion I Măndoiu, Marmar R Moussa, Murray Patterson, Sanguthevar Rajasekaran, Pavel Skums, Sharma V Thankachan, Alex Zelikovsky
{"title":"<i>Special Section:</i> 12th International Computational Advances in Bio and Medical Sciences (ICCABS 2023).","authors":"Mukul S Bansal, Wei Chen, Yury Khudyakov, Ion I Măndoiu, Marmar R Moussa, Murray Patterson, Sanguthevar Rajasekaran, Pavel Skums, Sharma V Thankachan, Alex Zelikovsky","doi":"10.1089/cmb.2025.0124","DOIUrl":"10.1089/cmb.2025.0124","url":null,"abstract":"","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"721-722"},"PeriodicalIF":1.6,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12425115/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144078209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Exact Matching Method for 16S rRNA Taxonomy Classification.","authors":"Sing-Hoi Sze","doi":"10.1089/cmb.2024.0615","DOIUrl":"10.1089/cmb.2024.0615","url":null,"abstract":"<p><p>One popular approach to taxonomy classification in the microbiome utilizes 16S ribosomal RNA sequences. The main challenge is that 16S rRNA sequences could be almost identical in closely related species, and it is difficult to distinguish them at the species level. Recent approaches are able to achieve almost single nucleotide resolution by constructing an error model of the reads. We develop an exact matching algorithm to utilize the single nucleotide resolution directly. We show that our algorithm is able to obtain improved accuracy in recent samples of mock communities and in samples of high compositional complexity when compared to existing algorithms. A software program implementing this algorithm is available at http://faculty.cse.tamu.edu/shsze/kmpmatch.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"753-760"},"PeriodicalIF":1.6,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144248127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Algorithm to Calculate the <i>p</i>-Value of the Monge-Elkan Distance.","authors":"Petr Ryšavý, Filip Železný","doi":"10.1089/cmb.2024.0854","DOIUrl":"10.1089/cmb.2024.0854","url":null,"abstract":"<p><p>The Monge-Elkan distance is a straightforward yet popular distance measure used to estimate the mutual similarity of two sets of objects. It was initially proposed in the field of databases, and it found broad usage in other fields. Nowadays, it is especially relevant to the analysis of new-generation sequencing data as it represents a measure of dissimilarity between genomes of two distinct organisms, particularly when applied to unassembled reads. This article provides an algorithm to calculate the <i>p</i>-value associated with the Monge-Elkan distance. Given the object-level null distribution, that is, the distribution of distances between independently and identically sampled objects such as reads, the method yields the null distribution of the Monge-Elkan distance, which in turn allows for calculating the <i>p</i>-value. We also demonstrate an application on sequencing data, where individual reads are compared by the Levenshtein distance.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"797-812"},"PeriodicalIF":1.6,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144248164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Mapper Algorithm with Implicit Intervals and Its Optimization.","authors":"Yuyang Tao, Shufei Ge","doi":"10.1089/cmb.2024.0919","DOIUrl":"10.1089/cmb.2024.0919","url":null,"abstract":"<p><p>The Mapper algorithm is an essential tool for visualizing complex, high-dimensional data in topological data analysis and has been widely used in biomedical research. It outputs a combinatorial graph whose structure encodes the shape of the data. However, the need for manual parameter tuning and fixed (implicit) intervals, along with fixed overlapping ratios, may impede the performance of the standard Mapper algorithm. Variants of the standard Mapper algorithms have been developed to address these limitations, yet most of them still require manual tuning of parameters. Additionally, many of these variants, including the standard version found in the literature, were built within a deterministic framework and overlooked the uncertainty inherent in the data. To relax these limitations, in this work, we introduce a novel framework that implicitly represents intervals through a hidden assignment matrix, enabling automatic parameter optimization via stochastic gradient descent (SGD). We develop a soft Mapper framework based on a Gaussian mixture model for flexible and implicit interval construction. We further illustrate the robustness of the soft Mapper algorithm by introducing the Mapper graph mode as a point estimation for the output graph. Moreover, an SGD algorithm with a specific topological loss function is proposed for optimizing parameters in the model. Both simulation and application studies demonstrate its effectiveness in capturing the underlying topological structures. In addition, the application to an RNA expression dataset obtained from the Mount Sinai/JJ Peters VA Medical Center Brain Bank successfully identifies a distinct subgroup of Alzheimer's Disease. The implementation of our method is available at https://github.com/FarmerTao/Implicit-interval-Mapper.git.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"781-796"},"PeriodicalIF":1.6,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144225638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Data Compression: Recent Innovations in LZ77 Algorithms.","authors":"Aaron Hong, Christina Boucher","doi":"10.1089/cmb.2024.0879","DOIUrl":"10.1089/cmb.2024.0879","url":null,"abstract":"<p><p>The growing volume of genomic data, driven by advances in sequencing technologies, demands efficient data compression solutions. Traditional algorithms, such as Lempel-Ziv77 (LZ77), have been fundamental in offering lossless compression, yet they often fall short when applied to the highly repetitive structures typical of genomic sequences. This review explores the evolution of LZ77 and its adaptations for genomic data compression, highlighting specialized algorithms designed to handle redundancy in large-scale sequencing datasets efficiently. Innovations in this field have enhanced compression ratios and processing efficiencies by leveraging intrinsic redundancy within genomic datasets. We critically examine a spectrum of LZ77-based algorithms, including newer adaptations for external and semi-external memory settings, and contrast their efficacy in managing large-scale genomic data. We conducted experiments to evaluate the performance of several algorithms, including KKP2, RLE-LZ, SE-KKP, BGone, and PFP-LZ77, on real-world datasets from the Pizza&Chili repetitive corpus, Salmonella genomes, and human chromosome 19 genomes. These results underscore the trade-offs between time and memory consumption across algorithms. This article aims to provide a comprehensive guide on the current landscape and future directions of data compression technologies, equipping bioinformaticians and other practitioners with insight to tackle the escalating data challenges in genomics and beyond.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"761-780"},"PeriodicalIF":1.6,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12409268/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144181663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Asymptotic Distribution of the <i>k</i>-Robinson-Foulds Dissimilarity Measure on Labeled Trees.","authors":"Michael Fuchs, Mike Steel","doi":"10.1089/cmb.2025.0093","DOIUrl":"https://doi.org/10.1089/cmb.2025.0093","url":null,"abstract":"<p><p>Motivated by applications in medical bioinformatics, Khayatian et al. (2024) introduced a family of metrics on Cayley trees [the <i>k</i>-Robinson-Foulds (RF) distance, for <i>k</i> = 0, ..., <i>n</i>-2] and explored their distribution on pairs of random Cayley trees via simulations. In this article, we investigate this distribution mathematically and derive exact asymptotic descriptions of the distribution of the <i>k</i>-RF metric for the extreme values <i>k</i> = 0 and <i>k</i> = <i>n</i>-2, as <i>n</i> becomes large. We show that a linear transform of the 0-RF metric converges to a Poisson distribution (with mean 2), whereas a similar transform for the (<i>n</i>-2)-RF metric leads to a normal distribution (with mean ∼ <i>n</i>e<sup>-2</sup>). These results (together with the case <i>k</i> = 1 which behaves quite differently and <i>k</i> = <i>n</i>-3) shed light on the earlier simulation results and the predictions made concerning them.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144540425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DrIVeNN: Drug Interaction Vectors Neural Network.","authors":"Natalie Wang, Casey Overby Taylor","doi":"10.1089/cmb.2025.0079","DOIUrl":"10.1089/cmb.2025.0079","url":null,"abstract":"<p><p>Polypharmacy, the concurrent use of multiple drugs to treat a single condition, is common in patients managing multiple or complex conditions. However, as more drugs are added to the treatment plan, the risk of adverse drug events (ADEs) rises rapidly. Because it is impractical to test every possible drug combination during clinical trials, many serious polypharmacy ADEs (also known as drug-drug interactions or DDIs) only become known after the drugs are in use. This issue is prevalent among older adults with cardiovascular disease (CVD), where polypharmacy and ADEs are common. In this research, our primary objective was to identify key drug features and build and evaluate a model to predict DDIs. Our secondary objective was to assess our model on a domain-specific case study. We developed a two-layer neural network that incorporated drug features such as molecular structure, drug-protein interactions, and mono-drug side effects (drug interaction vectors neural network [DrIVeNN]) using publicly available side effect databases. It performed moderately better than state-of-the-art models such as DGNN-DDI, KGDDI, and NNPS. DrIVeNN had average area under the Receiver Operating Characteristic curve (AUROC) and area under the precision-recall curve (AUPRC) scores of 0.934 and 0.920, respectively, compared to the best-performing baseline model, DGNN-DDI, which had scores of 0.919 and 0.904. We also conducted a domain-specific case study centered on CVD treatment, and there was a significant increase in performance from the general model. We observed an average AUROC for CVD DDI prediction of 0.979. This research contributes to the advancement of predictive modeling techniques for polypharmacy ADEs and indicates the strong potential of domain-specific models.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"696-706"},"PeriodicalIF":1.4,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12259410/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144027348","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparative Performance Evaluation of Large Language Models for Extracting Molecular Interactions and Pathway Knowledge.","authors":"Gilchan Park, Byung-Jun Yoon, Xihaier Luo, Vanessa López-Marrero, Shinjae Yoo, Shantenu Jha","doi":"10.1089/cmb.2025.0078","DOIUrl":"10.1089/cmb.2025.0078","url":null,"abstract":"<p><p>Understanding the interactions and regulatory relationships among biomolecules is essential for deciphering complex biological systems and elucidating the mechanisms behind diverse biological functions. Traditionally, the collection of such molecular interaction data has relied on expert curation, a process that is both time-consuming and labor-intensive. To address these limitations, this study explores the use of large language models (LLMs) to automate the genome-scale extraction of molecular interaction knowledge. We evaluate the performance of various LLMs on key biological tasks, including the identification of protein-protein interactions, detection of genes associated with pathways influenced by low-dose radiation, and inference of gene regulatory relationships. Our findings demonstrate that larger LLMs tend to perform better, particularly in extracting intricate gene and protein interactions. Despite their strengths, these models face challenges in recognizing functionally diverse gene groups and highly correlated regulatory relationships. Through a comprehensive analysis using established molecular interaction and pathway databases, we show that LLMs possess the potential to identify relevant biomolecules and predict their interactions, offering valuable insights and marking a significant step toward AI-driven biological knowledge discovery.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"675-695"},"PeriodicalIF":1.4,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144093858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Graphlet-Based Community Detection Algorithm.","authors":"Pablo M Redondo, Reza Mousapour, Wayne B Hayes","doi":"10.1089/cmb.2025.0095","DOIUrl":"10.1089/cmb.2025.0095","url":null,"abstract":"<p><p>Community detection is a long-standing problem with applications from social networks to biology. Given its popularity and that it is NP-complete, heuristics abound, though no gold standard exists; there is even disagreement on the technical definition of what <i>constitutes</i> a community of nodes. We define a <i>community</i> as any set of nodes in which the edge density is <i>uniformly</i> higher by a substantial margin than the network's overall edge density. In this article, we introduce an entirely novel algorithm with no relation to any existing algorithm: given an edge density threshold <i>ε</i>, we build a community that has edge density above <i>ε</i> from sampled <i>graphlets</i> (small induced subgraphs) that themselves have density above <i>ε</i>. By conglomerating and merging these graphlets, the community has an edge density that is <i>uniformly</i> above the threshold. We show that this algorithm almost universally outperforms existing algorithms chosen widely across the literature (biological and non-biological) in the problem of overlapping community detection, in that it finds larger and more dense communities in virtually every test case. Finally, we run our code through the 2016 DREAM challenge for community detection in biological networks, and show that it finds substantially more dense communities than the DREAM competition winners.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"707-720"},"PeriodicalIF":1.4,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144540424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}