Journal of Computational Biology: Latest Articles (Page 2)

Rebuttal to Flaws in the Paper 'Nearly Instantaneous Time-Varying Reproduction Number for Contagious Diseases - a Direct Approach Based on Nonlinear Regression'.

IF 1.6, Zone 4 (Biology)

Journal of Computational Biology Pub Date : 2025-08-01 DOI: 10.1089/cmb.2025.0024

Jūratė Šaltytė Benth, Fred Espen Benth, Espen Rostrup Nakstad

Citations: 0

Special Section: 12th International Computational Advances in Bio and Medical Sciences (ICCABS 2023).

IF 1.6, Zone 4 (Biology)

Journal of Computational Biology Pub Date : 2025-08-01 Epub Date: 2025-05-16 DOI: 10.1089/cmb.2025.0124

Mukul S Bansal, Wei Chen, Yury Khudyakov, Ion I Măndoiu, Marmar R Moussa, Murray Patterson, Sanguthevar Rajasekaran, Pavel Skums, Sharma V Thankachan, Alex Zelikovsky

Citations: 0

An Exact Matching Method for 16S rRNA Taxonomy Classification.

IF 1.6, Zone 4 (Biology)

Journal of Computational Biology Pub Date : 2025-08-01 Epub Date: 2025-06-09 DOI: 10.1089/cmb.2024.0615

Sing-Hoi Sze

Citations: 0

An Algorithm to Calculate the p-Value of the Monge-Elkan Distance.

IF 1.6, Zone 4 (Biology)

Journal of Computational Biology Pub Date : 2025-08-01 Epub Date: 2025-06-09 DOI: 10.1089/cmb.2024.0854

Petr Ryšavý, Filip Železný

Citations: 0

A Mapper Algorithm with Implicit Intervals and Its Optimization.

IF 1.6, Zone 4 (Biology)

Journal of Computational Biology Pub Date : 2025-08-01 Epub Date: 2025-06-05 DOI: 10.1089/cmb.2024.0919

Yuyang Tao, Shufei Ge

Abstract: The Mapper algorithm is an essential tool for visualizing complex, high-dimensional data in topological data analysis and has been widely used in biomedical research. It outputs a combinatorial graph whose structure encodes the shape of the data. However, the need for manual parameter tuning and fixed (implicit) intervals, along with fixed overlapping ratios, may impede the performance of the standard Mapper algorithm. Variants of the standard Mapper algorithms have been developed to address these limitations, yet most of them still require manual tuning of parameters. Additionally, many of these variants, including the standard version found in the literature, were built within a deterministic framework and overlooked the uncertainty inherent in the data. To relax these limitations, in this work, we introduce a novel framework that implicitly represents intervals through a hidden assignment matrix, enabling automatic parameter optimization via stochastic gradient descent (SGD). In this work, we develop a soft Mapper framework based on a Gaussian mixture model for flexible and implicit interval construction. We further illustrate the robustness of the soft Mapper algorithm by introducing the Mapper graph mode as a point estimation for the output graph. Moreover, a SGD algorithm with a specific topological loss function is proposed for optimizing parameters in the model. Both simulation and application studies demonstrate its effectiveness in capturing the underlying topological structures. In addition, the application to an RNA expression dataset obtained from the Mount Sinai/JJ Peters VA Medical Center Brain Bank successfully identifies a distinct subgroup of Alzheimer's Disease. The implementation of our method is available at https://github.com/FarmerTao/Implicit-interval-Mapper.git.

Pages: 781-796

Citations: 0

Enhancing Data Compression: Recent Innovations in LZ77 Algorithms.

IF 1.6, Zone 4 (Biology)

Journal of Computational Biology Pub Date : 2025-08-01 Epub Date: 2025-05-30 DOI: 10.1089/cmb.2024.0879

Aaron Hong, Christina Boucher

Abstract: The growing volume of genomic data, driven by advances in sequencing technologies, demands efficient data compression solutions. Traditional algorithms, such as Lempel-Ziv77 (LZ77), have been fundamental in offering lossless compression, yet they often fall short when applied to the highly repetitive structures typical of genomic sequences. This review explores the evolution of LZ77 and its adaptations for genomic data compression, highlighting specialized algorithms designed to handle redundancy in large-scale sequencing datasets efficiently. Innovations in this field have enhanced compression ratios and processing efficiencies leveraging intrinsic redundancy within genomic datasets. We critically examine a spectrum of LZ77-based algorithms, including newer adaptations for external and semi-external memory settings, and contrast their efficacy in managing large-scale genomic data. We conducted experiments to evaluate the performance of several algorithms, including KKP2, RLE-LZ, SE-KKP, BGone, and PFP-LZ77, on both real-world datasets from the Pizza&Chili repetitive corpus, Salmonella genomes, and human chromosome 19 genomes. These results underscore the trade-offs between time and memory consumption between algorithms. This article aims to provide a comprehensive guide on the current landscape and future directions of data compression technologies, equipping bioinformaticians and other practitioners with insight to tackle the escalating data challenges in genomics and beyond.

Pages: 761-780

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12409268/pdf/
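For readers unfamiliar with the factorization that the algorithms named in the abstract compute, the following is a minimal sketch of a naive, quadratic-time LZ77 factorization. It is an illustration only and is not the method of KKP2, RLE-LZ, SE-KKP, BGone, or PFP-LZ77; the function names `lz77_factorize` and `lz77_decode` and the phrase encoding (a literal character, or a `(source, length)` pair) are assumptions made for this example.

```python
def lz77_factorize(text):
    """Naive LZ77 factorization (illustrative, not an optimized algorithm).

    Each phrase is either a single literal character (first occurrence)
    or a (source, length) pair pointing at the longest earlier match.
    Matches may overlap the position being encoded.
    """
    phrases = []
    i, n = 0, len(text)
    while i < n:
        best_len, best_src = 0, -1
        # Try every earlier starting position and keep the longest match.
        for src in range(i):
            length = 0
            while i + length < n and text[src + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_len, best_src = length, src
        if best_len == 0:
            phrases.append(text[i])  # character not seen before
            i += 1
        else:
            phrases.append((best_src, best_len))
            i += best_len
    return phrases


def lz77_decode(phrases):
    """Rebuild the text from the phrases produced by lz77_factorize."""
    out = []
    for phrase in phrases:
        if isinstance(phrase, str):
            out.append(phrase)
        else:
            src, length = phrase
            # Copy one character at a time so overlapping copies work.
            for k in range(length):
                out.append(out[src + k])
    return "".join(out)


print(lz77_factorize("abababab"))  # ['a', 'b', (0, 6)]
```

A highly repetitive input collapses into very few phrases, which is the property that makes this family of methods attractive for collections of similar genomes.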

Citations: 0

The Asymptotic Distribution of the k-Robinson-Foulds Dissimilarity Measure on Labeled Trees.

IF 1.4, Zone 4 (Biology)

Journal of Computational Biology Pub Date : 2025-07-02 DOI: 10.1089/cmb.2025.0093

Michael Fuchs, Mike Steel

Abstract: Motivated by applications in medical bioinformatics, Khayatian et al. (2024) introduced a family of metrics on Cayley trees [the k-Robinson-Foulds (RF) distance, for $k = 0, \ldots, n-2$] and explored their distribution on pairs of random Cayley trees via simulations. In this article, we investigate this distribution mathematically and derive exact asymptotic descriptions of the distribution of the k-RF metric for the extreme values $k = 0$ and $k = n-2$, as n becomes large. We show that a linear transform of the 0-RF metric converges to a Poisson distribution (with mean 2), whereas a similar transform for the ($n-2$)-RF metric leads to a normal distribution (with mean $\sim n e^{-2}$). These results (together with the case $k = 1$ which behaves quite differently and $k = n-3$) shed light on the earlier simulation results and the predictions made concerning them.

Citations: 0

DrIVeNN: Drug Interaction Vectors Neural Network.

IF 1.4, Zone 4 (Biology)

Journal of Computational Biology Pub Date : 2025-07-01 Epub Date: 2025-05-12 DOI: 10.1089/cmb.2025.0079

Natalie Wang, Casey Overby Taylor

Abstract: Polypharmacy, the concurrent use of multiple drugs to treat a single condition, is common in patients managing multiple or complex conditions. However, as more drugs are added to the treatment plan, the risk of adverse drug events (ADEs) rises rapidly. Because it is impractical to test every possible drug combination during clinical trials, many serious polypharmacy ADEs (also known as drug-drug interactions or DDIs) only become known after the drugs are in use. This issue is prevalent among older adults with cardiovascular disease (CVD), where polypharmacy and ADEs are common. In this research, our primary objective was to identify key drug features and build and evaluate a model to predict DDIs. Our secondary objective was to assess our model on a domain-specific case study. We developed a two-layer neural network that incorporated drug features such as molecular structure, drug-protein interactions, and mono-drug side effects (drug interaction vectors neural network [DrIVeNN]) using publicly available side effect databases. It performed moderately better than state-of-the-art models such as DGNN-DDI, KGDDI, and NNPS. DrIVeNN had average area under the Receiver Operating Characteristic curve (AUROC) and area under the precision-recall curve (AUPRC) scores of 0.934 and 0.920, respectively, compared to the best-performing baseline model, DGNN-DDI, which had scores of 0.919 and 0.904. We also conducted a domain-specific case study centered on CVD treatment, and there was a significant increase in performance from the general model. We observed an average AUROC for CVD DDI prediction of 0.979. This research contributes to the advancement of predictive modeling techniques for polypharmacy ADEs and indicates the strong potential of domain-specific models.

Pages: 696-706

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12259410/pdf/
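As background for the AUROC figures quoted in the abstract, here is a minimal sketch of how AUROC can be computed from binary labels and predicted scores, using its pairwise-ranking interpretation. It is not the authors' evaluation code; the function name `auroc` and the toy labels and scores are assumptions made for this example.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative.

    labels: iterable of 0/1 ground-truth values (1 = interaction present).
    scores: iterable of predicted scores, higher meaning more likely positive.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): three of the four positive/negative pairs are ordered correctly.
print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

Under this reading, a score of 0.5 corresponds to random ranking and 1.0 to a perfect separation of positives from negatives.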

Citations: 0

Comparative Performance Evaluation of Large Language Models for Extracting Molecular Interactions and Pathway Knowledge.

IF 1.4, Zone 4 (Biology)

Journal of Computational Biology Pub Date : 2025-07-01 Epub Date: 2025-05-19 DOI: 10.1089/cmb.2025.0078

Gilchan Park, Byung-Jun Yoon, Xihaier Luo, Vanessa López-Marrero, Shinjae Yoo, Shantenu Jha

Abstract: Understanding the interactions and regulatory relationships among biomolecules is essential for deciphering complex biological systems and elucidating the mechanisms behind diverse biological functions. Traditionally, the collection of such molecular interaction data has relied on expert curation, a process that is both time-consuming and labor-intensive. To address these limitations, this study explores the use of large language models (LLMs) to automate the genome-scale extraction of molecular interaction knowledge. We evaluate the performance of various LLMs on key biological tasks, including the identification of protein-protein interactions, detection of genes associated with pathways influenced by low-dose radiation, and inference of gene regulatory relationships. Our findings demonstrate that larger LLMs tend to perform better, particularly in extracting intricate gene and protein interactions. Despite their strengths, these models face challenges in recognizing functionally diverse gene groups and highly correlated regulatory relationships. Through a comprehensive analysis using established molecular interaction and pathway databases, we show that LLMs possess the potential to identify relevant biomolecules and predict their interactions, offering valuable insights and marking a significant step toward AI-driven biological knowledge discovery.

Pages: 675-695

Citations: 0

A Novel Graphlet-Based Community Detection Algorithm.

IF 1.4, Zone 4 (Biology)

Journal of Computational Biology Pub Date : 2025-07-01 Epub Date: 2025-07-02 DOI: 10.1089/cmb.2025.0095

Pablo M Redondo, Reza Mousapour, Wayne B Hayes

Abstract: Community detection is a long-standing problem with applications from social networks to biology. Given its popularity and that it is NP-complete, heuristics abound, though no gold standard exists; there is even disagreement on the technical definition of what constitutes a community of nodes. We define a community as any set of nodes in which the edge density is uniformly higher by a substantial margin than the network's overall edge density. In this article, we introduce an entirely novel algorithm with no relation to any existing algorithm: given an edge density threshold $\varepsilon$, we build a community that has edge density above $\varepsilon$ by building them from sampled graphlets (small induced subgraphs) that themselves have density above $\varepsilon$. By conglomerating and merging these graphlets, the community has an edge density that is uniformly above the threshold. We show that this algorithm almost universally outperforms existing algorithms chosen widely across the literature (biological and non-biological) in the problem of overlapping community detection, in that it finds larger and more dense communities in virtually every test case. Finally, we run our code through the 2016 DREAM challenge for community detection in biological networks, and show that it finds substantially more dense communities than the DREAM competition winners.

Pages: 707-720
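The edge-density criterion in the abstract can be made concrete with a small sketch. The code below only computes the density of the subgraph induced by a node set and compares it with a threshold; it does not implement the graphlet sampling or merging described by the authors, and it checks overall density rather than the "uniform" condition, which the abstract does not define. The names `edge_density` and `exceeds_threshold`, and the edge-list input format, are assumptions made for this example.

```python
def edge_density(nodes, edges):
    """Density of the subgraph induced by `nodes`.

    edges: iterable of undirected (u, v) pairs, each edge listed once.
    Returns the fraction of the k*(k-1)/2 possible node pairs that are connected.
    """
    members = set(nodes)
    k = len(members)
    if k < 2:
        return 0.0
    # Count edges with both endpoints inside the node set, ignoring self-loops.
    internal = sum(1 for u, v in edges if u != v and u in members and v in members)
    return internal / (k * (k - 1) / 2)


def exceeds_threshold(nodes, edges, eps):
    """True if the induced subgraph's density is above the threshold eps."""
    return edge_density(nodes, edges) > eps


# Toy graph (hypothetical): a triangle 1-2-3 with a pendant node 4 attached to 3.
toy_edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
print(edge_density({1, 2, 3}, toy_edges))            # 1.0
print(exceeds_threshold({1, 2, 3, 4}, toy_edges, 0.8))  # False
```

In the toy graph, adding the pendant node drops the density from 1.0 to 4/6, which illustrates why a density threshold limits how far a candidate community can grow.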

Citations: 0