Journal of Computational Biology: Latest Articles (Page 9)

Lossless Approximate Pattern Matching: Automated Design of Efficient Search Schemes.

IF 1.4 | Zone 4 (Biology)

Journal of Computational Biology. Pub Date: 2024-10-01; Epub Date: 2024-09-30; DOI: 10.1089/cmb.2024.0664

Luca Renders, Lore Depuydt, Sven Rahmann, Jan Fostier

Abstract: This study introduces a pioneering approach to automate the creation of search schemes for lossless approximate pattern matching. Search schemes are combinatorial structures that define a series of searches over a partitioned pattern. Each search specifies the processing order of these parts and the cumulative lower and upper bounds on the number of errors in each part of the pattern. Together, these searches ensure the identification of all approximate occurrences of a search pattern within a predefined limit of k errors. While existing literature offers designed schemes for up to k = 4 errors, designing search schemes for larger k values incurs escalating computational costs. Our method integrates a greedy algorithm and a novel Integer Linear Programming (ILP) formulation to design efficient search schemes for up to k = 7 errors. Comparative analyses demonstrate the superiority of our ILP-optimal schemes over alternative strategies in both theoretical and practical contexts. Additionally, we propose a dynamic scheme selection technique tailored to specific search patterns, further enhancing efficiency. Combined, this yields runtime reductions of up to 53% for higher k values. To facilitate search scheme generation, we present Hato, an open-source software tool (AGPL-3.0 license) employing the greedy algorithm and utilizing CPLEX for ILP solving. Furthermore, we introduce Columba 1.2, an open-source lossless read-mapper (AGPL-3.0 license) implemented in C++. Columba surpasses existing state-of-the-art tools by identifying all approximate occurrences of 100,000 Illumina reads (150 bp) in the human reference genome within 24 seconds (maximum edit distance of 4) and 75 seconds (maximum edit distance of 6) using a single CPU core. Notably, our study showcases Columba's capability to align 100,000 reads of length 50, with high error rates and up to an edit distance of 7, in a mere 2 hours and 15 minutes. This achievement is unmatched by other lossless aligners, which require over 3 hours for edit distance 5 alignments. Moreover, Columba exhibits a mapping rate four times higher than that of a lossy tool for this dataset.

Pages: 975-989
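The coverage condition described in this abstract can be made concrete with a small brute-force check. This is an illustrative sketch under stated assumptions, not code from Hato or Columba: each search is modeled as a hypothetical (order, lower, upper) triple of part indices and cumulative error bounds, and a scheme is treated as lossless for k when every distribution of at most k errors over the parts satisfies the bounds of at least one search. The two-search scheme for k = 1 over two parts is chosen purely for illustration.

```python
from itertools import product

# Hypothetical representation of one search: the processing order of the parts,
# plus cumulative lower/upper error bounds after each processed part.
Search = tuple[list[int], list[int], list[int]]


def covers(search: Search, errors: tuple[int, ...]) -> bool:
    """True if this per-part error distribution stays within the search's bounds."""
    order, lower, upper = search
    cumulative = 0
    for step, part in enumerate(order):
        cumulative += errors[part]
        if not lower[step] <= cumulative <= upper[step]:
            return False
    return True


def is_lossless(scheme: list[Search], num_parts: int, k: int) -> bool:
    """Brute force: every distribution of at most k errors must be covered by some search."""
    for errors in product(range(k + 1), repeat=num_parts):
        if sum(errors) <= k and not any(covers(s, errors) for s in scheme):
            return False
    return True


# Illustrative scheme for k = 1 with two parts.
scheme_k1: list[Search] = [
    ([0, 1], [0, 0], [0, 1]),  # part 0 exact, then at most 1 error in total
    ([1, 0], [0, 1], [0, 1]),  # part 1 exact, then exactly 1 error in total
]

print(is_lossless(scheme_k1, num_parts=2, k=1))      # True
print(is_lossless(scheme_k1[:1], num_parts=2, k=1))  # False: errors (1, 0) are missed
```

This check only verifies a given scheme; its enumeration grows as (k + 1)^p in the number of parts p, and it does not design schemes or estimate their search cost.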

Citations: 0

Approximate and Exact Optimization Algorithms for the Beltway and Turnpike Problems with Duplicated, Missing, Partially Labeled, and Uncertain Measurements.

IF 1.4 | Zone 4 (Biology)

Journal of Computational Biology. Pub Date: 2024-10-01; Epub Date: 2024-10-10; DOI: 10.1089/cmb.2024.0661

C S Elder, Minh Hoang, Mohsen Ferdosi, Carl Kingsford

Abstract: The Turnpike problem aims to reconstruct a set of one-dimensional points from their unordered pairwise distances. Turnpike arises in biological applications such as molecular structure determination, genomic sequencing, tandem mass spectrometry, and molecular error-correcting codes. Under noisy observation of the distances, the Turnpike problem is NP-hard and can take exponential time and space to solve when using traditional algorithms. To address this, we reframe the noisy Turnpike problem through the lens of optimization, seeking to simultaneously find the unknown point set and a permutation that maximizes similarity to the input distances. Our core contribution is a suite of algorithms that robustly solve this new objective. This includes a bilevel optimization framework that can efficiently solve Turnpike instances with up to 100,000 points. We show that this framework can be extended to scenarios with domain-specific constraints that include duplicated, missing, and partially labeled distances. Using these, we also extend our algorithms to work for points distributed on a circle (the Beltway problem). For small-scale applications that require global optimality, we formulate an integer linear program (ILP) that (i) accepts an objective from a generic family of convex functions and (ii) uses an extended formulation to reduce the number of binary variables. On synthetic and real partial digest data, our bilevel algorithms achieved state-of-the-art scalability across challenging scenarios with performance that matches or exceeds competing baselines. On small-scale instances, our ILP efficiently recovered ground-truth assignments and produced reconstructions that match or exceed our alternating algorithms. Our implementations are available at https://github.com/Kingsford-Group/turnpikesolvermm.

Pages: 908-926
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11698667/pdf/
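For orientation, the "traditional algorithms" the abstract contrasts against can be illustrated by the classical exact backtracking approach to noise-free Turnpike. This is a minimal sketch, not the authors' bilevel or ILP method: it assumes exact integer distances with no duplicates missing or added, so it breaks down under exactly the noise the paper targets. Solutions are only defined up to reflection (and homometric sets), so it returns one consistent point set anchored at 0.

```python
from collections import Counter
from itertools import combinations
from math import isqrt


def pairwise_distances(points):
    """Multiset of unordered pairwise distances."""
    return Counter(abs(a - b) for a, b in combinations(points, 2))


def turnpike(distances):
    """Exact backtracking for noise-free Turnpike; returns sorted points or None."""
    remaining = Counter(distances)
    total = sum(remaining.values())
    n = (1 + isqrt(1 + 8 * total)) // 2  # solve n(n - 1) / 2 = total
    width = max(remaining)
    remaining[width] -= 1
    placed = [0, width]

    def solve():
        if len(placed) == n:
            return True
        y = max(d for d, c in remaining.items() if c > 0)
        # The largest remaining distance must join a new point to 0 or to width.
        for p in (y, width - y):
            needed = Counter(abs(p - q) for q in placed)
            if all(remaining[d] >= c for d, c in needed.items()):
                remaining.subtract(needed)
                placed.append(p)
                if solve():
                    return True
                placed.pop()              # undo and try the mirrored position
                remaining.update(needed)
        return False

    return sorted(placed) if solve() else None


observed = pairwise_distances([0, 2, 4, 7, 10])
print(turnpike(observed))  # [0, 3, 6, 8, 10], the reflection of the original set
```

Each wrong guess triggers backtracking, which is where the exponential worst case comes from; a single perturbed distance makes the membership test fail outright, motivating the optimization view in the abstract.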

Citations: 0

Protocol for Designing De Novo Noncanonical Peptide Binders in OSPREY.

IF 1.4 | Zone 4 (Biology)

Journal of Computational Biology. Pub Date: 2024-10-01; Epub Date: 2024-10-04; DOI: 10.1089/cmb.2024.0669

Henry Childs, Nathan Guerin, Pei Zhou, Bruce R Donald

Citations: 0

Where the Patterns Are: Repetition-Aware Compression for Colored de Bruijn Graphs.

IF 1.4 | Zone 4 (Biology)

Journal of Computational Biology. Pub Date: 2024-10-01; Epub Date: 2024-10-09; DOI: 10.1089/cmb.2024.0714

Alessio Campanelli, Giulio Ermanno Pibiri, Jason Fan, Rob Patro

Abstract: We describe lossless compressed data structures for the colored de Bruijn graph (or c-dBG). Given a collection of reference sequences, a c-dBG can be essentially regarded as a map from k-mers to their color sets. The color set of a k-mer is the set of all identifiers, or colors, of the references that contain the k-mer. While these maps find countless applications in computational biology (e.g., basic queries, read mapping, and abundance estimation), their memory usage represents a serious challenge for large-scale sequence indexing. Our solutions leverage the intrinsic repetitiveness of the color sets when indexing large collections of related genomes. Hence, the described algorithms factorize the color sets into patterns that repeat across the entire collection and represent these patterns once, instead of redundantly replicating their representation as would happen if the sets were encoded as atomic lists of integers. Experimental results across a range of datasets and query workloads show that these representations substantially improve over the space effectiveness of the best previous solutions (sometimes dramatically, yielding indexes an order of magnitude smaller). Despite the space reduction, these indexes only moderately affect query efficiency compared with the fastest indexes.

Pages: 1022-1044
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11631793/pdf/
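The factorization idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' data structure: it assumes a given partition of the colors into groups, splits each color set into its per-group partial sets, stores each distinct partial set once per group, and encodes every color set as a list of (group, pattern id) references. The function names are hypothetical.

```python
def factorize(color_sets, groups):
    """Encode color sets as references to per-group partial sets stored once.

    color_sets: list of sets of reference ids (colors).
    groups: assumed partition of all colors into disjoint sets.
    """
    patterns = [{} for _ in groups]  # per group: frozenset -> pattern id
    encoded = []
    for colors in color_sets:
        refs = []
        for g, group in enumerate(groups):
            part = frozenset(colors & group)
            if part:
                # setdefault assigns the next id only the first time a pattern is seen
                pid = patterns[g].setdefault(part, len(patterns[g]))
                refs.append((g, pid))
        encoded.append(refs)
    return patterns, encoded


def decode(refs, patterns):
    """Rebuild a color set from its (group, pattern id) references."""
    by_id = [{pid: part for part, pid in table.items()} for table in patterns]
    return set().union(*(by_id[g][pid] for g, pid in refs))


groups = [{0, 1, 2}, {3, 4, 5}]
color_sets = [{0, 1, 3}, {0, 1, 4, 5}, {0, 1, 3}, {2, 4, 5}]
patterns, encoded = factorize(color_sets, groups)

stored = sum(len(p) for table in patterns for p in table)  # integers in stored patterns
naive = sum(len(c) for c in color_sets)                    # integers in plain lists
print(stored, naive)  # 6 13
```

The toy count ignores the cost of the references themselves; a real index would also compress those, and the saving grows with how often the same partial sets recur across related genomes.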

Citations: 0

Fast Context-Aware Analysis of Genome Annotation Colocalization.

IF 1.4 | Zone 4 (Biology)

Journal of Computational Biology. Pub Date: 2024-10-01; Epub Date: 2024-10-09; DOI: 10.1089/cmb.2024.0667

Askar Gafurov, Tomáš Vinař, Paul Medvedev, Broňa Brejová

Abstract: An annotation is a set of genomic intervals sharing a particular function or property. Examples include genes or their exons, sequence repeats, regions with a particular epigenetic state, and copy number variants. A common task is to compare two annotations to determine if one is enriched or depleted in the regions covered by the other. We study the problem of assigning statistical significance to such a comparison based on a null model representing random unrelated annotations. To incorporate more background information into such analyses, we propose a new null model based on a Markov chain that differentiates among several genomic contexts. These contexts can capture various confounding factors, such as GC content or assembly gaps. We then develop a new algorithm for estimating p-values by computing the exact expectation and variance of the test statistic and then estimating the p-value using a normal approximation. Compared to the previous algorithm by Gafurov et al., the new algorithm provides three advances: (1) the running time is improved from quadratic to linear or quasi-linear, (2) the algorithm can handle two different test statistics, and (3) the algorithm can handle both simple and context-dependent Markov chain null models. We demonstrate the efficiency and accuracy of our algorithm on synthetic and real data sets, including the recent human telomere-to-telomere assembly. In particular, our algorithm computed p-values for 450 pairs of human genome annotations using 24 threads in under three hours. Moreover, the use of genomic contexts to correct for GC bias resulted in the reversal of some previously published findings.

Pages: 946-964
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11698669/pdf/
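The "exact expectation and variance, then a normal approximation" recipe can be illustrated on the simplest case. This is a hypothetical simplification, not the authors' algorithm: it assumes a context-free two-state Markov chain null started from its stationary distribution, with switching probabilities a (outside to inside) and b (inside to outside), and uses as statistic the number of reference positions covered by the random annotation. For such a chain Cov(X_i, X_j) = pi0 * pi1 * lam^|i - j| with lam = 1 - a - b, so the variance can be accumulated in one linear pass.

```python
import math


def overlap_test(reference, observed, a, b):
    """Normal-approximation p-value for overlap with a fixed 0/1 reference track.

    Assumed null: stationary two-state Markov chain over positions with
    P(out -> in) = a and P(in -> out) = b. Statistic: S = sum_i r_i * X_i.
    """
    pi1 = a / (a + b)   # stationary probability of being inside the annotation
    pi0 = 1.0 - pi1
    lam = 1.0 - a - b   # second eigenvalue of the 2x2 transition matrix
    mean = pi1 * sum(reference)

    # Var(S) = pi0*pi1 * sum_{i,j} r_i r_j lam^|i-j|; the off-diagonal part uses
    # acc_j = sum_{i<j} r_i lam^(j-i), updated as acc_j = lam * (acc_{j-1} + r_{j-1}).
    diag = sum(r * r for r in reference)
    cross = 0.0
    acc = 0.0
    for j in range(1, len(reference)):
        acc = lam * (acc + reference[j - 1])
        cross += reference[j] * acc
    var = pi0 * pi1 * (diag + 2.0 * cross)

    z = (observed - mean) / math.sqrt(var)
    p_enrichment = 0.5 * math.erfc(z / math.sqrt(2.0))  # one-sided upper tail
    return mean, var, p_enrichment


reference = [1, 1, 0, 1, 0, 0, 1, 1]
mean, var, p = overlap_test(reference, observed=4, a=0.2, b=0.3)
```

Context-dependent chains, the second test statistic, and the authors' derivations are not reproduced here; the sketch only shows why exact moments make a closed-form p-value estimate possible.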

Citations: 0

Imputing Metagenomic Hi-C Contacts Facilitates the Integrative Contig Binning Through Constrained Random Walk with Restart.

IF 1.4 | Zone 4 (Biology)

Journal of Computational Biology. Pub Date: 2024-10-01; Epub Date: 2024-09-09; DOI: 10.1089/cmb.2024.0663

Yuxuan Du, Wenxuan Zuo, Fengzhu Sun

Abstract: Metagenomic Hi-C (metaHi-C) has shown remarkable potential for retrieving high-quality metagenome-assembled genomes from complex microbial communities. Nevertheless, existing metaHi-C-based contig binning methods solely rely on Hi-C interactions between contigs, disregarding crucial biological information such as the presence of single-copy marker genes. To overcome this limitation, we introduce ImputeCC, an integrative contig binning tool optimized for metaHi-C datasets. ImputeCC integrates both Hi-C interactions and the discriminative power of single-copy marker genes to group marker-gene-containing contigs into preliminary bins. It also introduces a novel constrained random walk with restart algorithm to enhance Hi-C connectivity among contigs. Comprehensive assessments using both mock and real metaHi-C datasets from diverse environments demonstrate that ImputeCC consistently outperforms other Hi-C-based contig binning tools. A genus-level analysis of the sheep gut microbiota reconstructed by ImputeCC underlines its capability to recover key species from dominant genera and identify previously unknown genera.

Pages: 1008-1021
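Random walk with restart (RWR) is the mechanism that lets Hi-C signal reach contig pairs with no direct contacts. The sketch below is the standard, unconstrained RWR computed by power iteration on a toy contact matrix; the constraints ImputeCC adds and its normalization choices are not modeled, and the function name and parameters are hypothetical.

```python
def random_walk_with_restart(contacts, seed, restart=0.5, tol=1e-12, max_iter=10_000):
    """Stationary visiting probabilities of a walk that restarts at `seed`.

    contacts: symmetric matrix of nonnegative Hi-C contact counts (list of lists).
    Each step follows an edge with probability proportional to its contact count;
    with probability `restart` the walker jumps back to `seed`.
    """
    n = len(contacts)
    row_sums = [sum(row) for row in contacts]
    prob = [1.0 if i == seed else 0.0 for i in range(n)]
    for _ in range(max_iter):
        nxt = [restart if j == seed else 0.0 for j in range(n)]
        for i in range(n):
            if prob[i] == 0.0 or row_sums[i] == 0:
                continue  # contigs without contacts cannot pass probability on
            share = (1.0 - restart) * prob[i] / row_sums[i]
            for j in range(n):
                nxt[j] += share * contacts[i][j]
        if max(abs(x - y) for x, y in zip(nxt, prob)) < tol:
            return nxt
        prob = nxt
    return prob


# Contigs 0 and 2 share no direct contact, but both contact contig 1.
contacts = [[0, 3, 0], [3, 0, 1], [0, 1, 0]]
scores = random_walk_with_restart(contacts, seed=0)
print([round(x, 4) for x in scores])  # [0.625, 0.3333, 0.0417]
```

The nonzero score for contig 2 is the "imputed" contact with contig 0; running the walk from every seed gives a full matrix of such scores that a binning step could then use.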

Citations: 0

Bifurcations and Homoclinic Orbits of a Model Consisting of Vegetation-Prey-Predator Populations.

IF 1.7 | Zone 4 (Biology)

Journal of Computational Biology. Pub Date: 2024-09-12; DOI: 10.1089/cmb.2024.0485

Maryam Jafari Khanghahi, Reza Khoshsiar Ghaziani

Abstract: This study provides a comprehensive analysis of the dynamics of a three-level vertical food chain model, specifically focusing on the interactions between vegetation, herbivores, and predators in a Snowshoe hare-Canadian lynx system. By simplifying the model through dimensional analysis, we determine conditions for equilibrium existence and identify various types of bifurcations, including Saddle-Node and Hopf bifurcations. Additionally, the study explores codimension-two bifurcations such as Bogdanov-Takens (BT) and zero-Hopf bifurcations. Coefficient formulas of normal forms are derived through the use of center manifold reduction and normal form theory. The study also presents an approximation of homoclinic orbits near a BT bifurcation of the system by computing explicit asymptotics based on regular perturbation methods. Utilizing the MATLAB package MATCONT, a family of limit cycles and their associated bifurcations are computed, including limit point cycles, period-doubling bifurcations, cusp points of cycles, fold-flip bifurcations, and various resonance bifurcations (R1, R2, R3, and R4). The biological implications of the findings are discussed in detail, highlighting how the identified bifurcations and dynamics can impact the population dynamics of vegetation, herbivores, and predators in real-world ecosystems. Numerical experiments validate the theoretical results and provide further support for the conclusions.
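For reference, the generic Bogdanov-Takens normal form to which such center-manifold computations reduce is the standard textbook one (as in Kuznetsov's Elements of Applied Bifurcation Theory); the model-specific coefficients and the map from the model's parameters to (β1, β2) are derived in the paper and are not reproduced here.

$$
\dot{\eta}_0 = \eta_1, \qquad
\dot{\eta}_1 = \beta_1 + \beta_2\,\eta_0 + \eta_0^2 + s\,\eta_0\,\eta_1, \qquad s = \pm 1 .
$$

Equilibria satisfy $\eta_1 = 0$ and $\eta_0^2 + \beta_2\eta_0 + \beta_1 = 0$, so the saddle-node curve is $\beta_1 = \beta_2^2/4$; the trace $s\,\eta_0$ of the Jacobian vanishes at $\eta_0 = 0$, giving the Hopf curve $\beta_1 = 0$ with $\beta_2 < 0$. A curve of homoclinic orbits also emanates from the origin of the $(\beta_1, \beta_2)$ plane, and it is orbits of this kind that the abstract approximates asymptotically.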

Citations: 0

A Cancer Subpopulation Competition Model Reveals Optimal Levels of Immune Response that Minimize Tumor Size.

IF 1.7 | Zone 4 (Biology)

Journal of Computational Biology. Pub Date: 2024-09-10; DOI: 10.1089/cmb.2024.0618

Wimonnat Sukpol, Teeraphan Laomettachit, Anuwat Tangthanawatsakul

Abstract: Breast cancer is a complex disease with significant phenotypic heterogeneity of cells, even within a single breast tumor. Emerging evidence underscores the significance of intratumoral competition, which can serve as a key contributor to cancer drug resistance, imparting substantial clinical implications. Understanding the competitive dynamics is paramount as it can significantly influence disease progression and treatment outcomes. In the present work, a mathematical model was developed using a system of differential equations to describe the dynamic interactions between two cancer subtypes (each further classified into cancer stem cells and tumor cells) and innate immune cells. The purpose of the model is to comprehensively understand the competitive interactions between the heterogeneous subpopulations. The equilibrium points and stability analysis for each equilibrium point were established. Model simulations showed that the competition between two cancer subtypes directly affects the number of both species. When competition between two cancer subtypes is strong, increasing the immune response rate specific to the more competitive species effectively reduces the tumor size. However, if the competition is relatively weak, an optimal immune response rate is required to minimize the total number of tumor cells. Rates below the optimal level fail to reduce the population of the stronger species, whereas rates above the optimal level can lead to the recurrence of the weaker species. Overall, this model provides insights into breast cancer dynamics and guides the development of effective treatment strategies.
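The abstract does not state the model equations, so the following is a deliberately reduced, hypothetical stand-in rather than the authors' system: two cancer subtypes with logistic growth, Lotka-Volterra competition, and constant immune kill rates d1 and d2, with no stem-cell compartments and no immune-cell dynamics. All parameter values are illustrative assumptions. Even this toy version shows the competitive-release effect the abstract describes: raising the kill rate on one subtype lets the other expand.

```python
def simulate(d1, d2, r=(1.0, 0.8), K=(100.0, 100.0), alpha=(0.5, 0.4),
             x0=(10.0, 10.0), dt=0.1, steps=5000):
    """RK4 integration of a hypothetical two-subtype competition model.

    dx/dt = r1 x (1 - (x + a12 y) / K1) - d1 x
    dy/dt = r2 y (1 - (y + a21 x) / K2) - d2 y
    d1, d2 are constant immune kill rates (an assumption of this sketch).
    """
    (r1, r2), (K1, K2), (a12, a21) = r, K, alpha

    def f(x, y):
        return (r1 * x * (1 - (x + a12 * y) / K1) - d1 * x,
                r2 * y * (1 - (y + a21 * x) / K2) - d2 * y)

    x, y = x0
    for _ in range(steps):
        k1 = f(x, y)
        k2 = f(x + dt / 2 * k1[0], y + dt / 2 * k1[1])
        k3 = f(x + dt / 2 * k2[0], y + dt / 2 * k2[1])
        k4 = f(x + dt * k3[0], y + dt * k3[1])
        x += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return x, y


for d1 in (0.1, 0.3):
    x, y = simulate(d1=d1, d2=0.1)
    print(f"d1={d1}: subtype 1 = {x:.2f}, subtype 2 = {y:.2f}, total = {x + y:.2f}")
```

With weak competition (a12 * a21 = 0.2 < 1) the run settles on the coexistence equilibrium, which here is (57.8125, 64.375) for d1 = 0.1 and (32.8125, 74.375) for d1 = 0.3: subtype 1 shrinks while subtype 2 grows. In this reduced model the total still falls, so the interior optimum reported in the paper depends on structure the sketch omits.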

Citations: 0

Rosalind Franklin Society Proudly Announces the 2023 Award Recipient for Journal of Computational Biology.

IF 1.7 | Zone 4 (Biology)

Journal of Computational Biology. Pub Date: 2024-09-01; DOI: 10.1089/cmb.2024.15655.rfs2023

Teresa M Przytycka

Citations: 0

Protein-Protein Interaction Prediction Model Based on ProtBert-BiGRU-Attention.

IF 1.4 | Zone 4 (Biology)

Journal of Computational Biology. Pub Date: 2024-09-01; Epub Date: 2024-07-29; DOI: 10.1089/cmb.2023.0297

Qian Gao, Chi Zhang, Ming Li, Tianfei Yu

Abstract: The physiological activities within cells are mainly regulated through protein-protein interactions (PPI). Therefore, studying protein interactions has become an essential part of researching protein function and mechanisms. The biological experiments traditionally required for PPI prediction are expensive and time-consuming. For this reason, many methods that predict PPI from protein sequences have been proposed in recent years. However, existing computational methods usually need to combine sequence data with evolutionary feature information of the proteins to predict PPI docking. Because different methods select different protein features, their PPI predictions can differ. This article proposes a PPI prediction method based on the pretrained protein sequence model ProtBert, combined with a Bidirectional Gated Recurrent Unit (BiGRU) and an attention mechanism. The method uses only protein sequence information: ProtBert's ability to capture amino acid features produces per-residue vectors, and the BiGRU extracts further features from these vectors. The attention mechanism is then applied to weight different amino acid features and improve the expressiveness of the protein sequence features, ultimately yielding a binary classification of whether the proteins interact. Experimental results show that the proposed ProtBert-BiGRU-Attention model has good predictive performance for PPI, and comparative experiments show that it performs well on binary PPI prediction. Furthermore, ablation experiments demonstrate the contribution of each deep learning module to the prediction.

Pages: 797-814
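The attention step can be illustrated independently of ProtBert and the BiGRU. The sketch below assumes a simple dot-product attention pooling with a learned scoring vector w; this form is a hypothetical choice, since the abstract does not specify the attention variant. Each residue's hidden vector receives a score, the scores are softmax-normalized into weights, and the sequence representation is the weighted sum of the hidden vectors.

```python
import math


def attention_pool(hidden, w):
    """Pool per-residue hidden vectors into one sequence vector.

    hidden: list of T vectors (e.g. BiGRU outputs), each of length H.
    w: scoring vector of length H (a learned parameter in a real model).
    Returns (weights, pooled), where the weights sum to 1.
    """
    scores = [sum(h_k * w_k for h_k, w_k in zip(h, w)) for h in hidden]
    top = max(scores)  # subtract the max before exp for numerical stability
    exp_scores = [math.exp(s - top) for s in scores]
    total = sum(exp_scores)
    weights = [e / total for e in exp_scores]
    dim = len(hidden[0])
    pooled = [sum(a * h[k] for a, h in zip(weights, hidden)) for k in range(dim)]
    return weights, pooled


hidden = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]  # toy per-residue vectors
weights, pooled = attention_pool(hidden, w=[2.0, 0.0])
```

In a PPI model, the pooled vectors of the two proteins would typically be combined and passed to a binary classifier; that head, like ProtBert and the BiGRU, is outside this sketch.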

Citations: 0