Journal of Computational Biology: Latest Articles

MMG4: Recognition of G4-Forming Sequences Based on Markov Model.
IF 1.4, Zone 4 (Biology)
Journal of Computational Biology Pub Date : 2024-10-17 DOI: 10.1089/cmb.2024.0523
Boyuan Yu, Hao Zhang, Cong Pian, Yuanyuan Chen
Abstract: G-quadruplexes (G4s) are special nucleic acid structures with various important biological functions. Existing approaches for recognizing G4-forming sequences are limited to time-consuming and costly methods such as circular dichroism and nuclear magnetic resonance, so a fast and accurate recognition model would be of considerable value. In this study, MMG4, a novel model that recognizes G4-forming sequences based on a Markov model (MM), was developed. Recognition accuracy was found to be high in the central region of the sequence and low in the two end regions; differences in base transfer probabilities, ratio distribution, and G4-motif structural content across regions may be the causes of this phenomenon. The study also explored the impact of sequence length on recognition accuracy and found the optimal recognition interval to be [910-1049], with the highest recognition accuracy reaching 85.95%. By extracting sequence features, the study constructed three types of machine learning models: random forest (RF), support vector machine, and back-propagation neural network. The recognition performance of MM was significantly better than that of these three machine learning models, indicating that the MM-based recognition method can effectively capture the correlation information between adjacent nucleotides of G4. Combining MM with the three machine learning models improved the predictive performance of MMG4. Among them, the RF model combined with MM performed best, achieving an area under the receiver operating characteristic curve of 0.93 and an area under the precision-recall curve of 0.9. Finally, the study validated the robustness and generalization ability of the model on an independent test dataset.
Citations: 0
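To make the idea of Markov-model recognition in the MMG4 entry concrete, here is a minimal sketch of a first-order Markov chain classifier scored by log-odds. It is not the authors' implementation: the model order, the add-one pseudocount, the function names, and the toy training sequences are all assumptions made for illustration.

```python
import math

ALPHABET = "ACGT"

def train_markov(seqs, pseudocount=1.0):
    """Estimate first-order transition probabilities P(next | prev), smoothed."""
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for s in seqs:
        for prev, nxt in zip(s, s[1:]):
            if prev in counts and nxt in counts[prev]:
                counts[prev][nxt] += 1
    probs = {}
    for a in ALPHABET:
        total = sum(counts[a].values())
        probs[a] = {b: counts[a][b] / total for b in ALPHABET}
    return probs

def log_odds(seq, pos_model, neg_model):
    """Sum of log-likelihood ratios over adjacent nucleotide pairs."""
    score = 0.0
    for prev, nxt in zip(seq, seq[1:]):
        if prev in pos_model and nxt in pos_model[prev]:
            score += math.log(pos_model[prev][nxt] / neg_model[prev][nxt])
    return score

# Hypothetical toy data: G-rich "positives" and G-free "negatives".
positives = ["GGGTTAGGGTTAGGGTTAGGG", "GGGAGGGTGGGAGGG"]
negatives = ["ATATCATCATATACTTACATA", "CATTACATACTTATCA"]
pos_model = train_markov(positives)
neg_model = train_markov(negatives)

g_rich_score = log_odds("GGGTGGGTGGG", pos_model, neg_model)
at_rich_score = log_odds("ATACATATCAT", pos_model, neg_model)
```

A positive score means the sequence looks more like the positive training set than the negative one; a real model would be trained on labeled G4 data and evaluated on held-out sequences.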
High-Accuracy Positivity-Preserving Finite Difference Approximations of the Chemotaxis Model for Tumor Invasion.
IF 1.4, Zone 4 (Biology)
Journal of Computational Biology Pub Date : 2024-10-07 DOI: 10.1089/cmb.2023.0316
Lin Zhang, Jigen Peng, Yongbin Ge, Haiyang Li, Yuchao Tang
Abstract: Numerical simulation of the complex evolution process of tumor invasion plays an important role in exploring in depth the bio-taxis phenomena of tumor growth and metastasis. Because low-accuracy numerical methods often have large errors and low resolution, very fine grids are needed to obtain high-resolution simulation results, which leads to a high computational cost. In this paper, we develop a class of high-accuracy positivity-preserving finite difference methods to solve the chemotaxis model for tumor invasion. First, two unconditionally stable implicit compact difference schemes for solving the model are proposed. Second, the local truncation errors of the new schemes are analyzed, showing that they have second-order accuracy in time and fourth-order accuracy in space. Third, based on the proposed schemes, the high-accuracy numerical integration idea for binary functions is employed to construct a linear compact weighting formula that guarantees fourth-order accuracy and nonnegativity, and a positivity-preserving time-marching algorithm is then established. Finally, the accuracy, stability, and positivity preservation of the proposed methods are verified by several numerical experiments, and the evolution of tumor invasion over time is numerically simulated and analyzed.
Citations: 0
Correcting for Observation Bias in Cancer Progression Modeling.
IF 1.4, Zone 4 (Biology)
Journal of Computational Biology Pub Date : 2024-10-01 DOI: 10.1089/cmb.2024.0666
Rudolf Schill, Maren Klever, Andreas Lösch, Y Linda Hu, Stefan Vocht, Kevin Rupp, Lars Grasedyck, Rainer Spang, Niko Beerenwinkel
Abstract: Tumor progression is driven by the accumulation of genetic alterations, including both point mutations and copy number changes. Understanding the temporal sequence of these events is crucial for comprehending the disease but is not directly discernible from cross-sectional genomic data. Cancer progression models, including Mutual Hazard Networks (MHNs), aim to reconstruct the dynamics of tumor progression by learning the causal interactions between genetic events based on their co-occurrence patterns in cross-sectional data. Here, we highlight a commonly overlooked bias in cross-sectional datasets that can distort progression modeling. Tumors become clinically detectable when they cause symptoms or are identified through imaging or tests. Detection factors, such as size, inflammation (fever, fatigue), and elevated biochemical markers, are influenced by genomic alterations. Ignoring these effects leads to "conditioning on a collider" bias, where events making the tumor more observable appear anticorrelated, creating false suppressive effects or masking promoting effects among genetic events. We enhance MHNs by incorporating the effects of genetic progression events on the inclusion of a tumor in a dataset, thus correcting for collider bias. We derive an efficient tensor formula for the likelihood function and apply it to two datasets from the MSK-IMPACT study. In colon adenocarcinoma, we observe a significantly higher rate of clinical detection for TP53-positive tumors, while in lung adenocarcinoma, the same is true for EGFR-positive tumors. Compared to classical MHNs, this approach eliminates several spurious suppressive interactions and uncovers multiple promoting effects.
Volume 31, issue 10, pages 927-945.
Citations: 0
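The collider effect described in the entry above can be checked with a small exact calculation. The sketch below is only an illustration of the statistical phenomenon, not of Mutual Hazard Networks: two events A and B are independent in the full population, and the detection probabilities (0.1 baseline, plus 0.4 per event) are made-up numbers chosen for easy arithmetic.

```python
from fractions import Fraction
from itertools import product

def conditional_covariance(p_a, p_b, detect):
    """Covariance of independent binary events A and B, given detection."""
    weights = {}
    for a, b in product((0, 1), repeat=2):
        prior = (p_a if a else 1 - p_a) * (p_b if b else 1 - p_b)
        weights[(a, b)] = prior * detect(a, b)
    total = sum(weights.values())
    p_a1 = sum(w for (a, _), w in weights.items() if a) / total
    p_b1 = sum(w for (_, b), w in weights.items() if b) / total
    p_ab = weights[(1, 1)] / total
    return p_ab - p_a1 * p_b1

def additive_detection(a, b):
    # Hypothetical: each event makes the tumor more likely to be observed.
    return Fraction(1, 10) + Fraction(2, 5) * a + Fraction(2, 5) * b

def uniform_detection(a, b):
    # Detection does not depend on the events, so no selection bias arises.
    return Fraction(1, 2)

half = Fraction(1, 2)
cov_biased = conditional_covariance(half, half, additive_detection)
cov_unbiased = conditional_covariance(half, half, uniform_detection)
```

With these numbers the covariance among detected tumors is negative even though A and B are independent before selection, which is the spurious "suppressive" signal the abstract refers to; with uniform detection the covariance stays at zero.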
RECOMB 2024 Special Issue.
IF 1.4, Zone 4 (Biology)
Journal of Computational Biology Pub Date : 2024-10-01 Epub Date: 2024-09-20 DOI: 10.1089/cmb.2024.0809
Jian Ma, Mona Singh
Abstract: none provided.
Page 907.
Citations: 0
Approximate IsoRank for Scalable and Functionally Meaningful Cross-Species Alignments of Protein Interaction Networks.
IF 1.4, Zone 4 (Biology)
Journal of Computational Biology Pub Date : 2024-10-01 Epub Date: 2024-09-24 DOI: 10.1089/cmb.2024.0673
Kapil Devkota, Anselm Blumer, Xiaozhe Hu, Lenore Cowen
Abstract: The IsoRank algorithm of Singh, Xu, and Berger was a pioneering algorithmic advance that applied spectral methods to the problem of cross-species global alignment of biological networks. We develop a new IsoRank approximation that exploits the mathematical properties of IsoRank's linear system to solve the problem in quadratic time with respect to the maximum size of the two protein-protein interaction (PPI) networks. We further propose a refinement to this initial approximation so that the updated result is even closer to the original IsoRank formulation while remaining computationally inexpensive. In experiments on synthetic and real PPI networks with various proposed metrics to measure alignment quality, we find the results of our approximate IsoRank are nearly as accurate as the original IsoRank. In fact, for functional enrichment-based measures of global network alignment quality, our approximation performs better than the exact IsoRank, which is doubtless because it is more robust to the noise of missing or incorrect edges. It also performs competitively against two more recent global network alignment algorithms. We also present an analogous approximation to IsoRankN, which extends the network alignment to more than two species.
Pages 990-1007.
Citations: 0
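For readers unfamiliar with the linear system mentioned in the IsoRank entry, the sketch below shows the original IsoRank-style fixed-point iteration as it is commonly described, R = alpha * A R + (1 - alpha) * E, where the score of a node pair is fed by the scores of neighboring pairs. It is not the approximation proposed in the paper; the function name, the value of alpha, the iteration count, and the toy path graphs are assumptions.

```python
def isorank(adj1, adj2, prior, alpha=0.6, iterations=50):
    """Power iteration on two undirected graphs given as adjacency lists.

    Assumes every node has at least one neighbor, so total score is conserved.
    """
    n1, n2 = len(adj1), len(adj2)
    total_prior = sum(sum(row) for row in prior)
    e = [[x / total_prior for x in row] for row in prior]  # normalized prior E
    r = [row[:] for row in e]
    for _ in range(iterations):
        new_r = [[0.0] * n2 for _ in range(n1)]
        for i in range(n1):
            for j in range(n2):
                s = 0.0
                # Each neighboring pair (u, v) spreads its score evenly
                # over its deg(u) * deg(v) neighboring pairs.
                for u in adj1[i]:
                    for v in adj2[j]:
                        s += r[u][v] / (len(adj1[u]) * len(adj2[v]))
                new_r[i][j] = alpha * s + (1 - alpha) * e[i][j]
        r = new_r
    return r

# Two identical 3-node path graphs (0 - 1 - 2) with an uninformative prior.
path = [[1], [0, 2], [1]]
uniform_prior = [[1.0] * 3 for _ in range(3)]
scores = isorank(path, path, uniform_prior)
```

On this toy input the center-to-center pair receives the highest score, which matches the intuition that nodes with similar neighborhoods should align. Each sweep here costs time proportional to the product of the edge counts, which hints at why a cheaper approximation is attractive for large PPI networks.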
Lossless Approximate Pattern Matching: Automated Design of Efficient Search Schemes.
IF 1.4, Zone 4 (Biology)
Journal of Computational Biology Pub Date : 2024-10-01 Epub Date: 2024-09-30 DOI: 10.1089/cmb.2024.0664
Luca Renders, Lore Depuydt, Sven Rahmann, Jan Fostier
Abstract: This study introduces a pioneering approach to automate the creation of search schemes for lossless approximate pattern matching. Search schemes are combinatorial structures that define a series of searches over a partitioned pattern. Each search specifies the processing order of these parts and the cumulative lower and upper bounds on the number of errors in each part of the pattern. Together, these searches ensure the identification of all approximate occurrences of a search pattern within a predefined limit of k errors. While existing literature offers designed schemes for up to k = 4 errors, designing search schemes for larger k values incurs escalating computational costs. Our method integrates a greedy algorithm and a novel Integer Linear Programming (ILP) formulation to design efficient search schemes for up to k = 7 errors. Comparative analyses demonstrate the superiority of our ILP-optimal schemes over alternative strategies in both theoretical and practical contexts. Additionally, we propose a dynamic scheme selection technique tailored to specific search patterns, further enhancing efficiency. Combined, this yields runtime reductions of up to 53% for higher k values. To facilitate search scheme generation, we present Hato, an open-source software tool (AGPL-3.0 license) employing the greedy algorithm and utilizing CPLEX for ILP solving. Furthermore, we introduce Columba 1.2, an open-source lossless read-mapper (AGPL-3.0 license) implemented in C++. Columba surpasses existing state-of-the-art tools by identifying all approximate occurrences of 100,000 Illumina reads (150 bp) in the human reference genome within 24 seconds (maximum edit distance of 4) and 75 seconds (maximum edit distance of 6) using a single CPU core. Notably, our study showcases Columba's capability to align 100,000 reads of length 50, with high error rates and up to an edit distance of 7, in 2 hours and 15 minutes. This achievement is unmatched by other lossless aligners, which require over 3 hours for edit distance 5 alignments. Moreover, Columba exhibits a mapping rate four times higher than that of a lossy tool for this dataset.
Pages 975-989.
Citations: 0
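The definition of a search scheme in the entry above (a processing order over pattern parts plus cumulative lower and upper error bounds) can be turned into a small losslessness check. The sketch below works only at the level of error counts per part and does no string matching; the tuple representation, the function names, and the two-part scheme for k = 1 are illustrative assumptions, not output of Hato or code from Columba.

```python
from itertools import product

def covers(search, errors):
    """True if a search (order, lower, upper) admits this error distribution."""
    order, lower, upper = search
    cumulative = 0
    for step, part in enumerate(order):
        cumulative += errors[part]
        # Bounds apply to the running error total after each processed part.
        if not lower[step] <= cumulative <= upper[step]:
            return False
    return True

def is_lossless(scheme, num_parts, k):
    """Check that every distribution of at most k errors is covered."""
    for errors in product(range(k + 1), repeat=num_parts):
        if sum(errors) <= k and not any(covers(s, errors) for s in scheme):
            return False
    return True

# Hypothetical scheme for k = 1 over a pattern split into parts 0 and 1.
scheme_k1 = [
    ((0, 1), (0, 0), (0, 1)),  # part 0 exact, then up to 1 error in part 1
    ((1, 0), (0, 1), (0, 1)),  # part 1 exact, then exactly 1 error in part 0
]

full_ok = is_lossless(scheme_k1, num_parts=2, k=1)
partial_ok = is_lossless(scheme_k1[:1], num_parts=2, k=1)
```

Dropping the second search leaves the distribution "one error in part 0" uncovered, so the reduced scheme is lossy. Brute-force enumeration like this grows quickly with k and the number of parts, which is consistent with the abstract's remark that designing schemes for larger k becomes expensive.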
Robust Optimal Metabolic Factories.
IF 1.4, Zone 4 (Biology)
Journal of Computational Biology Pub Date : 2024-10-01 Epub Date: 2024-09-27 DOI: 10.1089/cmb.2024.0748
Spencer Krieger, John Kececioglu
Abstract: Perhaps the most fundamental model in synthetic and systems biology for inferring pathways in metabolic reaction networks is a metabolic factory: a system of reactions that starts from a set of source compounds and produces a set of target molecules, while conserving or not depleting intermediate metabolites. Finding a shortest factory (one that minimizes a sum of real-valued weights on its reactions to infer the most likely pathway) is NP-complete. The current state-of-the-art for shortest factories solves a mixed-integer linear program with a major drawback: it requires the user to set a critical parameter, where too large a value can make optimal solutions infeasible, while too small a value can yield degenerate solutions due to numerical error. We present the first robust algorithm for optimal factories that is both parameter-free (relieving the user from determining a parameter setting) and degeneracy-free (guaranteeing it finds an optimal nondegenerate solution). We also give for the first time a complete characterization of the graph-theoretic structure of shortest factories, which reveals an important class of degenerate solutions that was overlooked and potentially output by the prior state-of-the-art. We show degeneracy is precisely due to invalid stoichiometries in reactions, and provide an efficient algorithm for identifying all such misannotations in a metabolic network. In addition, we settle the relationship between the two established pathway models of hyperpaths and factories by proving hyperpaths actually comprise a subclass of factories. Comprehensive experiments over all instances from the standard metabolic reaction databases in the literature demonstrate our parameter-free exact algorithm is fast in practice, quickly finding optimal factories in large real-world networks containing thousands of reactions. A preliminary implementation of our robust algorithm for shortest factories in a new tool called Freeia is available free for research use at http://freeia.cs.arizona.edu.
Pages 1045-1086.
Citations: 0
Approximate and Exact Optimization Algorithms for the Beltway and Turnpike Problems with Duplicated, Missing, Partially Labeled, and Uncertain Measurements.
IF 1.4, Zone 4 (Biology)
Journal of Computational Biology Pub Date : 2024-10-01 Epub Date: 2024-10-10 DOI: 10.1089/cmb.2024.0661
C S Elder, Minh Hoang, Mohsen Ferdosi, Carl Kingsford
Abstract: The Turnpike problem aims to reconstruct a set of one-dimensional points from their unordered pairwise distances. Turnpike arises in biological applications such as molecular structure determination, genomic sequencing, tandem mass spectrometry, and molecular error-correcting codes. Under noisy observation of the distances, the Turnpike problem is NP-hard and can take exponential time and space to solve when using traditional algorithms. To address this, we reframe the noisy Turnpike problem through the lens of optimization, seeking to simultaneously find the unknown point set and a permutation that maximizes similarity to the input distances. Our core contribution is a suite of algorithms that robustly solve this new objective. This includes a bilevel optimization framework that can efficiently solve Turnpike instances with up to 100,000 points. We show that this framework can be extended to scenarios with domain-specific constraints that include duplicated, missing, and partially labeled distances. Using these, we also extend our algorithms to work for points distributed on a circle (the Beltway problem). For small-scale applications that require global optimality, we formulate an integer linear program (ILP) that (i) accepts an objective from a generic family of convex functions and (ii) uses an extended formulation to reduce the number of binary variables. On synthetic and real partial digest data, our bilevel algorithms achieved state-of-the-art scalability across challenging scenarios with performance that matches or exceeds competing baselines. On small-scale instances, our ILP efficiently recovered ground-truth assignments and produced reconstructions that match or exceed our alternating algorithms. Our implementations are available at https://github.com/Kingsford-Group/turnpikesolvermm.
Pages 908-926.
Citations: 0
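The input to the Turnpike problem in the entry above is easy to state in code. The sketch below shows only the noiseless forward direction (points to unordered distances) and why any reconstruction is at best unique up to translation and reflection; it is not one of the paper's solvers, and the point set and helper names are made up for the example.

```python
from itertools import combinations

def pairwise_distances(points):
    """Unordered multiset of pairwise distances, returned as a sorted list."""
    return sorted(abs(a - b) for a, b in combinations(points, 2))

def mirror(points):
    """Reflect a point set about the midpoint of its range."""
    top = max(points)
    return sorted(top - p for p in points)

points = [0, 2, 5, 9]
distances = pairwise_distances(points)
mirrored = mirror(points)
mirrored_distances = pairwise_distances(mirrored)
```

Here `points` and `mirrored` are different point sets with the same distance multiset, so a solver can only be asked to recover the points up to such symmetries. With n points there are n(n-1)/2 distances, which is part of what makes large instances demanding.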
Protocol for Designing De Novo Noncanonical Peptide Binders in OSPREY.
IF 1.4, Zone 4 (Biology)
Journal of Computational Biology Pub Date : 2024-10-01 Epub Date: 2024-10-04 DOI: 10.1089/cmb.2024.0669
Henry Childs, Nathan Guerin, Pei Zhou, Bruce R Donald
Abstract: D-peptides, the mirror image of canonical L-peptides, offer numerous biological advantages that make them effective therapeutics. This article details how to use DexDesign, the newest OSPREY-based algorithm, for designing these D-peptides de novo. OSPREY physics-based models precisely mimic energy-equivariant reflection operations, enabling the generation of D-peptide scaffolds from L-peptide templates. Due to the scarcity of D-peptide:L-protein structural data, DexDesign calls a geometric hashing algorithm, Method of Accelerated Search for Tertiary Ensemble Representatives, as a subroutine to produce a synthetic structural dataset. DexDesign enables mixed-chirality designs with a new user interface and also reduces the conformation and sequence search space using three new design techniques: Minimum Flexible Set, Inverse Alanine Scanning, and K*-based Mutational Scanning.
Pages 965-974.
Citations: 0
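The reflection idea in the entry above can be illustrated with plain coordinate geometry. The sketch below does not use OSPREY or DexDesign; it only shows, on four made-up points standing in for a chiral center and its substituents, that mirroring through a plane keeps all interatomic distances while flipping handedness. Under the assumption that an energy term depends only on such distances, it would be unchanged by the reflection.

```python
from itertools import combinations

def reflect(coords):
    """Mirror a structure through the plane x = 0."""
    return [(-x, y, z) for x, y, z in coords]

def squared_distances(coords):
    """Squared distance for every pair of points, in a fixed pair order."""
    return [sum((p - q) ** 2 for p, q in zip(a, b))
            for a, b in combinations(coords, 2)]

def signed_volume(a, b, c, d):
    """Six times the signed volume of tetrahedron abcd; the sign is handedness."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))

# Hypothetical coordinates, not taken from any real peptide structure.
original = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3)]
mirrored = reflect(original)

volume_original = signed_volume(*original)
volume_mirrored = signed_volume(*mirrored)
```

The sign change in the signed volume is the geometric counterpart of switching between L and D forms, while the unchanged distance list is why a reflected template is a plausible starting scaffold.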
Where the Patterns Are: Repetition-Aware Compression for Colored de Bruijn Graphs.
IF 1.4, Zone 4 (Biology)
Journal of Computational Biology Pub Date : 2024-10-01 Epub Date: 2024-10-09 DOI: 10.1089/cmb.2024.0714
Alessio Campanelli, Giulio Ermanno Pibiri, Jason Fan, Rob Patro
Abstract: We describe lossless compressed data structures for the colored de Bruijn graph (or c-dBG). Given a collection of reference sequences, a c-dBG can be essentially regarded as a map from k-mers to their color sets. The color set of a k-mer is the set of all identifiers, or colors, of the references that contain the k-mer. While these maps find countless applications in computational biology (e.g., basic query, read mapping, abundance estimation, etc.), their memory usage represents a serious challenge for large-scale sequence indexing. Our solutions leverage the intrinsic repetitiveness of the color sets when indexing large collections of related genomes. Hence, the described algorithms factorize the color sets into patterns that repeat across the entire collection and represent these patterns once instead of redundantly replicating their representation as would happen if the sets were encoded as atomic lists of integers. Experimental results across a range of datasets and query workloads show that these representations substantially improve over the space effectiveness of the best previous solutions (sometimes, even dramatically, yielding indexes that are smaller by an order of magnitude). Despite the space reduction, these indexes only moderately impact the efficiency of the queries compared to the fastest indexes.
Pages 1022-1044.
Citations: 0
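The map from k-mers to color sets described in the entry above can be sketched in a few lines. The example below builds that map for three made-up reference strings and then stores each distinct color set only once. This is the simplest form of redundancy removal and is an assumption for illustration; the paper's pattern factorization goes further, and a real index would also handle reverse complements and use compressed encodings.

```python
def build_color_map(references, k):
    """Map each k-mer to the sorted tuple of reference ids (colors) containing it."""
    colors = {}
    for ref_id, seq in enumerate(references):
        for i in range(len(seq) - k + 1):
            colors.setdefault(seq[i:i + k], set()).add(ref_id)
    return {kmer: tuple(sorted(ids)) for kmer, ids in colors.items()}

def deduplicate(color_map):
    """Store each distinct color set once; k-mers keep an index into the table."""
    table, index_of, kmer_to_index = [], {}, {}
    for kmer, color_set in sorted(color_map.items()):
        if color_set not in index_of:
            index_of[color_set] = len(table)
            table.append(color_set)
        kmer_to_index[kmer] = index_of[color_set]
    return table, kmer_to_index

# Hypothetical toy references sharing the substring "ACGT".
references = ["ACGTAC", "ACGTTT", "GGACGT"]
color_map = build_color_map(references, k=3)
table, kmer_to_index = deduplicate(color_map)
```

In this toy input eight distinct 3-mers share only four distinct color sets, so the table is half the size of a per-k-mer encoding. Collections of related genomes tend to show far more sharing, which is the repetitiveness the abstract exploits.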