Algorithms for Molecular Biology: Latest Articles

Estimating similarity and distance using FracMinHash.
IF 1.5, Zone 4, Biology
Algorithms for Molecular Biology Pub Date : 2025-05-15 DOI: 10.1186/s13015-025-00276-8
Mahmudur Rahman Hera, David Koslicki
{"title":"Estimating similarity and distance using FracMinHash.","authors":"Mahmudur Rahman Hera, David Koslicki","doi":"10.1186/s13015-025-00276-8","DOIUrl":"10.1186/s13015-025-00276-8","url":null,"abstract":"<p><strong>Motivation: </strong>The increasing number and volume of genomic and metagenomic data necessitates scalable and robust computational models for precise analysis. Sketching techniques utilizing <math><mi>k</mi></math> -mers from a biological sample have proven to be useful for large-scale analyses. In recent years, FracMinHash has emerged as a popular sketching technique and has been used in several useful applications. Recent studies on FracMinHash proved unbiased estimators for the containment and Jaccard indices. However, theoretical investigations for other metrics are still lacking.</p><p><strong>Theoretical contributions: </strong>In this paper, we present a theoretical framework for estimating similarity/distance metrics by using FracMinHash sketches, when the metric is expressible in a certain form. We establish conditions under which such an estimation is sound and recommend a minimum scale factor s for accurate results. Experimental evidence supports our theoretical findings.</p><p><strong>Practical contributions: </strong>We also present frac-kmc, a fast and efficient FracMinHash sketch generator program. frac-kmc is the fastest known FracMinHash sketch generator, delivering accurate and precise results for cosine similarity estimation on real data. frac-kmc is also the first parallel tool for this task, allowing for speeding up sketch generation using multiple CPU cores - an option lacking in existing serialized tools. We show that by computing FracMinHash sketches using frac-kmc, we can estimate pairwise similarity speedily and accurately on real data. 
frac-kmc is freely available here: https://github.com/KoslickiLab/frac-kmc/.</p>","PeriodicalId":50823,"journal":{"name":"Algorithms for Molecular Biology","volume":"20 1","pages":"8"},"PeriodicalIF":1.5,"publicationDate":"2025-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12082993/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144081838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
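As a rough illustration of the FracMinHash entry above, the following Python sketch keeps every k-mer whose hash falls below a fixed fraction of the hash range and estimates the Jaccard index from two such sketches. The function names, the blake2b hash, and the convention that `scale` is a fraction in (0, 1] are assumptions made for this example. It is not the frac-kmc implementation, and it shows the Jaccard index rather than the cosine similarity evaluated in the paper.

```python
import hashlib

MAX_HASH = 2**64  # size of the 64-bit hash range used below (assumed)


def kmer_hash(kmer: str) -> int:
    """Map a k-mer to a 64-bit integer (blake2b is an arbitrary choice)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def frac_minhash_sketch(seq: str, k: int, scale: float) -> set[int]:
    """Keep the hashes of all k-mers that fall below scale * MAX_HASH."""
    threshold = scale * MAX_HASH
    hashes = (kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1))
    return {h for h in hashes if h < threshold}


def jaccard_estimate(sketch_a: set[int], sketch_b: set[int]) -> float:
    """Plug-in Jaccard estimate computed on the two sketches alone."""
    union = sketch_a | sketch_b
    return len(sketch_a & sketch_b) / len(union) if union else 0.0
```

Unlike a fixed-size MinHash sketch, the sketch here grows with the number of distinct k-mers, so a smaller `scale` trades accuracy for space.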
AlfaPang: alignment free algorithm for pangenome graph construction.
IF 1.5, Zone 4, Biology
Algorithms for Molecular Biology Pub Date : 2025-05-15 DOI: 10.1186/s13015-025-00277-7
Adam Cicherski, Anna Lisiecka, Norbert Dojer
{"title":"AlfaPang: alignment free algorithm for pangenome graph construction.","authors":"Adam Cicherski, Anna Lisiecka, Norbert Dojer","doi":"10.1186/s13015-025-00277-7","DOIUrl":"10.1186/s13015-025-00277-7","url":null,"abstract":"<p><p>The success of pangenome-based approaches to genomics analysis depends largely on the existence of efficient methods for constructing pangenome graphs that are applicable to large genome collections. In the current paper we present AlfaPang, a new pangenome graph building algorithm. AlfaPang is based on a novel alignment-free approach that allows to construct pangenome graphs using significantly less computational resources than state-of-the-art tools. The code of AlfaPang is freely available at https://github.com/AdamCicherski/AlfaPang .</p>","PeriodicalId":50823,"journal":{"name":"Algorithms for Molecular Biology","volume":"20 1","pages":"7"},"PeriodicalIF":1.5,"publicationDate":"2025-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12082865/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144081831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
MCDAG: indexing maximal common subsequences for k strings.
IF 1.5, Zone 4, Biology
Algorithms for Molecular Biology Pub Date : 2025-04-19 DOI: 10.1186/s13015-025-00271-z
Giovanni Buzzega, Alessio Conte, Roberto Grossi, Giulia Punzi
{"title":"<ArticleTitle xmlns:ns0=\"http://www.w3.org/1998/Math/MathML\"><ns0:math><ns0:mrow><ns0:mi>M</ns0:mi> <ns0:mstyle><ns0:mi>C</ns0:mi> <ns0:mi>D</ns0:mi> <ns0:mi>A</ns0:mi> <ns0:mi>G</ns0:mi></ns0:mstyle> </ns0:mrow> </ns0:math> : indexing maximal common subsequences for k strings.","authors":"Giovanni Buzzega, Alessio Conte, Roberto Grossi, Giulia Punzi","doi":"10.1186/s13015-025-00271-z","DOIUrl":"https://doi.org/10.1186/s13015-025-00271-z","url":null,"abstract":"<p><p>Analyzing and comparing sequences of symbols is among the most fundamental problems in computer science, possibly even more so in bioinformatics. Maximal Common Subsequences (MCSs), i.e., inclusion-maximal sequences of non-contiguous symbols common to two or more strings, have only recently received attention in this area, despite being a basic notion and a natural generalization of more common tools like Longest Common Substrings/Subsequences. In this paper we simplify and engineer recent advancements in MCSs into a practical tool called <math><mrow><mi>M</mi> <mstyle><mi>C</mi> <mi>D</mi> <mi>A</mi> <mi>G</mi></mstyle> </mrow> </math> , the first publicly available tool that can index MCSs of real genomic data, and show that its definition can be generalized to multiple strings. We demonstrate that our tool can index pairs of sequences exceeding 10,000 base pairs within minutes, utilizing only 4-7% more than the minimum required nodes. 
For three or more sequences, we observe experimentally that the minimum index may exhibit a significant increase in the number of nodes.</p>","PeriodicalId":50823,"journal":{"name":"Algorithms for Molecular Biology","volume":"20 1","pages":"6"},"PeriodicalIF":1.5,"publicationDate":"2025-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12008955/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144042825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Unbiased anchors for reliable genome-wide synteny detection.
IF 1.5, Zone 4, Biology
Algorithms for Molecular Biology Pub Date : 2025-04-05 DOI: 10.1186/s13015-025-00275-9
Karl K Käther, Andreas Remmel, Steffen Lemke, Peter F Stadler
{"title":"Unbiased anchors for reliable genome-wide synteny detection.","authors":"Karl K Käther, Andreas Remmel, Steffen Lemke, Peter F Stadler","doi":"10.1186/s13015-025-00275-9","DOIUrl":"10.1186/s13015-025-00275-9","url":null,"abstract":"<p><p>Orthology inference lies at the foundation of comparative genomics research. The correct identification of loci which descended from a common ancestral sequence is not only complicated by sequence divergence but also duplication and other genome rearrangements. The conservation of gene order, i.e. synteny, is used in conjunction with sequence similarity as an additional factor for orthology determination. Current approaches, however, rely on genome annotations and are therefore limited. Here we present an annotation-free approach and compare it to synteny analysis with annotations. We find that our approach works better in closely related genomes whereas there is a better performance with annotations for more distantly related genomes. Overall, the presented algorithm offers a useful alternative to annotation-based methods and can outperform them in many cases.</p>","PeriodicalId":50823,"journal":{"name":"Algorithms for Molecular Biology","volume":"20 1","pages":"5"},"PeriodicalIF":1.5,"publicationDate":"2025-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11972476/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143788963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
The open-closed mod-minimizer algorithm.
IF 1.5, Zone 4, Biology
Algorithms for Molecular Biology Pub Date : 2025-03-17 DOI: 10.1186/s13015-025-00270-0
Ragnar Groot Koerkamp, Daniel Liu, Giulio Ermanno Pibiri
{"title":"The open-closed mod-minimizer algorithm.","authors":"Ragnar Groot Koerkamp, Daniel Liu, Giulio Ermanno Pibiri","doi":"10.1186/s13015-025-00270-0","DOIUrl":"10.1186/s13015-025-00270-0","url":null,"abstract":"<p><p>Sampling algorithms that deterministically select a subset of <math><mi>k</mi></math> -mers are an important building block in bioinformatics applications. For example, they are used to index large textual collections, like DNA, and to compare sequences quickly. In such applications, a sampling algorithm is required to select one <math><mi>k</mi></math> -mer out of every window of w consecutive <math><mi>k</mi></math> -mers. The folklore and most used scheme is the random minimizer that selects the smallest <math><mi>k</mi></math> -mer in the window according to some random order. This scheme is remarkably simple and versatile, and has a density (expected fraction of selected <math><mi>k</mi></math> -mers) of <math><mrow><mn>2</mn> <mo>/</mo> <mo>(</mo> <mi>w</mi> <mo>+</mo> <mn>1</mn> <mo>)</mo></mrow> </math> . In practice, lower density leads to faster methods and smaller indexes, and it turns out that the random minimizer is not the best one can do. Indeed, some schemes are known to approach optimal density 1/w when <math><mrow><mi>k</mi> <mo>→</mo> <mi>∞</mi></mrow> </math> , like the recently introduced mod-minimizer (Groot Koerkamp and Pibiri, WABI 2024). In this work, we study methods that achieve low density when <math><mrow><mi>k</mi> <mo>≤</mo> <mi>w</mi></mrow> </math> . In this small-k regime, a practical method with provably better density than the random minimizer is the miniception (Zheng et al., Bioinformatics 2021). This method can be elegantly described as sampling the smallest closed syncmer (Edgar, PeerJ 2021) in the window according to some random order. We show that extending the miniception to prefer sampling open syncmers yields much better density. 
This new method-the open-closed minimizer-offers improved density for small <math><mrow><mi>k</mi> <mo>≤</mo> <mi>w</mi></mrow> </math> while being as fast to compute as the random minimizer. Compared to methods based on decycling sets, that achieve very low density in the small-k regime, our method has comparable density while being computationally simpler and intuitive. Furthermore, we extend the mod-minimizer to improve density of any scheme that works well for small k to also work well when <math><mrow><mi>k</mi> <mo>></mo> <mi>w</mi></mrow> </math> is large. We hence obtain the open-closed mod-minimizer, a practical method that improves over the mod-minimizer for all k.</p>","PeriodicalId":50823,"journal":{"name":"Algorithms for Molecular Biology","volume":"20 1","pages":"4"},"PeriodicalIF":1.5,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11912762/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143651867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
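The random minimizer described in the open-closed mod-minimizer abstract above can be sketched in a few lines of Python. This is only the baseline scheme, not the open-closed mod-minimizer itself. The function names, the seeded blake2b hash standing in for the "random order", and the leftmost tie-break are assumptions made for this illustration.

```python
import hashlib


def kmer_order(kmer: str, seed: int = 0) -> int:
    """Pseudo-random rank of a k-mer; a seeded hash stands in for a random order."""
    salt = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(kmer.encode(), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little")


def random_minimizer_positions(seq: str, k: int, w: int, seed: int = 0) -> set[int]:
    """Select, from every window of w consecutive k-mers, the smallest one."""
    orders = [kmer_order(seq[i:i + k], seed) for i in range(len(seq) - k + 1)]
    selected = set()
    for start in range(len(orders) - w + 1):
        window = range(start, start + w)
        # Ties between identical k-mers are broken by the leftmost position.
        selected.add(min(window, key=lambda i: (orders[i], i)))
    return selected


def density(seq: str, k: int, w: int, seed: int = 0) -> float:
    """Fraction of k-mer positions that are selected."""
    n_kmers = len(seq) - k + 1
    return len(random_minimizer_positions(seq, k, w, seed)) / n_kmers
```

On a long random DNA string with k large enough that repeated k-mers are rare, the measured density should land close to the 2/(w+1) quoted in the abstract.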
Mem-based pangenome indexing for k-mer queries.
IF 1.5, Zone 4, Biology
Algorithms for Molecular Biology Pub Date : 2025-03-01 DOI: 10.1186/s13015-025-00272-y
Stephen Hwang, Nathaniel K Brown, Omar Y Ahmed, Katharine M Jenike, Sam Kovaka, Michael C Schatz, Ben Langmead
{"title":"Mem-based pangenome indexing for k-mer queries.","authors":"Stephen Hwang, Nathaniel K Brown, Omar Y Ahmed, Katharine M Jenike, Sam Kovaka, Michael C Schatz, Ben Langmead","doi":"10.1186/s13015-025-00272-y","DOIUrl":"10.1186/s13015-025-00272-y","url":null,"abstract":"<p><p>Pangenomes are growing in number and size, thanks to the prevalence of high-quality long-read assemblies. However, current methods for studying sequence composition and conservation within pangenomes have limitations. Methods based on graph pangenomes require a computationally expensive multiple-alignment step, which can leave out some variation. Indexes based on k-mers and de Bruijn graphs are limited to answering questions at a specific substring length k. We present Maximal Exact Match Ordered (MEMO), a pangenome indexing method based on maximal exact matches (MEMs) between sequences. A single MEMO index can handle arbitrary-length queries over pangenomic windows. MEMO enables both queries that test k-mer presence/absence (membership queries) and that count the number of genomes containing k-mers in a window (conservation queries). MEMO's index for a pangenome of 89 human autosomal haplotypes fits in 2.04 GB, 8.8 <math><mo>×</mo></math> smaller than a comparable KMC3 index and 11.4 <math><mo>×</mo></math> smaller than a PanKmer index. MEMO indexes can be made smaller by sacrificing some counting resolution, with our decile-resolution HPRC index reaching 0.67 GB. MEMO can conduct a conservation query for 31-mers over the human leukocyte antigen locus in 13.89 s, 2.5 <math><mo>×</mo></math> faster than other approaches. 
MEMO's small index size, lack of k-mer length dependence, and efficient queries make it a flexible tool for studying and visualizing substring conservation in pangenomes.</p>","PeriodicalId":50823,"journal":{"name":"Algorithms for Molecular Biology","volume":"20 1","pages":"3"},"PeriodicalIF":1.5,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11871630/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143538063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Finding high posterior density phylogenies by systematically extending a directed acyclic graph.
IF 1.5, Zone 4, Biology
Algorithms for Molecular Biology Pub Date : 2025-02-28 DOI: 10.1186/s13015-025-00273-x
Chris Jennings-Shaffer, David H Rich, Matthew Macaulay, Michael D Karcher, Tanvi Ganapathy, Shosuke Kiami, Anna Kooperberg, Cheng Zhang, Marc A Suchard, Frederick A Matsen
{"title":"Finding high posterior density phylogenies by systematically extending a directed acyclic graph.","authors":"Chris Jennings-Shaffer, David H Rich, Matthew Macaulay, Michael D Karcher, Tanvi Ganapathy, Shosuke Kiami, Anna Kooperberg, Cheng Zhang, Marc A Suchard, Frederick A Matsen","doi":"10.1186/s13015-025-00273-x","DOIUrl":"10.1186/s13015-025-00273-x","url":null,"abstract":"<p><p>Bayesian phylogenetics typically estimates a posterior distribution, or aspects thereof, using Markov chain Monte Carlo methods. These methods integrate over tree space by applying local rearrangements to move a tree through its space as a random walk. Previous work explored the possibility of replacing this random walk with a systematic search, but was quickly overwhelmed by the large number of probable trees in the posterior distribution. In this paper we develop methods to sidestep this problem using a recently introduced structure called the subsplit directed acyclic graph (sDAG). This structure can represent many trees at once, and local rearrangements of trees translate to methods of enlarging the sDAG. Here we propose two methods of introducing, ranking, and selecting local rearrangements on sDAGs to produce a collection of trees with high posterior density. One of these methods successfully recovers the set of high posterior density trees across a range of data sets. 
However, we find that a simpler strategy of aggregating trees into an sDAG in fact is computationally faster and returns a higher fraction of probable trees.</p>","PeriodicalId":50823,"journal":{"name":"Algorithms for Molecular Biology","volume":"20 1","pages":"2"},"PeriodicalIF":1.5,"publicationDate":"2025-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11869616/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143532146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Fractional hitting sets for efficient multiset sketching.
IF 1.5, Zone 4, Biology
Algorithms for Molecular Biology Pub Date : 2025-02-08 DOI: 10.1186/s13015-024-00268-0
Timothé Rouzé, Igor Martayan, Camille Marchet, Antoine Limasset
{"title":"Fractional hitting sets for efficient multiset sketching.","authors":"Timothé Rouzé, Igor Martayan, Camille Marchet, Antoine Limasset","doi":"10.1186/s13015-024-00268-0","DOIUrl":"10.1186/s13015-024-00268-0","url":null,"abstract":"<p><p>The exponential increase in publicly available sequencing data and genomic resources necessitates the development of highly efficient methods for data processing and analysis. Locality-sensitive hashing techniques have successfully transformed large datasets into smaller, more manageable sketches while maintaining comparability using metrics such as Jaccard and containment indices. However, fixed-size sketches encounter difficulties when applied to divergent datasets. Scalable sketching methods, such as sourmash, provide valuable solutions but still lack resource-efficient, tailored indexing. Our objective is to create lighter sketches with comparable results while enhancing efficiency. We introduce the concept of Fractional Hitting Sets, a generalization of Universal Hitting Sets, which cover a specified fraction of the k-mer space. In theory and practice, we demonstrate the feasibility of achieving such coverage with simple but highly efficient schemes. By encoding the covered k-mers as super-k-mers, we provide a space-efficient exact representation that also enables optimized comparisons. Our novel tool, supersampler, implements this scheme, and experimental results with real bacterial collections closely match our theoretical findings. In comparison to sourmash, supersampler achieves similar outcomes while utilizing an order of magnitude less space and memory and operating several times faster. This highlights the potential of our approach in addressing the challenges presented by the ever-expanding landscape of genomic data. supersampler is an open-source software and can be accessed at https://github.com/TimRouze/supersampler . 
The data required to reproduce the results presented in this manuscript is available at https://github.com/TimRouze/supersampler/experiments .</p>","PeriodicalId":50823,"journal":{"name":"Algorithms for Molecular Biology","volume":"20 1","pages":"1"},"PeriodicalIF":1.5,"publicationDate":"2025-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11807336/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143374779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
On the parameterized complexity of the median and closest problems under some permutation metrics.
IF 1.5, Zone 4, Biology
Algorithms for Molecular Biology Pub Date : 2024-12-24 DOI: 10.1186/s13015-024-00269-z
Luís Cunha, Ignasi Sau, Uéverton Souza
{"title":"On the parameterized complexity of the median and closest problems under some permutation metrics.","authors":"Luís Cunha, Ignasi Sau, Uéverton Souza","doi":"10.1186/s13015-024-00269-z","DOIUrl":"10.1186/s13015-024-00269-z","url":null,"abstract":"<p><p>Genome rearrangements are events where large blocks of DNA exchange places during evolution. The analysis of these events is a promising tool for understanding evolutionary genomics, providing data for phylogenetic reconstruction based on genome rearrangement measures. Many pairwise rearrangement distances have been proposed, based on finding the minimum number of rearrangement events to transform one genome into the other, using some predefined operation. When more than two genomes are considered, we have the more challenging problem of rearrangement-based phylogeny reconstruction. Given a set of genomes and a distance notion, there are at least two natural ways to define the \"target\" genome. On the one hand, finding a genome that minimizes the sum of the distances from this to any other, called the median genome. On the other hand, finding a genome that minimizes the maximum distance to any other, called the closest genome. Considering genomes as permutations of distinct integers, some distance metrics have been extensively studied. We investigate the median and closest problems on permutations over the following metrics: breakpoint distance, swap distance, block-interchange distance, short-block-move distance, and transposition distance. In biological applications some values are usually very small, such as the solution value d or the number k of input permutations. For each of these metrics and parameters d or k, we analyze the closest and the median problems from the viewpoint of parameterized complexity. 
We obtain the following results: NP-hardness for finding the median/closest permutation regarding some metrics of distance, even for only <math><mrow><mi>k</mi> <mo>=</mo> <mn>3</mn></mrow> </math> permutations; Polynomial kernels for the problems of finding the median permutation of all studied metrics, considering the target distance d as parameter; NP-hardness result for finding the closest permutation by short-block-moves; FPT algorithms and infeasibility of polynomial kernels for finding the closest permutation for some metrics when parameterized by the target distance d.</p>","PeriodicalId":50823,"journal":{"name":"Algorithms for Molecular Biology","volume":"19 1","pages":"24"},"PeriodicalIF":1.5,"publicationDate":"2024-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11669244/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142885647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
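The two objectives defined in the abstract above, the median (minimum sum of distances) and the closest (minimum maximum distance) permutation, can be made concrete with a brute-force Python sketch. It enumerates all n! candidates, so it is only usable for tiny inputs and says nothing about the parameterized algorithms of the paper. The function names are hypothetical, and the distance used is the minimum number of exchanges of any two elements (Cayley distance), chosen as a simple stand-in; whether it coincides with the paper's swap distance is an assumption.

```python
from itertools import permutations


def cayley_distance(p, q):
    """Minimum number of exchanges of two elements that turn p into q."""
    position_in_q = {value: i for i, value in enumerate(q)}
    target = [position_in_q[value] for value in p]  # slot i of p must move to target[i]
    seen, cycles = set(), 0
    for start in range(len(target)):
        if start in seen:
            continue
        cycles += 1
        j = start
        while j not in seen:  # walk one cycle of the induced permutation
            seen.add(j)
            j = target[j]
    return len(p) - cycles


def median_permutation(inputs, distance=cayley_distance):
    """Brute force: minimise the sum of distances to all input permutations."""
    candidates = permutations(sorted(inputs[0]))
    return min(candidates, key=lambda c: sum(distance(c, p) for p in inputs))


def closest_permutation(inputs, distance=cayley_distance):
    """Brute force: minimise the maximum distance to any input permutation."""
    candidates = permutations(sorted(inputs[0]))
    return min(candidates, key=lambda c: max(distance(c, p) for p in inputs))
```

With three copies of (1, 2, 3, 4) and one copy of (2, 1, 4, 3) as input, the median coincides with the majority input, while a closest permutation sits one exchange away from both distinct inputs, which shows that the two objectives can disagree.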
TINNiK: inference of the tree of blobs of a species network under the coalescent model.
IF 1.5, Zone 4, Biology
Algorithms for Molecular Biology Pub Date : 2024-11-05 DOI: 10.1186/s13015-024-00266-2
Elizabeth S Allman, Hector Baños, Jonathan D Mitchell, John A Rhodes
{"title":"TINNiK: inference of the tree of blobs of a species network under the coalescent model.","authors":"Elizabeth S Allman, Hector Baños, Jonathan D Mitchell, John A Rhodes","doi":"10.1186/s13015-024-00266-2","DOIUrl":"10.1186/s13015-024-00266-2","url":null,"abstract":"<p><p>The tree of blobs of a species network shows only the tree-like aspects of relationships of taxa on a network, omitting information on network substructures where hybridization or other types of lateral transfer of genetic information occur. By isolating such regions of a network, inference of the tree of blobs can serve as a starting point for a more detailed investigation, or indicate the limit of what may be inferrable without additional assumptions. Building on our theoretical work on the identifiability of the tree of blobs from gene quartet distributions under the Network Multispecies Coalescent model, we develop an algorithm, TINNiK, for statistically consistent tree of blobs inference. We provide examples of its application to both simulated and empirical datasets, utilizing an implementation in the MSCquartets 2.0 R package.</p>","PeriodicalId":50823,"journal":{"name":"Algorithms for Molecular Biology","volume":"19 1","pages":"23"},"PeriodicalIF":1.5,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11539473/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142584929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0