Advancements in colored k-mer sets: essentials for the curious

arXiv - QuanBio - Genomics
Pub Date: 2024-09-08
DOI: arxiv-2409.05214

Camille Marchet


Abstract

This paper provides a comprehensive review of recent advancements in k-mer-based data structures that represent collections of several samples (sometimes called colored de Bruijn graphs), and of their applications in large-scale sequence indexing and pangenomics. The review explores the evolution of k-mer set representations, highlighting the trade-offs between exact and inexact methods, as well as the integration of compression strategies and modular implementations. I discuss the impact of these structures on practical applications and describe how these methods have recently been used in data analysis. By surveying state-of-the-art techniques and identifying emerging trends, this work aims to guide researchers in selecting and developing methods for large-scale and reference-free genomic data. For a broader overview of k-mer set representations and foundational data structures, see the accompanying article on practical k-mer sets.
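
To make the notion of a colored k-mer set concrete: every k-mer is associated with the set of samples ("colors") in which it occurs, and a query sequence can then be matched against all samples at once. The following is a minimal, exact sketch in Python, assuming plain dictionaries and integer bitmasks in place of the compressed structures the review surveys. Reverse complements are not merged, and the function names (`kmers`, `build_colored_kmer_set`, `query`) are hypothetical illustrations, not the API of any tool discussed in the paper.

```python
from collections import defaultdict


def kmers(seq: str, k: int):
    """Yield every k-mer of seq (forward strand only; no canonical form)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_colored_kmer_set(samples: list[str], k: int) -> dict[str, int]:
    """Map each k-mer to a bitmask whose bit c is set if sample c contains it.

    The bitmask plays the role of the k-mer's color set.
    """
    colors: dict[str, int] = defaultdict(int)
    for color, seq in enumerate(samples):
        for km in kmers(seq, k):
            colors[km] |= 1 << color
    return dict(colors)


def query(index: dict[str, int], seq: str, k: int, n_samples: int) -> list[float]:
    """Return, for each sample, the fraction of seq's k-mers present in it."""
    qkmers = list(kmers(seq, k))
    if not qkmers:
        return [0.0] * n_samples
    hits = [0] * n_samples
    for km in qkmers:
        mask = index.get(km, 0)  # absent k-mers have an empty color set
        for c in range(n_samples):
            if (mask >> c) & 1:
                hits[c] += 1
    return [h / len(qkmers) for h in hits]


samples = ["ACGTACGT", "ACGTTTTT", "GGGGACGT"]
index = build_colored_kmer_set(samples, k=4)
print(query(index, "ACGTAC", k=4, n_samples=3))
# → [1.0, 0.3333333333333333, 0.3333333333333333]
```

Here the k-mer `ACGT` occurs in all three samples, so its color set is `0b111`. Replacing the exact dictionary with a probabilistic membership structure, such as one Bloom filter per sample, is one way to trade exactness (occasional false-positive hits) for lower memory use, which is the kind of exact-versus-inexact trade-off the abstract refers to.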

