Advancements in colored k-mer sets: essentials for the curious
Camille Marchet
arXiv - QuanBio - Genomics, 2024-09-08
DOI: arxiv-2409.05214 (https://doi.org/arxiv-2409.05214)
This paper provides a comprehensive review of recent advancements in
k-mer-based data structures representing collections of several samples
(sometimes called colored de Bruijn graphs) and their applications in
large-scale sequence indexing and pangenomics. The review explores the
evolution of k-mer set representations, highlighting the trade-offs between
exact and inexact methods, as well as the integration of compression strategies
and modular implementations. I discuss the impact of these structures on
practical applications and describe recent uses of these methods for
analysis. By surveying state-of-the-art techniques and identifying emerging
trends, this work aims to guide researchers in selecting and developing methods
for large-scale, reference-free genomic data. For a broader overview of
k-mer set representations and foundational data structures, see the
accompanying article on practical k-mer sets.
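
To make the central object concrete, here is a minimal sketch of an exact colored k-mer set: a map from each k-mer to the set of samples (its "colors") that contain it. This is an illustration only, not a method taken from the review. The function names, the toy sequences, and the choice of a plain Python dictionary are all assumptions. It ignores reverse complements and applies no compression, both of which practical tools handle.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every length-k substring of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_colored_kmer_index(samples, k):
    """Map each k-mer to the set of sample names (colors) containing it.

    `samples` is a hypothetical dict of sample name -> sequence string.
    """
    index = defaultdict(set)
    for name, seq in samples.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index


def query(index, seq, k):
    """Return, per sample, the fraction of the query's k-mers it contains.

    Samples sharing no k-mer with the query are omitted from the result.
    """
    query_kmers = list(kmers(seq, k))
    hits = defaultdict(int)
    for kmer in query_kmers:
        # .get avoids inserting empty entries into the defaultdict
        for color in index.get(kmer, ()):
            hits[color] += 1
    return {color: n / len(query_kmers) for color, n in hits.items()}


# Toy input, made up for illustration.
samples = {"sample_A": "ACGTAC", "sample_B": "GTACTT"}
index = build_colored_kmer_index(samples, k=3)
```

Calling `query(index, "CGTACT", 3)` reports, for each sample, the share of the query's 3-mers found in it. An inexact variant in the sense of the abstract would replace the exact sets with a probabilistic structure, trading some false positives for lower memory use.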