Katharine M. Jenike, Lucía Campos-Domínguez, Marilou Boddé, José Cerca, Christina N. Hodson, Michael C. Schatz, Kamil S. Jaron
{"title":"k-mer approaches for biodiversity genomics","authors":"Katharine M. Jenike, Lucía Campos-Domínguez, Marilou Boddé, José Cerca, Christina N. Hodson, Michael C. Schatz, Kamil S. Jaron","doi":"10.1101/gr.279452.124","DOIUrl":null,"url":null,"abstract":"The wide array of currently available genomes displays a wonderful diversity in size, composition, and structure and is quickly expanding thanks to several global biodiversity genomics initiatives. However, sequencing of genomes, even with the latest technologies, can still be challenging for both technical (e.g., small physical size, contaminated samples, or access to appropriate sequencing platforms) and biological reasons (e.g., germline-restricted DNA, variable ploidy levels, sex chromosomes, or very large genomes). In recent years, <em>k</em>-mer-based techniques have become popular to overcome some of these challenges. They are based on the simple process of dividing the analyzed sequences (e.g., raw reads or genomes) into a set of subsequences of length <em>k</em>, called <em>k</em>-mers, and then analyzing the frequency or sequences of those <em>k</em>-mers. Analyses based on <em>k</em>-mers allow for a rapid and intuitive assessment of complex sequencing data sets. Here, we provide a comprehensive review to the theoretical properties and practical applications of <em>k</em>-mers in biodiversity genomics with a special focus on genome modeling.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"15 1","pages":""},"PeriodicalIF":6.2000,"publicationDate":"2025-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Genome research","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1101/gr.279452.124","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMISTRY & MOLECULAR BIOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
The wide array of currently available genomes displays a wonderful diversity in size, composition, and structure and is quickly expanding thanks to several global biodiversity genomics initiatives. However, sequencing of genomes, even with the latest technologies, can still be challenging for both technical (e.g., small physical size, contaminated samples, or access to appropriate sequencing platforms) and biological reasons (e.g., germline-restricted DNA, variable ploidy levels, sex chromosomes, or very large genomes). In recent years, k-mer-based techniques have become popular to overcome some of these challenges. They are based on the simple process of dividing the analyzed sequences (e.g., raw reads or genomes) into a set of subsequences of length k, called k-mers, and then analyzing the frequency or sequences of those k-mers. Analyses based on k-mers allow for a rapid and intuitive assessment of complex sequencing data sets. Here, we provide a comprehensive review to the theoretical properties and practical applications of k-mers in biodiversity genomics with a special focus on genome modeling.
期刊介绍:
Launched in 1995, Genome Research is an international, continuously published, peer-reviewed journal that focuses on research that provides novel insights into the genome biology of all organisms, including advances in genomic medicine.
Among the topics considered by the journal are genome structure and function, comparative genomics, molecular evolution, genome-scale quantitative and population genetics, proteomics, epigenomics, and systems biology. The journal also features exciting gene discoveries and reports of cutting-edge computational biology and high-throughput methodologies.
New data in these areas are published as research papers, or methods and resource reports that provide novel information on technologies or tools that will be of interest to a broad readership. Complete data sets are presented electronically on the journal''s web site where appropriate. The journal also provides Reviews, Perspectives, and Insight/Outlook articles, which present commentary on the latest advances published both here and elsewhere, placing such progress in its broader biological context.