Pig and quail CpG methylation datasets from short and long read sequencing technologies

Authors: Paul Terzian, Céline Vandecasteele, Joanna Lledo, Rémy-Félix Serre, Jules Sabban, Claire Kuchly, Frédérique Pitel, Sophie Leroux, Julie Demars, Nathalie Iannuccelli, Katia Fève, Michèle Bonnet, Christine Gaspin, Denis Milan, Carole Iampietro, Christophe Klopp, Cécile Donnadieu
Journal: Scientific Data, volume 12, issue 1, article 556 (Journal Article, published 2025-04-01)
DOI: 10.1038/s41597-025-04769-4
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11961558/pdf/
Abstract
CpG methylation, a key epigenetic mark involved in gene regulation, development, and other biological processes, is commonly analyzed using Whole-Genome Bisulfite Sequencing (WGBS). However, bisulfite treatment causes significant DNA degradation. Enzymatic Methyl-seq (EM-seq) offers a short-read alternative that preserves DNA integrity but requires conversion steps, limiting its compatibility with downstream analyses. Third-generation sequencing technologies, such as Oxford Nanopore Technologies (ONT) and PacBio, enable direct detection of DNA modifications without altering the DNA, providing simultaneous genome and epigenome information. This work presents a comprehensive dataset combining long- and short-read sequencing data, including ONT, PacBio, Enzymatic Methyl-seq, and WGBS, for two agronomically relevant species: pig (Sus scrofa) and quail (Coturnix japonica). Data quality evaluation reveals high nucleotide quality scores for PacBio and short reads, robust alignment rates for long reads, and inter-method correlations in CpG methylation calling ranging from 0.76 to 0.99. This dataset is a valuable resource for training methylation callers and represents the first combined methylation dataset for these species, providing an essential benchmark for assessing emerging sequencing technologies.
About the journal:
Scientific Data is an open-access journal focused on data, publishing descriptions of research datasets and articles on data sharing across natural sciences, medicine, engineering, and social sciences. Its goal is to enhance the sharing and reuse of scientific data, encourage broader data sharing, and acknowledge those who share their data.
The journal primarily publishes Data Descriptors, which offer detailed descriptions of research datasets, including data collection methods and technical analyses validating data quality. These descriptors are intended to facilitate data reuse rather than to test hypotheses or present new interpretations, methods, or in-depth analyses.