Methylation reference datasets from quartet DNA materials for benchmarking epigenome sequencing.
Authors: Xiaorou Guo, Qingwang Chen, Yuanfeng Zhang, Yujing Zhang, Yaqing Liu, Shumeng Duan, Yu Ma, Peng Ni, Jianxin Wang, Bo He, Luyao Ren, Ruiwen Ma, Wanwan Hou, Ying Yu, Bingsi Li, Fujun Qiu, Yuan Sun, Zhihong Zhang, Weihong Xu, Xiang Fang, Jinming Li, Leming Shi, Rui Zhang, Yuanting Zheng, Lianhua Dong
Journal: Nature Communications (impact factor 15.7; JCR Q1, Multidisciplinary Sciences)
Publication date: 2025-10-16
Publication type: Journal Article
Pages: 9202
DOI: 10.1038/s41467-025-64250-z (https://doi.org/10.1038/s41467-025-64250-z)
The lack of quantitative methylation reference datasets (ground truth) and of cross-laboratory reproducibility assessments hinders the clinical translation of epigenome-wide sequencing technologies. Using certified Quartet DNA reference materials, here we generate 108 epigenome-sequencing datasets across three mainstream protocols (whole-genome bisulfite sequencing, enzymatic methyl-seq, and TET-assisted pyridine borane sequencing), with triplicates per sample across laboratories. We observe strand-specific methylation biases across all protocols and libraries. Cross-laboratory reproducibility analyses reveal high agreement in quantitative methylation levels (mean Pearson correlation coefficient (PCC) = 0.96) but low detection concordance (mean Jaccard index = 0.36). Using consensus voting, we construct genome-wide quantitative methylation reference datasets that serve as ground truth for proficiency testing. Key technical parameters, including mean CpG depth, coverage, and strand consistency, correlate strongly with reference-dependent quality metrics (recall, PCC, and root-mean-square error (RMSE)). Collectively, these resources establish foundational standards for benchmarking emerging epigenomic technologies and analytical pipelines, enabling robust, standardized quality control in research and clinical applications.
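The metrics named in the abstract can be illustrated with a small sketch. It is not the authors' pipeline: the abstract does not describe the voting rule, so the rule below (keep a CpG site seen in at least `min_support` datasets and take the median methylation level) is an assumption, as are the function names and the toy site keys and values. Detection concordance is taken as the Jaccard index over the sets of detected sites, and PCC and RMSE are computed over the sites shared by the two call sets.

```python
from math import sqrt
from statistics import mean, median


def jaccard(a, b):
    """Detection concordance: shared sites divided by all sites seen in either set."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rmse(xs, ys):
    """Root-mean-square error between two equal-length lists."""
    return sqrt(mean([(x - y) ** 2 for x, y in zip(xs, ys)]))


def consensus_reference(calls, min_support):
    """Hypothetical voting rule (an assumption, not the paper's method):
    keep sites present in at least `min_support` call sets, value = median level."""
    levels = {}
    for call in calls:
        for site, level in call.items():
            levels.setdefault(site, []).append(level)
    return {s: median(v) for s, v in levels.items() if len(v) >= min_support}


def compare_to_reference(call, reference):
    """Reference-dependent metrics: recall over reference sites, PCC and RMSE on shared sites."""
    shared = sorted(set(call) & set(reference))
    xs = [call[s] for s in shared]
    ys = [reference[s] for s in shared]
    return {
        "recall": len(shared) / len(reference),
        "pcc": pearson(xs, ys),
        "rmse": rmse(xs, ys),
    }


# Toy methylation levels (fraction methylated per CpG site); values are invented.
lab_a = {"chr1:100": 0.90, "chr1:200": 0.10, "chr1:300": 0.50, "chr1:400": 0.80}
lab_b = {"chr1:100": 0.88, "chr1:200": 0.14, "chr1:300": 0.46, "chr1:500": 0.30}
lab_c = {"chr1:100": 0.92, "chr1:200": 0.12, "chr1:400": 0.76, "chr1:500": 0.34}

reference = consensus_reference([lab_a, lab_b, lab_c], min_support=2)
print("Jaccard (lab_a vs lab_b):", jaccard(lab_a, lab_b))
print("lab_a vs reference:", compare_to_reference(lab_a, reference))
```

The toy data also shows how the two kinds of agreement can diverge: methylation levels at shared sites can correlate closely even when the sets of detected sites overlap only partly, which is the pattern the abstract reports (high PCC, low Jaccard index).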
About the journal:
Nature Communications, an open-access journal, publishes high-quality research spanning all areas of the natural sciences. Papers featured in the journal showcase significant advances relevant to specialists in each respective field. With a 2-year impact factor of 16.6 (2022) and a median time of 8 days from submission to the first editorial decision, Nature Communications is committed to rapid dissemination of research findings. As a multidisciplinary journal, it welcomes contributions from biological, health, physical, chemical, Earth, social, mathematical, applied, and engineering sciences, aiming to highlight important breakthroughs within each domain.