Justin Wagner, Nathan D. Olson, Jennifer McDaniel, Lindsay Harris, Brendan J. Pinto, David Jáspez, Adrián Muñoz-Barrera, Luis A. Rubio-Rodríguez, José M. Lorenzo-Salazar, Carlos Flores, Sayed Mohammad Ebrahim Sahraeian, Giuseppe Narzisi, Marta Byrska-Bishop, Uday S. Evani, Chunlin Xiao, Juniper A. Lake, Peter Fontana, Craig Greenberg, Donald Freed, Mohammed Faizal Eeman Mootor, Paul C. Boutros, Lisa Murray, Kishwar Shafin, Andrew Carroll, Fritz J. Sedlazeck, Melissa Wilson, Justin M. Zook
Nature Communications. Published 2025-01-08. DOI: 10.1038/s41467-024-55710-z (https://doi.org/10.1038/s41467-024-55710-z)
Small variant benchmark from a complete assembly of X and Y chromosomes
The sex chromosomes contain complex, important genes impacting medical phenotypes, but differ from the autosomes in their ploidy and large repetitive regions. To enable technology developers along with research and clinical laboratories to evaluate variant detection on male sex chromosomes X and Y, we create a small variant benchmark set with 111,725 variants for the Genome in a Bottle HG002 reference material. We develop an active evaluation approach to demonstrate the benchmark set reliably identifies errors in challenging genomic regions and across short and long read callsets. We show how complete assemblies can expand benchmarks to difficult regions, but highlight remaining challenges benchmarking variants in long homopolymers and tandem repeats, complex gene conversions, copy number variable gene arrays, and human satellites.
About the journal:
Nature Communications, an open-access journal, publishes high-quality research spanning all areas of the natural sciences. Papers featured in the journal showcase significant advances relevant to specialists in each respective field. With a 2-year impact factor of 16.6 (2022) and a median time of 8 days from submission to the first editorial decision, Nature Communications is committed to rapid dissemination of research findings. As a multidisciplinary journal, it welcomes contributions from biological, health, physical, chemical, Earth, social, mathematical, applied, and engineering sciences, aiming to highlight important breakthroughs within each domain.