Stefan T Stafie, Mark Lindquist, Samuel Kusher-Lenhoff, Kenji Nakamichi, Debarshi Mustafi
Ophthalmic Genetics, pages 1-7. Published 2025-08-12. DOI: https://doi.org/10.1080/13816810.2025.2544639
Importance of genome reference and population datasets for annotation and prioritization of disease-causing variants in inherited retinal diseases.
As sequencing technologies expand, the growing number of identified variants must each be assigned a potential functional impact so that those likely to be disease-causing can be prioritized. In this data note, we demonstrate the importance of using a refined human genome reference assembly and more diverse, curated population databases to guide functional annotation of variants identified in inherited retinal disease (IRD) genes. We compared variant characteristics from Genome Aggregation Database (gnomAD) population data for 372 IRD genes across three releases: versions 3.1.2 (v3) and 4.1.0 (v4), which are aligned to the most recent Genome Reference Consortium Human Build 38 (GRCh38), and version 2.1.1 (v2), which is aligned to the previous GRCh37 build. Transformations of Variant Effect Predictor (VEP) annotations, Combined Annotation Dependent Depletion (CADD) scores, and ClinVar pathogenicity annotations were used to generate receiver operating characteristic (ROC) and precision-recall curves, from which we calculated the area under the ROC curve (AUC) and the area under the precision-recall curve (AUPRC). Comparisons of variant prediction by ClinVar designation showed that, with improved functional annotation, the AUC reaches 0.99 and the AUPRC 0.98 for differentiating pathogenic from nonpathogenic variants when the most recent genome build and population database are used. More diverse population data allow rare variants to be identified, and incorporating variant annotation metrics provides greater insight into the pathogenicity parameters of IRD variants. This data note provides empirical evidence for adopting the newest genome builds and databases to better prioritize potentially disease-causing variants and achieve more complete molecular diagnosis in IRD patients.
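To make the two evaluation metrics concrete, the following is a minimal sketch of how an AUC and an AUPRC can be computed from a per-variant score and a binary pathogenic/nonpathogenic label. It is not the authors' pipeline: the function names, the toy scores (loosely CADD-like), and the labels are hypothetical and chosen only for illustration. The AUPRC is approximated here by average precision, one common estimator.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen pathogenic variant scores higher than a randomly chosen
    nonpathogenic one (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one variant of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision, a step-wise estimate of the area under the
    precision-recall curve. Tied scores are ranked in input order."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one pathogenic variant")
    # Rank variants from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    true_positives = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            total += true_positives / rank  # precision at this recall step
    return total / n_pos


# Hypothetical toy data: 1 = pathogenic, 0 = nonpathogenic (assumed labels),
# with made-up deleteriousness scores. These are NOT values from the study.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [34.0, 28.5, 25.1, 24.0, 22.3, 15.2, 8.7, 3.1]

print(f"AUC   = {roc_auc(labels, scores):.4f}")
print(f"AUPRC = {average_precision(labels, scores):.4f}")
```

On pathogenic-versus-nonpathogenic sets with a strong class imbalance, the AUPRC is generally the more sensitive of the two metrics to false positives among top-ranked variants, which is one reason to report both.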
About the journal:
Ophthalmic Genetics accepts original papers, review articles and short communications on the clinical and molecular genetic aspects of ocular diseases.