Critical assessment of variant prioritization methods for rare disease diagnosis within the rare genomes project

IF 3.8 3区医学 Q2 GENETICS & HEREDITY

Human Genomics Pub Date : 2024-04-29 DOI:10.1186/s40246-024-00604-w

Sarah L. Stenton, Melanie C. O’Leary, Gabrielle Lemire, Grace E. VanNoy, Stephanie DiTroia, Vijay S. Ganesh, Emily Groopman, Emily O’Heir, Brian Mangilog, Ikeoluwa Osei-Owusu, Lynn S. Pais, Jillian Serrano, Moriel Singer-Berk, Ben Weisburd, Michael W. Wilson, Christina Austin-Tse, Marwa Abdelhakim, Azza Althagafi, Giulia Babbi, Riccardo Bellazzi, Samuele Bovo, Maria Giulia Carta, Rita Casadio, Pieter-Jan Coenen, Federica De Paoli, Matteo Floris, Manavalan Gajapathy, Robert Hoehndorf, Julius O. B. Jacobsen, Thomas Joseph, Akash Kamandula, Panagiotis Katsonis, Cyrielle Kint, Olivier Lichtarge, Ivan Limongelli, Yulan Lu, Paolo Magni, Tarun Karthik Kumar Mamidi, Pier Luigi Martelli, Marta Mulargia, Giovanna Nicora, Keith Nykamp, Vikas Pejaver, Yisu Peng, Thi Hong Cam Pham, Maurizio S. Podda, Aditya Rao, Ettore Rizzo, Vangala G. Saipradeep, Castrense Savojardo, Peter Schols, Yang Shen, Naveen Sivadasan, Damian Smedley, Dorian Soru, Rajgopal Srinivasan, Yuanfei Sun, Uma Sunderam, W..

{"title":"Critical assessment of variant prioritization methods for rare disease diagnosis within the rare genomes project","authors":"Sarah L. Stenton, Melanie C. O’Leary, Gabrielle Lemire, Grace E. VanNoy, Stephanie DiTroia, Vijay S. Ganesh, Emily Groopman, Emily O’Heir, Brian Mangilog, Ikeoluwa Osei-Owusu, Lynn S. Pais, Jillian Serrano, Moriel Singer-Berk, Ben Weisburd, Michael W. Wilson, Christina Austin-Tse, Marwa Abdelhakim, Azza Althagafi, Giulia Babbi, Riccardo Bellazzi, Samuele Bovo, Maria Giulia Carta, Rita Casadio, Pieter-Jan Coenen, Federica De Paoli, Matteo Floris, Manavalan Gajapathy, Robert Hoehndorf, Julius O. B. Jacobsen, Thomas Joseph, Akash Kamandula, Panagiotis Katsonis, Cyrielle Kint, Olivier Lichtarge, Ivan Limongelli, Yulan Lu, Paolo Magni, Tarun Karthik Kumar Mamidi, Pier Luigi Martelli, Marta Mulargia, Giovanna Nicora, Keith Nykamp, Vikas Pejaver, Yisu Peng, Thi Hong Cam Pham, Maurizio S. Podda, Aditya Rao, Ettore Rizzo, Vangala G. Saipradeep, Castrense Savojardo, Peter Schols, Yang Shen, Naveen Sivadasan, Damian Smedley, Dorian Soru, Rajgopal Srinivasan, Yuanfei Sun, Uma Sunderam, W..","doi":"10.1186/s40246-024-00604-w","DOIUrl":null,"url":null,"abstract":"A major obstacle faced by families with rare diseases is obtaining a genetic diagnosis. The average \"diagnostic odyssey\" lasts over five years and causal variants are identified in under 50%, even when capturing variants genome-wide. To aid in the interpretation and prioritization of the vast number of variants detected, computational methods are proliferating. Knowing which tools are most effective remains unclear. To evaluate the performance of computational methods, and to encourage innovation in method development, we designed a Critical Assessment of Genome Interpretation (CAGI) community challenge to place variant prioritization models head-to-head in a real-life clinical diagnostic setting. We utilized genome sequencing (GS) data from families sequenced in the Rare Genomes Project (RGP), a direct-to-participant research study on the utility of GS for rare disease diagnosis and gene discovery. Challenge predictors were provided with a dataset of variant calls and phenotype terms from 175 RGP individuals (65 families), including 35 solved training set families with causal variants specified, and 30 unlabeled test set families (14 solved, 16 unsolved). We tasked teams to identify causal variants in as many families as possible. Predictors submitted variant predictions with estimated probability of causal relationship (EPCR) values. Model performance was determined by two metrics, a weighted score based on the rank position of causal variants, and the maximum F-measure, based on precision and recall of causal variants across all EPCR values. Sixteen teams submitted predictions from 52 models, some with manual review incorporated. Top performers recalled causal variants in up to 13 of 14 solved families within the top 5 ranked variants. Newly discovered diagnostic variants were returned to two previously unsolved families following confirmatory RNA sequencing, and two novel disease gene candidates were entered into Matchmaker Exchange. In one example, RNA sequencing demonstrated aberrant splicing due to a deep intronic indel in ASNS, identified in trans with a frameshift variant in an unsolved proband with phenotypes consistent with asparagine synthetase deficiency. Model methodology and performance was highly variable. Models weighing call quality, allele frequency, predicted deleteriousness, segregation, and phenotype were effective in identifying causal variants, and models open to phenotype expansion and non-coding variants were able to capture more difficult diagnoses and discover new diagnoses. Overall, computational models can significantly aid variant prioritization. For use in diagnostics, detailed review and conservative assessment of prioritized variants against established criteria is needed.","PeriodicalId":13183,"journal":{"name":"Human Genomics","volume":null,"pages":null},"PeriodicalIF":3.8000,"publicationDate":"2024-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Human Genomics","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1186/s40246-024-00604-w","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"GENETICS & HEREDITY","Score":null,"Total":0}

引用次数: 0

Abstract

A major obstacle faced by families with rare diseases is obtaining a genetic diagnosis. The average "diagnostic odyssey" lasts over five years and causal variants are identified in under 50%, even when capturing variants genome-wide. To aid in the interpretation and prioritization of the vast number of variants detected, computational methods are proliferating. Knowing which tools are most effective remains unclear. To evaluate the performance of computational methods, and to encourage innovation in method development, we designed a Critical Assessment of Genome Interpretation (CAGI) community challenge to place variant prioritization models head-to-head in a real-life clinical diagnostic setting. We utilized genome sequencing (GS) data from families sequenced in the Rare Genomes Project (RGP), a direct-to-participant research study on the utility of GS for rare disease diagnosis and gene discovery. Challenge predictors were provided with a dataset of variant calls and phenotype terms from 175 RGP individuals (65 families), including 35 solved training set families with causal variants specified, and 30 unlabeled test set families (14 solved, 16 unsolved). We tasked teams to identify causal variants in as many families as possible. Predictors submitted variant predictions with estimated probability of causal relationship (EPCR) values. Model performance was determined by two metrics, a weighted score based on the rank position of causal variants, and the maximum F-measure, based on precision and recall of causal variants across all EPCR values. Sixteen teams submitted predictions from 52 models, some with manual review incorporated. Top performers recalled causal variants in up to 13 of 14 solved families within the top 5 ranked variants. Newly discovered diagnostic variants were returned to two previously unsolved families following confirmatory RNA sequencing, and two novel disease gene candidates were entered into Matchmaker Exchange. In one example, RNA sequencing demonstrated aberrant splicing due to a deep intronic indel in ASNS, identified in trans with a frameshift variant in an unsolved proband with phenotypes consistent with asparagine synthetase deficiency. Model methodology and performance was highly variable. Models weighing call quality, allele frequency, predicted deleteriousness, segregation, and phenotype were effective in identifying causal variants, and models open to phenotype expansion and non-coding variants were able to capture more difficult diagnoses and discover new diagnoses. Overall, computational models can significantly aid variant prioritization. For use in diagnostics, detailed review and conservative assessment of prioritized variants against established criteria is needed.

查看原文本刊更多论文

对罕见基因组项目中用于罕见病诊断的变异体优先排序方法进行严格评估

罕见病家庭面临的一个主要障碍是获得基因诊断。平均 "诊断之旅 "要持续 5 年以上，即使在全基因组范围内捕捉变异，也只有不到 50% 的病因变异能被识别出来。为了帮助解释和优先处理检测到的大量变异，计算方法层出不穷。但哪些工具最有效仍不清楚。为了评估计算方法的性能并鼓励方法开发的创新，我们设计了基因组解读关键评估（CAGI）社区挑战赛，在真实的临床诊断环境中对变异优先级模型进行正面交锋。我们利用的基因组测序（GS）数据来自罕见基因组计划（RGP）中的测序家系，该计划是一项直接面向参与者的研究，旨在了解基因组测序在罕见病诊断和基因发现中的应用。我们向挑战预测者提供了来自 175 个 RGP 个体（65 个家系）的变异调用和表型术语数据集，其中包括 35 个指定了因果变异的已解决训练集家系和 30 个未标记测试集家系（14 个已解决，16 个未解决）。我们要求各小组在尽可能多的家系中找出因果变异体。预测者提交带有估计因果关系概率 (EPCR) 值的变异预测。模型性能由两个指标决定，一个是基于因果变异体排名位置的加权得分，另一个是基于所有 EPCR 值中因果变异体的精确度和召回率的最大 F 度量。16 个团队提交了 52 个模型的预测结果，其中一些模型还加入了人工审核。在排名前 5 位的变异中，表现最出色的团队召回了 14 个已解家系中多达 13 个家系的因果变异。在对 RNA 测序进行确证后，新发现的诊断变体被送回两个以前未解决的家系，两个新的候选疾病基因被输入到 Matchmaker Exchange 中。其中一个例子是，RNA 测序显示 ASNS 中的一个深内含子缺失导致剪接异常，在一个表型与天冬酰胺合成酶缺乏症一致的未解谜探查者中与一个框架移位变异反式确定。模型方法和性能差异很大。权衡调用质量、等位基因频率、预测缺失性、分离和表型的模型能有效地识别因果变异，而对表型扩展和非编码变异开放的模型则能捕捉更困难的诊断并发现新的诊断。总之，计算模型能极大地帮助确定变异的优先次序。在诊断中使用时，需要根据既定标准对优先级变异进行详细审查和保守评估。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Human Genomics GENETICS & HEREDITY-

CiteScore

6.00

自引率

2.20%

发文量

审稿时长

11 weeks

期刊介绍： Human Genomics is a peer-reviewed, open access, online journal that focuses on the application of genomic analysis in all aspects of human health and disease, as well as genomic analysis of drug efficacy and safety, and comparative genomics. Topics covered by the journal include, but are not limited to: pharmacogenomics, genome-wide association studies, genome-wide sequencing, exome sequencing, next-generation deep-sequencing, functional genomics, epigenomics, translational genomics, expression profiling, proteomics, bioinformatics, animal models, statistical genetics, genetic epidemiology, human population genetics and comparative genomics.