Leveraging cancer mutation data to inform the pathogenicity classification of germline missense variants.

IF 4.0 · Zone 2 (Biology) · JCR Q1 (Genetics & Heredity)
PLoS Genetics. Pub Date: 2025-01-06. eCollection Date: 2025-01-01. DOI: 10.1371/journal.pgen.1011540
Bushra Haque, David Cheerie, Amy Pan, Meredith Curtis, Thomas Nalpathamkalam, Jimmy Nguyen, Celine Salhab, Bhooma Thiruvahindrapuram, Jade Zhang, Madeline Couse, Taila Hartley, Michelle M Morrow, E Magda Price, Susan Walker, David Malkin, Frederick P Roth, Gregory Costain
PLoS Genetics 21(1): e1011540. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11737861/pdf/

Abstract

Innovative and easy-to-implement strategies are needed to improve the pathogenicity assessment of rare germline missense variants. Somatic cancer driver mutations identified through large-scale tumor sequencing studies often affect genes that are also associated with rare Mendelian disorders. The use of cancer mutation data to aid the interpretation of germline missense variants, regardless of whether the gene is associated with a hereditary cancer predisposition syndrome or a non-cancer-related developmental disorder, has not been systematically assessed. We extracted putative cancer driver missense mutations from the Cancer Hotspots database and annotated them as germline variants, including their presence or absence and their classification in ClinVar. We trained two supervised learning models (logistic regression and random forest) to predict the ClinVar classifications of germline missense variants using Cancer Hotspots data (the training dataset). The performance of each model was evaluated on an independent test dataset generated in part by searching public and private genome-wide sequencing datasets from approximately 1.5 million individuals. Of the 2,447 cancer mutations, 691 corresponding germline variants had previously been classified in ClinVar: 426 (61.6%) as likely pathogenic/pathogenic, 261 (37.8%) as of uncertain significance, and 4 (0.6%) as likely benign/benign. The odds ratio for a likely pathogenic/pathogenic classification in ClinVar was 28.3 (95% confidence interval: 24.2-33.1, p < 0.001) compared with all other germline missense variants in the same 216 genes. Both supervised learning models showed high correlation with pathogenicity assessments in the training dataset. When applied to the test dataset, the logistic regression and random forest models achieved high values for the area under the precision-recall curve (0.847 and 0.829, respectively) and the area under the receiver-operating characteristic curve (0.821 and 0.774, respectively). Using cancer and germline datasets together with supervised learning techniques, our study shows that cancer mutation data can be leveraged to improve the interpretation of germline missense variation potentially causing rare Mendelian disorders.
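
The summary statistics and evaluation metrics reported in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code. The class percentages use the counts given above (426, 261, and 4 of 691). The counts for "other" germline missense variants in the 2x2 table, and the toy labels and scores, are hypothetical placeholders, because the abstract does not report them. The confidence interval uses the standard Wald approximation on the log odds ratio, which is an assumption about method, and average precision is used here as one common estimate of the area under the precision-recall curve.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI for a 2x2 table.

    a: hotspot variants classified LP/P    b: hotspot variants not LP/P
    c: other variants classified LP/P      d: other variants not LP/P
    """
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels), reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / sum(labels)

# Counts from the abstract: ClinVar classes of the 691 hotspot-derived variants.
counts = {"LP/P": 426, "VUS": 261, "LB/B": 4}
n_classified = sum(counts.values())  # 691
pct = [round(100 * n / n_classified, 1) for n in counts.values()]
print(pct)  # -> [61.6, 37.8, 0.6]

# HYPOTHETICAL comparison counts (c, d): not reported in the abstract.
a, b = counts["LP/P"], n_classified - counts["LP/P"]
c, d = 1000, 20000
print(odds_ratio_ci(a, b, c, d))

# HYPOTHETICAL model scores for six variants (1 = LP/P, 0 = not LP/P).
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
print(auroc(toy_labels, toy_scores), average_precision(toy_labels, toy_scores))
```

Because the comparison counts are placeholders, the printed odds ratio will not match the reported 28.3. Substituting the real counts for the other germline missense variants in the 216 genes would be needed to check the published estimate.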