AI-derived comparative assessment of the performance of pathogenicity prediction tools on missense variants of breast cancer genes

IF 3.8 | Zone 3 (Medicine) | Q2 Genetics & Heredity

Human Genomics Pub Date : 2024-09-11 DOI:10.1186/s40246-024-00667-9

Rahaf M. Ahmad, Bassam R. Ali, Fatma Al-Jasmi, Noura Al Dhaheri, Saeed Al Turki, Praseetha Kizhakkedath, Mohd Saberi Mohamad


Citations: 0

Abstract

Single nucleotide variants (SNVs) can have substantial and highly variable effects on cellular functions, which makes accurate prediction of their consequences challenging yet crucial, especially in clinical settings such as oncology. Laboratory-based experimental methods for assessing these effects are time-consuming and often impractical, highlighting the importance of in-silico tools for variant impact prediction. However, the performance of currently available tools on breast cancer missense variants from benchmarking databases has not been thoroughly investigated, leaving a knowledge gap in the accurate prediction of pathogenicity. In this study, the benchmarking datasets ClinVar and HGMD were used to evaluate 21 artificial intelligence (AI)-derived in-silico tools. Missense variants in breast cancer genes were extracted from ClinVar and HGMD Professional v2023.1. Because the HGMD dataset contains pathogenic variants only, benign variants for the same genes were added from the ClinVar database to ensure balance. Interestingly, our analysis of both datasets revealed variants across genes with low and moderate as well as high penetrance, reinforcing the value of disease-specific tools. The top-performing tools on the ClinVar dataset were MutPred (accuracy = 0.73), MetaRNN (accuracy = 0.72), ClinPred (accuracy = 0.71), MetaSVM, REVEL, and FATHMM-XF (accuracy = 0.70). On the HGMD dataset, they were ClinPred (accuracy = 0.72), MetaRNN (accuracy = 0.71), CADD (accuracy = 0.69), FATHMM-MKL (accuracy = 0.68), and FATHMM-XF (accuracy = 0.67). These findings offer clinicians and researchers valuable insights for selecting, improving, and developing effective in-silico tools for breast cancer pathogenicity prediction. Bridging this knowledge gap contributes to advancing precision medicine and enhancing diagnostic and therapeutic approaches for breast cancer patients, with potential implications for other conditions.
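The accuracy figures quoted in the abstract compare each tool's pathogenic/benign call against the database label. A minimal sketch of that kind of comparison is given below. It is not the authors' pipeline: the tool names (`tool_a`, `tool_b`), the scores, the 0.5 cutoff, and the gene names are all hypothetical placeholders, and the abstract does not state which thresholds were used or how unscored variants were handled.

```python
from dataclasses import dataclass, field


@dataclass
class Variant:
    gene: str                 # placeholder gene symbol, for illustration only
    label: int                # 1 = pathogenic, 0 = benign (database label)
    scores: dict = field(default_factory=dict)  # tool name -> hypothetical score


def accuracy(variants, tool, threshold):
    """Fraction of variants scored by `tool` whose call matches the label.

    A variant is called pathogenic when its score is >= threshold.
    Variants the tool did not score are skipped (an assumption of this sketch).
    """
    scored = [v for v in variants if tool in v.scores]
    if not scored:
        raise ValueError(f"no variants scored by {tool!r}")
    correct = sum(int(v.scores[tool] >= threshold) == v.label for v in scored)
    return correct / len(scored)


# Toy benchmark: two pathogenic and two benign missense variants.
variants = [
    Variant("BRCA1", 1, {"tool_a": 0.9, "tool_b": 0.4}),
    Variant("BRCA2", 1, {"tool_a": 0.7, "tool_b": 0.8}),
    Variant("CHEK2", 0, {"tool_a": 0.6, "tool_b": 0.1}),
    Variant("ATM", 0, {"tool_a": 0.2}),  # tool_b gives no score here
]

for tool in ("tool_a", "tool_b"):
    print(tool, round(accuracy(variants, tool, threshold=0.5), 2))
```

Skipping unscored variants means each tool is judged on a slightly different subset, so accuracies are only directly comparable when coverage is similar. A real comparison would also report sensitivity and specificity, since accuracy alone can mislead when the pathogenic and benign classes are unbalanced.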


Source journal: Human Genomics (Genetics & Heredity)

CiteScore: 6.00

Self-citation rate: 2.20%

Review time: 11 weeks

About the journal: Human Genomics is a peer-reviewed, open access, online journal that focuses on the application of genomic analysis in all aspects of human health and disease, as well as genomic analysis of drug efficacy and safety, and comparative genomics. Topics covered by the journal include, but are not limited to: pharmacogenomics, genome-wide association studies, genome-wide sequencing, exome sequencing, next-generation deep-sequencing, functional genomics, epigenomics, translational genomics, expression profiling, proteomics, bioinformatics, animal models, statistical genetics, genetic epidemiology, human population genetics and comparative genomics.