Mohannad Khandakji, Hind Hassan Ahmed Habish, Nawal Bakheet Salem Abdulla, Sitti Apsa Albani Kusasi, Nema Mahmoud Ghobashy Abdou, Hajer Mahmoud M A Al-Mulla, Reem Jawad A A Al Sulaiman, Salha M Bu Jassoum, Borbala Mifsud
Physiological Genomics (2023-08-01). DOI: 10.1152/physiolgenomics.00033.2023
BRCA1-specific machine learning model predicts variant pathogenicity with high accuracy.
Identification of novel BRCA1 variants outpaces their clinical annotation, which highlights the importance of developing accurate computational methods for risk assessment. Therefore, our aim was to develop a BRCA1-specific machine learning model to predict the pathogenicity of all types of BRCA1 variants, and to apply this model and our previous BRCA2-specific model to assess BRCA variants of uncertain significance (VUS) among Qatari patients with breast cancer. We developed an XGBoost model that uses variant information, such as position, frequency, and consequence, as well as prediction scores from numerous in silico tools. We trained and tested the model with BRCA1 variants that were reviewed and classified by the Evidence-Based Network for the Interpretation of Germline Mutant Alleles (ENIGMA) consortium. In addition, we tested the model's performance on an independent set of missense variants of uncertain significance with experimentally determined functional scores. The model performed excellently in predicting the pathogenicity of ENIGMA-classified variants (accuracy: 99.9%) and in predicting the functional consequence of the independent set of missense variants (accuracy: 93.4%). Moreover, it predicted 2,115 potentially pathogenic variants among the 31,058 unreviewed BRCA1 variants in the BRCA Exchange database. Using the two BRCA-specific models, we did not identify any pathogenic BRCA1 variants among those found in patients in Qatar, but we predicted four potentially pathogenic BRCA2 variants, which could be prioritized for functional validation.
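The abstract describes the method only at a high level: a boosted-tree classifier over per-variant features (position, frequency, consequence, in silico scores), trained on expert-classified variants and judged by accuracy. The sketch below is not the authors' code and does not use XGBoost itself. To stay dependency-free, it swaps in plain gradient boosting of one-split regression trees ("stumps") on the logistic loss, the family of methods XGBoost belongs to (XGBoost additionally uses second-order gradient steps, regularization, and deeper trees). The three features, the toy variants and labels, the number of rounds, and the learning rate are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Stump:
    """One-split regression tree: the weak learner added in each boosting round."""
    feature: int
    threshold: float
    left_value: float   # output when x[feature] <= threshold
    right_value: float  # output when x[feature] > threshold

    def predict(self, x):
        return self.left_value if x[self.feature] <= self.threshold else self.right_value


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Pick the (feature, threshold) split that best fits the residuals by least squares."""
    mean = sum(residuals) / len(residuals)
    best, best_err = Stump(0, math.inf, mean, mean), math.inf  # fallback: constant output
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            left = [r for x, r in zip(X, residuals) if x[f] <= thr]
            right = [r for x, r in zip(X, residuals) if x[f] > thr]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if err < best_err:
                best, best_err = Stump(f, thr, lv, rv), err
    return best


def fit_boosted(X, y, n_rounds=20, learning_rate=0.3):
    """Gradient boosting on the logistic loss; returns (base log-odds, stumps, learning rate)."""
    p = sum(y) / len(y)
    base = math.log(p / (1 - p))  # start from the log-odds of the pathogenic class
    scores, stumps = [base] * len(X), []
    for _ in range(n_rounds):
        # Negative gradient of the logistic loss with respect to each raw score.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump.predict(x) for s, x in zip(scores, X)]
    return base, stumps, learning_rate


def predict(model, x):
    """Return 1 (pathogenic) if the boosted probability is at least 0.5, else 0 (benign)."""
    base, stumps, learning_rate = model
    score = base + learning_rate * sum(st.predict(x) for st in stumps)
    return 1 if sigmoid(score) >= 0.5 else 0


def accuracy(model, X, y):
    return sum(predict(model, x) == yi for x, yi in zip(X, y)) / len(y)


# Hypothetical toy data, NOT from the paper. Features per variant:
# (in silico deleteriousness score, log10 allele frequency, 1.0 if truncating else 0.0)
# Labels: 1 = pathogenic, 0 = benign (stand-ins for expert-panel classifications).
X_train = [(0.95, -6.0, 1.0), (0.90, -5.5, 0.0), (0.85, -6.0, 1.0), (0.80, -5.0, 0.0),
           (0.20, -2.0, 0.0), (0.10, -1.5, 0.0), (0.30, -3.0, 0.0), (0.15, -2.5, 0.0)]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]
X_test = [(0.88, -5.8, 1.0), (0.25, -2.2, 0.0), (0.70, -5.2, 0.0), (0.22, -2.0, 0.0)]
y_test = [1, 0, 1, 1]  # the last variant is pathogenic but looks benign on every feature

model = fit_boosted(X_train, y_train)
print("train accuracy:", accuracy(model, X_train, y_train))  # prints: train accuracy: 1.0
print("test accuracy:", accuracy(model, X_test, y_test))     # prints: test accuracy: 0.75
```

The held-out toy set deliberately contains one pathogenic variant whose features all resemble benign ones, so the model misses it; accuracy on independent data, as in the abstract's 93.4% on functionally scored missense variants, is what exposes such cases. For scale, 2,115 predicted pathogenic out of 31,058 unreviewed BRCA1 variants is about 6.8%.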
Journal description:
Physiological Genomics publishes original papers, reviews, and rapid reports in a wide area of research focused on uncovering the links between genes and physiology at all levels of biological organization. Articles on topics ranging from single genes to the whole genome, and their links to the physiology of humans or of any model organism, organ, tissue, or cell, are welcome. Areas of interest include complex polygenic traits, preferably those of importance to human health, and gene-function relationships of disease processes. Specifically, the Journal has dedicated Sections focused on genome-wide association studies (GWAS) to function; cardiovascular, renal, metabolic, and neurological systems; exercise physiology; pharmacogenomics; clinical, translational, and genomics for precision medicine; comparative and statistical genomics; and databases. For further details on the research themes covered within these Sections, please refer to the descriptions given under each Section.