Gene-specific pathogenicity predictor for chromatin remodeling BAF complex-associated neurodevelopmental disorders.

IF 3.6 Q2 GENETICS & HEREDITY
HGG Advances Pub Date : 2026-04-09 Epub Date: 2026-02-28 DOI:10.1016/j.xhgg.2026.100583
Joshua Hack, Mohammad Nazim
Citation count: 0

Abstract

Advancements in whole-genome sequencing have increased the number of variants of uncertain significance (VUS) identified in human genomes. This has created a diagnostic bottleneck for genetic counselors, who must sift through these variants and determine which are most likely to be causative for a patient's clinical presentation. Machine learning (ML) tools can aid in identifying pathogenic variants among VUS, but there is a need for gene-specific algorithms that predict pathogenic variants with high accuracy. To address this need, we present a workflow for developing gene-specific, ensemble-learning ML tools that leverage outputs from other algorithms, locations of variants within the gene, and evolutionary conservation data to predict pathogenicity. Variants in SMARCA2 and SMARCA4 that are associated with rare neurodevelopmental diseases were used to screen 15 ML algorithms. A random forest learner was tuned to yield a final accuracy of 0.93 on holdout data. Generalizing this predictor to other BRG1/BRM-associated factor (BAF) complex proteins resulted in a sharp decline in performance. We therefore trained a final predictor on all genes in the study; it identifies pathogenic variants in these BAF subunits with an accuracy of 0.91 on holdout data. This predictor, specific to BAF complex proteins, achieves higher accuracy and area under the precision-recall curve (AUPRC) than any other predictor. The decline in performance when generalized to other proteins emphasizes the need for gene-specific calibration of predictors. Our workflow for the development of such models provides a quick, computationally inexpensive route for improving the ML tools available to genetic counselors.
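
The workflow described in the abstract (screen several learners, tune a random forest, then score it on holdout data by accuracy and AUPRC) might look roughly like the following scikit-learn sketch. This is not the authors' code: the data are synthetic, the feature names (`score_a`, `score_b`, `rel_position`, `conservation`) are hypothetical stand-ins for upstream predictor outputs, variant position, and conservation, only three candidate learners are screened instead of 15, and the hyperparameter grid is an arbitrary assumption. The random forest is tuned regardless of the screening result, simply to mirror the abstract. AUPRC is estimated here with average precision.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, average_precision_score
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

rng = np.random.default_rng(0)
n = 400

# Synthetic stand-in data; every column is hypothetical, not from the paper.
y = rng.integers(0, 2, size=n)                                # 1 = pathogenic, 0 = benign
score_a = np.clip(0.6 * y + rng.normal(0.2, 0.2, n), 0, 1)    # output of one upstream predictor
score_b = np.clip(0.5 * y + rng.normal(0.25, 0.25, n), 0, 1)  # output of another upstream predictor
rel_position = rng.uniform(0, 1, n)                           # variant position / protein length
conservation = 0.5 * y + rng.normal(0, 0.5, n)                # evolutionary conservation score
X = np.column_stack([score_a, score_b, rel_position, conservation])

# Set aside holdout variants before any model selection.
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Step 1: screen candidate learners by cross-validated accuracy on the training split.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
}
cv_accuracy = {
    name: cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

# Step 2: tune the random forest over a small, arbitrary grid.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 5]},
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Step 3: evaluate the tuned model once on the untouched holdout split.
best_model = search.best_estimator_
hold_pred = best_model.predict(X_hold)
hold_prob = best_model.predict_proba(X_hold)[:, 1]  # probability of the pathogenic class
holdout_accuracy = accuracy_score(y_hold, hold_pred)
holdout_auprc = average_precision_score(y_hold, hold_prob)

for name, acc in cv_accuracy.items():
    print(f"{name}: CV accuracy = {acc:.3f}")
print(f"tuned random forest: holdout accuracy = {holdout_accuracy:.3f}, "
      f"AUPRC (average precision) = {holdout_auprc:.3f}")
```

Keeping the holdout split out of both screening and tuning is what makes the final accuracy an honest estimate. To probe gene-specific calibration in the same way, one could train on variants from one set of genes and score variants from another, which is where the abstract reports the sharp decline in performance.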

Source journal
HGG Advances (Biochemistry, Genetics and Molecular Biology: Molecular Medicine)
CiteScore: 4.30
Self-citation rate: 4.50%
Articles published: 69
Review time: 14 weeks