Unveiling the pathogenicity of allosteric protein mutations via multifaceted feature ensembling

IF 4.3 | Zone 3, Biology | JCR Q1, Biochemical Research Methods
Huiling Zhang, Xijian Li, Junwen Huang, Yuetong Li, Shaozhen Cai, Haiyan Wang, Yanjie Wei
Methods, Volume 244, Pages 82-91. Journal Article. Published 2025-09-08.
DOI: 10.1016/j.ymeth.2025.07.014
URL: https://www.sciencedirect.com/science/article/pii/S1046202325001884

Abstract

Allosteric proteins play a central role in biological processes and systems. Uncovering the biological effects of mutations in allosteric proteins and their role in disease progression remains a significant challenge. In principle, computational approaches can enable large-scale interpretation of genetic variants in allosteric proteins. However, general-purpose variant effect prediction (VEP) methods overlook gene-specific characteristics. More critically, individual tools frequently display inconsistencies, biases, and variable quality. Consequently, predictions from existing VEP approaches are considered insufficiently reliable. In this study, we constructed a multifaceted-feature-based ensemble learning approach to predict the pathogenicity of missense mutations in allosteric proteins. The method uses categorical boosting to integrate four types of features: sequence information, AlphaFold2-extracted biochemical properties, prediction scores from other VEP methods, and allele frequency from gnomAD. It achieved an AUC of 0.912 on a benchmark allosteric protein dataset, outperforming 22 general VEP methods. To help identify pathogenic mutations among the many rare variants discovered as sequencing studies scale up, we provide pathogenicity probabilities for all possible amino acid substitutions in 202 allosteric-protein-encoding genes. In summary, our work indicates that multifaceted-feature-based ensemble learning models can offer valuable independent evidence for interpreting missense mutations in allosteric proteins, which will be broadly applicable in both research and clinical contexts.
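
The abstract reports performance as an AUC. As a minimal sketch of what that number measures, the function below computes AUC as the probability that a randomly chosen pathogenic variant receives a higher score than a randomly chosen benign one. The function name `roc_auc` and the toy labels and scores are illustrative assumptions; they are not the authors' code or data.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: 1 for pathogenic, 0 for benign (hypothetical encoding).
    scores: predicted pathogenicity probabilities, same order as labels.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one pathogenic and one benign variant")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # pathogenic variant ranked above benign one
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data, invented purely for illustration.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The pairwise loop is quadratic in the number of variants, which is fine for a small illustration; a rank-based formulation gives the same value more efficiently on large benchmarks.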
Source journal: Methods (Biology, Biochemical Research Methods)
CiteScore: 9.80; self-citation rate: 2.10%; articles published: 222; review time: 11.3 weeks
Journal description: Methods focuses on rapidly developing techniques in the experimental biological and medical sciences. Each topical issue, organized by a guest editor who is an expert in the area covered, consists solely of invited quality articles by specialist authors, many of them reviews. Issues are devoted to specific technical approaches with emphasis on clear detailed descriptions of protocols that allow them to be reproduced easily. The background information provided enables researchers to understand the principles underlying the methods; other helpful sections include comparisons of alternative methods giving the advantages and disadvantages of particular methods, guidance on avoiding potential pitfalls, and suggestions for troubleshooting.