Huiling Zhang, Xijian Li, Junwen Huang, Yuetong Li, Shaozhen Cai, Haiyan Wang, Yanjie Wei
Methods, volume 244, pages 82-91. Journal article, published 2025-09-08. Impact factor 4.3; JCR Q1 (Biochemical Research Methods).
DOI: 10.1016/j.ymeth.2025.07.014
https://www.sciencedirect.com/science/article/pii/S1046202325001884
Unveiling the pathogenicity of allosteric protein mutations via multifaceted feature ensembling
Allosteric proteins play a central role in biological processes and systems. Uncovering the biological effects of allosteric protein mutations and their role in disease progression remains a significant challenge. In principle, computational approaches can enable large-scale interpretation of genetic variants in allosteric proteins. However, general-purpose variant effect prediction (VEP) methods overlook the characteristic differences between genes. More critically, individual tools frequently show inconsistencies, biases, and variable quality. Consequently, predictions from existing VEP approaches are considered insufficiently reliable. In this study, we constructed a multifaceted-feature-based ensemble learning approach to predict the pathogenicity of missense mutations in allosteric proteins. The proposed method uses categorical boosting to integrate four types of features: sequence information, AlphaFold2-extracted biochemical properties, prediction scores from other VEP methods, and allele frequency from gnomAD. On a benchmark allosteric protein dataset, our method achieved an AUC of 0.912, outperforming 22 general VEP methods. To help identify pathogenic mutations among the many rare variants discovered as sequencing studies expand, we provide pathogenicity probabilities for all possible amino acid substitutions in 202 allosteric-protein-encoding genes. In summary, our research indicates that multifaceted-feature-based ensemble learning models can offer valuable independent evidence for interpreting missense mutations in allosteric proteins, and that this evidence will be broadly applicable in both research and clinical contexts.
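For readers unfamiliar with the AUC metric quoted in the abstract, the following is a minimal sketch of how an AUC can be computed from pathogenicity scores. It is not the authors' evaluation code: the function name `roc_auc` and the toy labels and scores are assumptions made up for illustration, and the pairwise formulation is one standard way to obtain the value, not necessarily the one used in the paper.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Return the AUC as the fraction of (pathogenic, benign) pairs in which
    the pathogenic variant receives the higher score; ties count as one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one pathogenic and one benign label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data invented for illustration only (1 = pathogenic, 0 = benign).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.3f}")  # 8 of 9 pairs ranked correctly -> 0.889
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of pathogenic from benign variants. The quadratic pairwise loop is chosen here for readability; a rank-based computation gives the same value more efficiently on large benchmarks.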
About the journal:
Methods focuses on rapidly developing techniques in the experimental biological and medical sciences.
Each topical issue, organized by a guest editor who is an expert in the area covered, consists solely of invited quality articles by specialist authors, many of them reviews. Issues are devoted to specific technical approaches with emphasis on clear detailed descriptions of protocols that allow them to be reproduced easily. The background information provided enables researchers to understand the principles underlying the methods; other helpful sections include comparisons of alternative methods giving the advantages and disadvantages of particular methods, guidance on avoiding potential pitfalls, and suggestions for troubleshooting.