Title: Alternative Conformation Prediction Using Deep Learning With Multi-MSA Strategy and Structural Clustering in CASP16
Authors: Qiqige Wuyun, Quancheng Liu, Wentao Ni, Chunxiang Peng, Ziying Zhang, Xiaogen Zhou, Gang Hu, Lydia Freddolino, Wei Zheng
Journal: Proteins: Structure, Function, and Bioinformatics (impact factor 2.8; JCR Q2, Biochemistry & Molecular Biology)
Publication type: Journal Article
Publication date: 2025-09-27
DOI: 10.1002/prot.70059 (https://doi.org/10.1002/prot.70059)
Citations: 0
Abstract
We report the results of the "MIEnsembles-Server" and "Zheng" groups for structure ensemble prediction in CASP16; both groups used the EnsembleFold pipeline. First, multiple sequence alignments (MSAs) were generated with DeepMSA2 for protein targets and rMSA for RNA targets. These MSAs were then processed by four newly developed deep learning methods to yield diverse structural decoys: D-I-TASSER2 for protein monomer structure prediction, DMFold2 for protein complex structure prediction, ExFold for RNA structure prediction, and DeepProtNA for protein-nucleic acid complex structure prediction. The decoys were clustered with the structural clustering tool MolClust into representative models corresponding to distinct conformational states. Protein monomer targets underwent additional refinement via replica-exchange Monte Carlo (REMC) simulations with D-I-TASSER2, and the refined decoys were re-clustered with MolClust to finalize the ensemble predictions. For the 19 ensemble targets in CASP16, the final EnsembleFold models achieved an average TM-score of 0.657, a 10.2% improvement over the baseline AlphaFold3 program. Notably, EnsembleFold performed particularly well on hybrid protein/nucleic-acid targets, which contributed to its efficacy in ensemble prediction tasks. Analysis of the resulting structural ensembles highlighted three insights: (i) models derived from distinct DeepMSA2-generated MSAs typically represent different conformational states of ensemble targets; (ii) REMC simulations significantly enhance model diversity, facilitating the identification of alternative conformations; and (iii) the structural clustering approach effectively identifies and selects accurate representative models for each conformational state. We also discuss potential improvements in quality assessment (QA) scoring methods that could further enhance the reliability and accuracy of ensemble predictions.
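The clustering step described in the abstract, in which decoys are grouped into representative models for distinct conformational states, can be illustrated with a minimal sketch. This is not MolClust's algorithm, which the abstract does not describe; it is a generic greedy neighbor-count clustering over a precomputed pairwise similarity matrix. The function name `greedy_cluster`, the `cutoff` value of 0.5, and the toy matrix `example_sim` are all assumptions made for illustration.

```python
from typing import List, Tuple


def greedy_cluster(sim: List[List[float]], cutoff: float = 0.5) -> List[Tuple[int, List[int]]]:
    """Greedy neighbor-count clustering on a symmetric similarity matrix.

    sim[i][j] is a pairwise similarity in [0, 1] between decoys i and j
    (for example a TM-score computed elsewhere; that is an assumption here).
    Returns a list of (representative_index, member_indices) pairs.
    """
    remaining = set(range(len(sim)))
    clusters: List[Tuple[int, List[int]]] = []
    while remaining:
        # Choose the unassigned decoy with the most unassigned neighbors at or
        # above the cutoff; iterating in sorted order breaks ties by lowest index.
        best = max(
            sorted(remaining),
            key=lambda i: sum(1 for j in remaining if sim[i][j] >= cutoff),
        )
        # The representative always belongs to its own cluster, so every
        # iteration removes at least one decoy and the loop terminates.
        members = sorted({best} | {j for j in remaining if sim[best][j] >= cutoff})
        clusters.append((best, members))
        remaining -= set(members)
    return clusters


# Hypothetical similarities for five decoys that fall into two states.
example_sim = [
    [1.00, 0.90, 0.80, 0.30, 0.20],
    [0.90, 1.00, 0.85, 0.25, 0.30],
    [0.80, 0.85, 1.00, 0.30, 0.35],
    [0.30, 0.25, 0.30, 1.00, 0.90],
    [0.20, 0.30, 0.35, 0.90, 1.00],
]

example_clusters = greedy_cluster(example_sim, cutoff=0.5)
print(example_clusters)
```

With this toy matrix, decoys 0 to 2 form one cluster and decoys 3 and 4 form another, each reported with one representative. A real pipeline would compute the similarities from the decoy coordinates and would probably choose the cutoff per target.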
About the journal:
PROTEINS: Structure, Function, and Bioinformatics publishes original reports of significant experimental and analytic research in all areas of protein research: structure, function, computation, genetics, and design. The journal encourages reports that present new experimental or computational approaches for interpreting and understanding data from biophysical chemistry, structural studies of proteins and macromolecular assemblies, alterations of protein structure and function engineered through techniques of molecular biology and genetics, functional analyses under physiologic conditions, as well as the interactions of proteins with receptors, nucleic acids, or other specific ligands or substrates. Research in protein and peptide biochemistry directed toward synthesizing or characterizing molecules that simulate aspects of the activity of proteins, or that act as inhibitors of protein function, is also within the scope of PROTEINS. In addition to full-length reports, short communications (usually not more than 4 printed pages) and prediction reports are welcome. Reviews are typically by invitation; authors are encouraged to submit proposed topics for consideration.