Precision in Predicting Protein–Nucleic Acid Complexes: Establishing a Benchmark Data Set and Comparative Metrics
Authors: Huizi Cui, Yuxuan Wang, Yu Fu, Xiangyu Yu, Wannan Li, Feng Lin*, and Weiwei Han*
Journal of Chemical Information and Modeling, vol. 65, no. 18, pp. 9654–9671. Published 2025-09-11. DOI: 10.1021/acs.jcim.5c01372 (https://pubs.acs.org/doi/10.1021/acs.jcim.5c01372)
Protein–nucleic acid interactions are fundamental to biological processes and biotechnology, yet their computational prediction lags behind protein structure or protein–protein interaction modeling. This study introduces ProNASet, a benchmark data set of 100 experimentally resolved protein–nucleic acid complex structures, alongside a multidimensional evaluation framework using root mean square deviation (RMSD), TM-score, and local distance difference test (LDDT) metrics. We systematically evaluated four deep learning (DL) algorithms (AlphaFold3, Chai-1, HelixFold3, and Protenix) and two physically driven docking methods (HDOCK and HDOCK_NT). Our analysis revealed that physically driven methods significantly outperform current DL approaches in predicting protein–nucleic acid complex structures. The template-less HDOCK_NT demonstrated the highest success rate at 74.5% (using thresholds RMSD <2 Å, TM-score >0.9, and LDDT >0.6), compared to 63.8% for template docking and only 34.0% for the best-performing DL method, AlphaFold3. These results underscore the substantial need for improvement in DL methods for this specific task. The ProNASet benchmark provides a standardized testing platform, highlights intrinsic shortcomings in current DL models for capturing protein–nucleic acid interaction features, and guides the development of next-generation computational tools crucial for advancing genome editing and synthetic biology.
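The success criterion quoted in the abstract (RMSD <2 Å, TM-score >0.9, and LDDT >0.6) can be illustrated with a short sketch. This is not the authors' code: it assumes that a prediction counts as a success only when all three thresholds are met at once, which the abstract implies but does not state. The `Scores` class, the function names, and the toy values are hypothetical and are not taken from ProNASet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    """Per-complex quality metrics for one predicted structure (hypothetical container)."""
    rmsd: float      # root mean square deviation, in Å (lower is better)
    tm_score: float  # TM-score, in (0, 1] (higher is better)
    lddt: float      # local distance difference test, in [0, 1] (higher is better)


def is_success(s: Scores, rmsd_max: float = 2.0,
               tm_min: float = 0.9, lddt_min: float = 0.6) -> bool:
    # Assumption: all three thresholds must hold simultaneously,
    # using the strict inequalities quoted in the abstract.
    return s.rmsd < rmsd_max and s.tm_score > tm_min and s.lddt > lddt_min


def success_rate(scores: list[Scores]) -> float:
    """Return the percentage of predictions that pass every threshold."""
    if not scores:
        raise ValueError("success_rate needs at least one prediction")
    passed = sum(is_success(s) for s in scores)
    return 100.0 * passed / len(scores)


# Toy values for illustration only; these are not results from the paper.
toy = [
    Scores(rmsd=1.2, tm_score=0.95, lddt=0.80),  # passes all three thresholds
    Scores(rmsd=3.5, tm_score=0.92, lddt=0.70),  # fails the RMSD threshold
    Scores(rmsd=1.8, tm_score=0.85, lddt=0.65),  # fails the TM-score threshold
    Scores(rmsd=0.9, tm_score=0.97, lddt=0.55),  # fails the LDDT threshold
]

print(f"Success rate: {success_rate(toy):.1f}%")
```

Combining a global superposition metric (RMSD), a topology-level metric (TM-score), and a local metric (LDDT) in a single pass/fail test means a model cannot score well by getting only the overall fold or only the local contacts right.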
About the journal:
The Journal of Chemical Information and Modeling publishes papers reporting new methodology and/or important applications in the fields of chemical informatics and molecular modeling. Specific topics include the representation and computer-based searching of chemical databases, molecular modeling, computer-aided molecular design of new materials, catalysts, or ligands, development of new computational methods or efficient algorithms for chemical software, and biopharmaceutical chemistry including analyses of biological activity and other issues related to drug discovery.
Astute chemists, computer scientists, and information specialists look to this monthly’s insightful research studies, programming innovations, and software reviews to keep current with advances in this integral, multidisciplinary field.
As a subscriber you’ll stay abreast of database search systems, use of graph theory in chemical problems, substructure search systems, pattern recognition and clustering, analysis of chemical and physical data, molecular modeling, graphics and natural language interfaces, bibliometric and citation analysis, and synthesis design and reactions databases.