Precision in Predicting Protein–Nucleic Acid Complexes: Establishing a Benchmark Data Set and Comparative Metrics

IF 5.3 | Zone 2 (Chemistry) | JCR Q1 (CHEMISTRY, MEDICINAL)

Journal of Chemical Information and Modeling | Pub Date: 2025-09-11 | DOI: 10.1021/acs.jcim.5c01372

Huizi Cui, Yuxuan Wang, Yu Fu, Xiangyu Yu, Wannan Li, Feng Lin*, and Weiwei Han*

Volume 65, Issue 18, pp. 9654–9671

Citations: 0

Abstract



Protein–nucleic acid interactions are fundamental to biological processes and biotechnology, yet their computational prediction lags behind protein structure prediction and protein–protein interaction modeling. This study introduces ProNASet, a benchmark data set of 100 experimentally resolved protein–nucleic acid complex structures, together with a multidimensional evaluation framework based on root mean square deviation (RMSD), TM-score, and local distance difference test (LDDT) metrics. We systematically evaluated four deep learning (DL) algorithms (AlphaFold3, Chai-1, HelixFold3, and Protenix) and two physics-driven docking methods (HDOCK and HDOCK_NT). Our analysis revealed that the physics-driven methods significantly outperform current DL approaches in predicting protein–nucleic acid complex structures. The template-free HDOCK_NT achieved the highest success rate, 74.5% (thresholds: RMSD <2 Å, TM-score >0.9, LDDT >0.6), compared with 63.8% for template-based docking (HDOCK) and only 34.0% for the best-performing DL method, AlphaFold3. These results underscore the substantial room for improvement in DL methods for this task. The ProNASet benchmark provides a standardized testing platform, highlights intrinsic shortcomings of current DL models in capturing protein–nucleic acid interaction features, and guides the development of next-generation computational tools crucial for advancing genome editing and synthetic biology.
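The abstract names three metrics and a set of success thresholds but gives no formulas. The sketch below shows one plausible way to compute them, under loudly stated assumptions that do not come from the paper: one representative atom per residue or nucleotide, model and reference already paired index-by-index and already superposed, a simplified global LDDT, and a success rule that requires all three thresholds to hold at once (our reading of the abstract, not confirmed by it). The function names `rmsd`, `tm_score`, `lddt`, and `is_success` are hypothetical; real evaluations would use established tools that also search for the optimal superposition.

```python
import math
from typing import Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def _check_paired(model: Sequence[Coord], ref: Sequence[Coord]) -> None:
    # Assumption: model[i] and ref[i] are the same residue/nucleotide.
    if not ref or len(model) != len(ref):
        raise ValueError("model and reference must be non-empty and index-paired")


def rmsd(model: Sequence[Coord], ref: Sequence[Coord]) -> float:
    """RMSD for the superposition already applied to `model` (no fitting here)."""
    _check_paired(model, ref)
    sq = sum(math.dist(m, r) ** 2 for m, r in zip(model, ref))
    return math.sqrt(sq / len(ref))


def tm_score(model: Sequence[Coord], ref: Sequence[Coord],
             d0: Optional[float] = None) -> float:
    """TM-score for a fixed superposition, normalised by the reference length.

    The default d0 uses the protein formula d0 = 1.24*(L-15)**(1/3) - 1.8 with a
    0.5 A floor for short chains (assumed TM-align-style convention). Nucleic-acid
    tools normalise differently, so pass d0 explicitly if you need another value.
    """
    _check_paired(model, ref)
    n = len(ref)
    if d0 is None:
        d0 = 1.24 * (n - 15) ** (1.0 / 3.0) - 1.8 if n > 21 else 0.5
        d0 = max(d0, 0.5)
    total = sum(1.0 / (1.0 + (math.dist(m, r) / d0) ** 2)
                for m, r in zip(model, ref))
    return total / n


def lddt(model: Sequence[Coord], ref: Sequence[Coord], radius: float = 15.0,
         thresholds: Tuple[float, ...] = (0.5, 1.0, 2.0, 4.0)) -> float:
    """Simplified global LDDT: superposition-free, no stereochemistry checks.

    Pairs (i, j) within `radius` in the reference are scored; a pair is
    "preserved" at threshold t if its model distance differs by less than t.
    """
    _check_paired(model, ref)
    n = len(ref)
    pairs = [(i, j) for i in range(n) for j in range(n)
             if i != j and math.dist(ref[i], ref[j]) < radius]
    if not pairs:
        return 0.0
    diffs = [abs(math.dist(model[i], model[j]) - math.dist(ref[i], ref[j]))
             for i, j in pairs]
    fractions = [sum(d < t for d in diffs) / len(diffs) for t in thresholds]
    return sum(fractions) / len(thresholds)


def is_success(model: Sequence[Coord], ref: Sequence[Coord],
               rmsd_max: float = 2.0, tm_min: float = 0.9,
               lddt_min: float = 0.6) -> bool:
    # Assumption: a prediction succeeds only if all three thresholds hold.
    return (rmsd(model, ref) < rmsd_max
            and tm_score(model, ref) > tm_min
            and lddt(model, ref) > lddt_min)


if __name__ == "__main__":
    # Toy 4-point "chain" spaced 3.8 A apart; the model moves the last point by 3 A.
    ref = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    model = ref[:3] + [(ref[3][0] + 3.0, 0.0, 0.0)]
    print(rmsd(model, ref))        # 1.5
    print(tm_score(model, ref))    # about 0.757 (d0 = 0.5 A for L = 4)
    print(lddt(model, ref))        # 0.625
    print(is_success(model, ref))  # False: TM-score is below 0.9
```

LDDT compares distances within each structure, so it needs no superposition; RMSD and TM-score in this sketch depend entirely on the superposition supplied by the caller. A rigidly translated but otherwise identical model would therefore score poorly here on RMSD and TM-score, whereas tools that optimise the superposition would report a perfect match.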


Source journal

Journal of Chemical Information and Modeling (Chemistry: Multidisciplinary)

CiteScore: 9.80
Self-citation rate: 10.70%
Articles published: 529
Review time: 1.4 months

Journal description: The Journal of Chemical Information and Modeling publishes papers reporting new methodology and/or important applications in the fields of chemical informatics and molecular modeling. Specific topics include the representation and computer-based searching of chemical databases, molecular modeling, computer-aided molecular design of new materials, catalysts, or ligands, development of new computational methods or efficient algorithms for chemical software, and biopharmaceutical chemistry including analyses of biological activity and other issues related to drug discovery. Astute chemists, computer scientists, and information specialists look to this monthly’s insightful research studies, programming innovations, and software reviews to keep current with advances in this integral, multidisciplinary field. As a subscriber you’ll stay abreast of database search systems, use of graph theory in chemical problems, substructure search systems, pattern recognition and clustering, analysis of chemical and physical data, molecular modeling, graphics and natural language interfaces, bibliometric and citation analysis, and synthesis design and reactions databases.