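CoBdock-2: enhancing blind docking performance through hybrid feature selection combining ensemble and multimodel feature selection approaches.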

IF 3.1 | CAS Zone 3 (Biology) | JCR Q3: BIOCHEMISTRY & MOLECULAR BIOLOGY
Sadettin Y Ugurlu
Journal of Computer-Aided Molecular Design, volume 39(1), article 48. Published 2025-07-13. Journal Article. DOI: 10.1007/s10822-025-00629-w (https://doi.org/10.1007/s10822-025-00629-w)
Citations: 0

Abstract

Identifying orthosteric binding sites and predicting small molecule affinities remain key challenges in virtual screening. While blind docking explores the entire protein surface, its precision is hindered by the vast search space. Cavity detection-guided docking improves accuracy by narrowing the focus to predicted pockets, but its effectiveness depends heavily on the quality of the cavity detection tools. To overcome these limitations, we developed Consensus Blind Dock (CoBDock), a machine learning-based blind docking method that integrates molecular docking and cavity detection results to enhance binding site and pose prediction. Building on this, CoBDock-2 replaces traditional docking tools by extracting 1D numerical representations from protein, ligand, and interaction structural features, and applying advanced ensemble feature selection techniques. By evaluating 21 feature selection methods across 9,598 features, CoBDock-2 identifies key molecular characteristics of orthosteric binding sites. CoBDock-2 demonstrates consistent improvements over the original CoBDock across benchmark datasets (PDBBind v2020-general, MTi, ADS, DUD-E, CASF-2016), achieving 77% binding site identification accuracy (within 8 Å), 55% ligand pose prediction accuracy (RMSD ≤ 2 Å), a 19% reduction in the mean distance to ground truth ligands within the binding site, and an 18.5% decrease in the mean pose RMSD. Statistical analysis across the combined benchmark set confirms the significance of these improvements (p < 0.05). Notably, the Weighted Hybrid Feature Selection variant in CoBDock-2 further increases binding site accuracy to 79.8%, demonstrating the benefit of combining multimodel and ensemble feature selection strategies. Variability in predictions also decreased significantly, highlighting enhanced reliability and generalizability. Finally, a low-bias hypothetical comparison with a state-of-the-art DiffDock + NMDN method was conducted to position CoBDock-2 relative to modern deep learning-based docking strategies.
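
The two headline metrics quoted in the abstract (binding site accuracy within 8 Å and pose accuracy at RMSD ≤ 2 Å) can be illustrated with a minimal Python sketch. This is not the authors' evaluation code. It assumes that the binding site distance is measured from a predicted site center to the centroid of the ground truth ligand, and that RMSD is computed over identically ordered atoms without alignment or symmetry correction. The function names and the dictionary keys (`site_center`, `pred_ligand`, `ref_ligand`) are hypothetical.

```python
import math

SITE_CUTOFF = 8.0  # Å, binding site threshold quoted in the abstract
POSE_CUTOFF = 2.0  # Å, pose RMSD threshold quoted in the abstract


def centroid(coords):
    """Mean position of a list of (x, y, z) points."""
    n = len(coords)
    return tuple(sum(p[i] for p in coords) / n for i in range(3))


def rmsd(pred, ref):
    """RMSD between two identically ordered coordinate lists.

    Assumption: no superposition and no symmetry correction are applied.
    """
    if not ref or len(pred) != len(ref):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(p, r) ** 2 for p, r in zip(pred, ref))
    return math.sqrt(squared / len(ref))


def success_rate(values, threshold):
    """Fraction of values at or below the threshold."""
    return sum(v <= threshold for v in values) / len(values)


def evaluate(complexes):
    """Summarise site and pose metrics over a list of complexes.

    Each complex is a dict with hypothetical keys:
    'site_center' (x, y, z), 'pred_ligand' and 'ref_ligand' (lists of points).
    """
    site_dist = [
        math.dist(c["site_center"], centroid(c["ref_ligand"])) for c in complexes
    ]
    pose_rmsd = [rmsd(c["pred_ligand"], c["ref_ligand"]) for c in complexes]
    return {
        "site_accuracy": success_rate(site_dist, SITE_CUTOFF),
        "pose_accuracy": success_rate(pose_rmsd, POSE_CUTOFF),
        "mean_site_distance": sum(site_dist) / len(site_dist),
        "mean_pose_rmsd": sum(pose_rmsd) / len(pose_rmsd),
    }
```

Under these assumptions, the relative improvements reported in the abstract (a 19% reduction in mean distance and an 18.5% decrease in mean pose RMSD) would correspond to comparing `mean_site_distance` and `mean_pose_rmsd` between two methods evaluated on the same complexes.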

Source journal
Journal of Computer-Aided Molecular Design (Biology / Computer Science, Interdisciplinary Applications)
CiteScore: 8.00
Self-citation rate: 8.60%
Articles published: 56
Review time: 3 months
Journal description: The Journal of Computer-Aided Molecular Design provides a forum for disseminating information on both the theory and the application of computer-based methods in the analysis and design of molecules. The scope of the journal encompasses papers which report new and original research and applications in the following areas: - theoretical chemistry; - computational chemistry; - computer and molecular graphics; - molecular modeling; - protein engineering; - drug design; - expert systems; - general structure-property relationships; - molecular dynamics; - chemical database development and usage.