CoBdock-2: enhancing blind docking performance through hybrid feature selection combining ensemble and multimodel feature selection approaches.
Sadettin Y Ugurlu. Journal of Computer-Aided Molecular Design 39(1): 48 (published 2025-07-13). DOI: 10.1007/s10822-025-00629-w
Identifying orthosteric binding sites and predicting small-molecule affinities remain key challenges in virtual screening. Blind docking explores the entire protein surface, but the vast search space limits its precision. Cavity-detection-guided docking improves accuracy by restricting the search to predicted pockets, but its effectiveness depends heavily on the quality of the cavity detection tools. To overcome these limitations, we developed Consensus Blind Dock (CoBDock), a machine-learning-based blind docking method that integrates molecular docking and cavity detection results to improve binding site and pose prediction. Building on this, CoBDock-2 replaces traditional docking tools with 1D numerical representations extracted from protein, ligand, and interaction structural features, combined with advanced ensemble feature selection techniques. By evaluating 21 feature selection methods across 9,598 features, CoBDock-2 identifies key molecular characteristics of orthosteric binding sites. CoBDock-2 shows consistent improvements over the original CoBDock across benchmark datasets (PDBBind v2020-general, MTi, ADS, DUD-E, CASF-2016), achieving 77% binding site identification accuracy (within 8 Å), 55% ligand pose prediction accuracy (RMSD ≤ 2 Å), a 19% reduction in the mean distance to the ground-truth ligands within the binding site, and an 18.5% decrease in the mean pose RMSD. Statistical analysis across the combined benchmark set confirms that these improvements are significant (p < 0.05). Notably, the Weighted Hybrid Feature Selection variant of CoBDock-2 further increases binding site accuracy to 79.8%, demonstrating the benefit of combining multimodel and ensemble feature selection strategies. Prediction variability also decreased significantly, indicating improved reliability and generalizability. In addition, a low-bias hypothetical comparison with the state-of-the-art DiffDock + NMDN method was conducted to position CoBDock-2 relative to modern deep-learning-based docking strategies.
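The two kinds of computation the abstract relies on, aggregating the feature rankings of several selectors and scoring predictions against distance cutoffs, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the CoBDock-2 implementation: all function names are hypothetical; `aggregate_rankings` uses a Borda-style weighted mean rank, and its optional `weights` are an illustrative stand-in that is not a reconstruction of the paper's Weighted Hybrid Feature Selection. The metric helpers assume that binding site success means the predicted site centre lies within 8 Å of the reference ligand centroid, and that pose RMSD is computed over atoms listed in matching order, without superposition or symmetry correction.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def aggregate_rankings(rankings: Sequence[Sequence[str]],
                       weights: Optional[Sequence[float]] = None,
                       top_k: int = 10) -> List[str]:
    """Hypothetical ensemble feature selection by weighted mean rank.

    Each ranking lists feature names from most to least important, as one
    feature selection method might produce. Not the CoBDock-2 procedure.
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    features = {f for ranking in rankings for f in ranking}
    score: Dict[str, float] = {f: 0.0 for f in features}
    for ranking, w in zip(rankings, weights):
        position = {f: i for i, f in enumerate(ranking)}
        for f in features:
            # A feature this selector did not rank is placed after its last one.
            score[f] += w * position.get(f, len(ranking))
    total = sum(weights)
    # Lower weighted mean rank = more consistently important; ties broken by name.
    ordered = sorted(features, key=lambda f: (score[f] / total, f))
    return ordered[:top_k]


def centroid(coords: Sequence[Vec3]) -> Vec3:
    """Geometric centre of a set of 3D coordinates (in Å)."""
    n = len(coords)
    x, y, z = (sum(c[i] for c in coords) / n for i in range(3))
    return (x, y, z)


def pose_rmsd(pred: Sequence[Vec3], ref: Sequence[Vec3]) -> float:
    """RMSD in Å between two poses whose atoms are listed in the same order.

    Assumption: no superposition and no symmetry correction.
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("poses must have the same, non-zero number of atoms")
    squared = sum(math.dist(p, r) ** 2 for p, r in zip(pred, ref))
    return math.sqrt(squared / len(pred))


def success_rate(values: Sequence[float], cutoff: float) -> float:
    """Fraction of predictions at or below a cutoff, e.g. 8 Å (site) or 2 Å (pose)."""
    return sum(v <= cutoff for v in values) / len(values)


def percent_reduction(baseline: float, new: float) -> float:
    """Relative decrease of a mean error (e.g. mean pose RMSD), in percent."""
    return (baseline - new) / baseline * 100.0


if __name__ == "__main__":
    rankings = [["a", "b", "c"], ["b", "a", "c"], ["b", "c", "a"]]
    print(aggregate_rankings(rankings, top_k=2))              # ['b', 'a']
    print(aggregate_rankings(rankings, [3, 1, 1], top_k=2))   # ['a', 'b']
    ligand = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    site_distance = math.dist((1.0, 0.0, 5.0), centroid(ligand))
    print(site_distance <= 8.0)                               # True
```

Replacing the mean-rank rule with another aggregation (for example, counting how often a feature appears in each selector's top k) changes only `aggregate_rankings`; the metric helpers do not depend on how the features were chosen.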
About the journal:
The Journal of Computer-Aided Molecular Design provides a forum for disseminating information on both the theory and the application of computer-based methods in the analysis and design of molecules. The scope of the journal encompasses papers that report new and original research and applications in the following areas:
- theoretical chemistry;
- computational chemistry;
- computer and molecular graphics;
- molecular modeling;
- protein engineering;
- drug design;
- expert systems;
- general structure-property relationships;
- molecular dynamics;
- chemical database development and usage.