Authors: E.B. Alencar Filho, R.P. Guimarães, V.C. Santos, A.B.P. Bispo, B.A.G. Paranhos, N.C. Aquino, R. Nascimento, R.F. Oliveira Neto
Journal: Archives of Insect Biochemistry and Physiology, volume 120, issue 1
Publication date: 2025-09-18
Publication type: Journal Article
DOI: 10.1002/arch.70095
Publisher page: https://onlinelibrary.wiley.com/doi/10.1002/arch.70095
Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12444618/pdf/
Impact factor: 1.9
JCR: Q4 (Biochemistry & Molecular Biology)
Category: Agricultural and Forestry Sciences (region 4)
Platform: Semanticscholar
Citations: 0
Converging XGboost Machine Learning and Molecular Docking Strategies to Identify Attractants for Ceratitis capitata: Molecular Characterization and Database Curation of Natural Ligands for In Vitro/In Vivo Tests
Abstract
The Mediterranean fruit fly Ceratitis capitata (Wiedemann) (Diptera: Tephritidae) is one of the most critical agricultural pests, causing economic damage globally due to its wide range of fruit hosts. Conventional insecticides have brought environmental, human health, and resistance challenges, driving interest in semiochemicals as sustainable pest management alternatives. Potential molecular attractants can be assessed experimentally through methods such as electroantennography (EAG) or behavioral assays. Odorant Binding Proteins (OBPs) have been recognized as crucial mediators in detecting these chemical signals. Although isolated compounds can provide mechanistic insights, volatile blends more accurately reflect natural conditions and typically elicit stronger behavioral responses. However, designing effective blends is challenging due to their complexity and regulatory limitations. Curated molecular databases of potential attractants are therefore essential to accelerate discovery and reduce costs in research programs, in both in vitro and in vivo tests. In silico molecular approaches, including Molecular Docking, Molecular Dynamics (MD) and Quantitative Structure–Activity Relationships (QSAR), offer cost-effective methods to prioritize candidates and/or understand ligand-OBP interactions. In this study, computational methodologies including Machine Learning (ML) based QSAR, molecular docking and MD simulations were integrated to highlight molecular features of standard molecules and identify potential attractants for C. capitata, which are expected to be good OBP binders. Initially, a Bee Colony Algorithm combined with a final XGBoost Machine Learning model enabled the identification of five essential molecular descriptors that explain the attractant effect of 20 standard compounds recognized in the literature. Applying this model to an online database of natural products from Brazil (NuBBE, the Nuclei of Bioassays, Ecophysiology and Biosynthesis of Natural Products Database), 206 molecules were identified from over 2000 candidates. In a parallel line of investigation, docking-based virtual screening was performed using the same NuBBE database. The most promising compounds were discussed in terms of binding energy, structure/geometry (with a focus on interactions) and estimated volatility, assessed through vapor pressure. MD simulations with the gold standard compound (E,E)-α-farnesene provided insights into ligand-protein interactions. Interestingly, 16 of the top 20 ranked compounds after docking were predicted as attractants by the XGBoost model. Finally, the curated database of 206 compounds, the main contribution of this paper (beyond the model), can be used to confidently select molecules for experimental tests of future blends or isolated compounds.
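The abstract compares two independent screens: a docking ranking and the XGBoost predictions. A minimal sketch of that kind of consensus check is given below. It is not the authors' code. The compound IDs, binding energies, and predicted labels are made-up toy values, and the helper names `top_n_by_docking` and `consensus` are hypothetical.

```python
from typing import Dict, List, Set


def top_n_by_docking(binding_energy: Dict[str, float], n: int) -> List[str]:
    """Return the n compound IDs with the most negative (most favorable) binding energy."""
    return sorted(binding_energy, key=lambda cid: binding_energy[cid])[:n]


def consensus(top_docked: List[str], predicted_attractants: Set[str]) -> List[str]:
    """Keep docking hits the classifier also labels as attractants, preserving docking rank."""
    return [cid for cid in top_docked if cid in predicted_attractants]


# Toy, made-up values for illustration only (NOT data from the study).
toy_binding_energy = {
    "cmpd_A": -7.9,
    "cmpd_B": -6.1,
    "cmpd_C": -8.4,
    "cmpd_D": -5.2,
    "cmpd_E": -7.0,
}
# Hypothetical set of compounds that a classifier flagged as attractants.
toy_predicted_attractants = {"cmpd_A", "cmpd_C", "cmpd_D"}

top3 = top_n_by_docking(toy_binding_energy, n=3)
shared = consensus(top3, toy_predicted_attractants)
print(f"{len(shared)} of the top {len(top3)} docked compounds are also predicted attractants: {shared}")
```

With real data, the dictionary would hold one docking score per NuBBE compound and the set would hold the compounds the trained model labels as attractants. Setting `n=20` would correspond to the top-20 comparison reported in the abstract.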
About the journal:
Archives of Insect Biochemistry and Physiology is an international journal that publishes articles in English that are of interest to insect biochemists and physiologists. Generally these articles will be in, or related to, one of the following subject areas: Behavior, Bioinformatics, Carbohydrates, Cell Line Development, Cell Signalling, Development, Drug Discovery, Endocrinology, Enzymes, Lipids, Molecular Biology, Neurobiology, Nucleic Acids, Nutrition, Peptides, Pharmacology, Pollinators, Proteins, Toxicology. Archives will publish only original articles. Articles that are confirmatory in nature or deal with analytical methods previously described will not be accepted.