Identifying 124 new anti-HIV drug candidates in a 37 billion-compound database: An integrated approach of machine learning (QSAR), molecular docking, and molecular dynamics simulation

IF 3.7; CAS Zone 2 (Chemistry); JCR Q2 (Automation & Control Systems)
Alexandre de Fátima Cobre, Anderson Ara, Alexessander Couto Alves, Moisés Maia Neto, Mariana Millan Fachi, Laize Sílvia dos Anjos Botas Beca, Fernanda Stumpf Tonin, Roberto Pontarolo
Chemometrics and Intelligent Laboratory Systems, Vol. 250, Article 105145. Published 2024-05-15.
DOI: 10.1016/j.chemolab.2024.105145
URL: https://www.sciencedirect.com/science/article/pii/S0169743924000856
Citations: 0

Abstract

Recent data from the World Health Organization show that 38.8 million people were living with HIV in 2023. Within this population, there were 1.5 million new cases and 650,000 deaths attributed to the disease. This study combines QSAR-based machine learning models, molecular docking, and molecular dynamics simulations to identify compounds that may inhibit the activity of the C-C chemokine receptor type 5 (CCR5) protein, a key entry point for HIV into host cells. Using non-redundant experimental data from the ChEMBL database, 40 different machine learning algorithms were trained, and the four best-performing models (XGBoost, Histogram-based Gradient Boosting, Light Gradient Boosting Machine, and Extra Trees Regressor) were used to predict anti-HIV bioactivity for 37 billion compounds in the ZINC-22 database. The screening identified 124 new anti-HIV drug candidates, which were confirmed in silico through molecular docking and molecular dynamics simulations. The study underscores the therapeutic potential of these compounds and paves the way for further in vitro and in vivo investigations. The convergence of machine learning and experimental findings offers a promising avenue for advances in pharmaceutical research, particularly in the treatment of viral diseases such as HIV.

To ensure the reproducibility of our study, we have made the Python code (Google Colab) and the associated database available on GitHub at https://github.com/AlexandreCOBRE/code.
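The abstract does not say how the predictions of the four selected models were combined for the ZINC-22 screen. The sketch below assumes one common option: rank the trained models by a cross-validated score (higher is better, e.g. R²), average the predicted activities of the top four (a consensus prediction), and keep compounds whose consensus score clears an activity cutoff. The function names, the model labels `m1` to `m5`, the compound IDs, the toy numbers and the cutoff of 7.0 on a pIC50-like scale are all hypothetical illustrations, not the authors' code or results; the linked GitHub repository holds their actual implementation.

```python
from __future__ import annotations

from statistics import mean


def select_top_models(cv_scores: dict[str, float], k: int = 4) -> list[str]:
    """Return the k model names with the highest cross-validated score.

    Assumes higher is better (e.g. R^2); for an error metric such as RMSE,
    sort in ascending order instead.
    """
    return sorted(cv_scores, key=cv_scores.get, reverse=True)[:k]


def consensus_screen(
    predictions: dict[str, dict[str, float]],
    models: list[str],
    threshold: float,
) -> list[tuple[str, float]]:
    """Average the chosen models' predictions per compound (hypothetical
    consensus rule), keep compounds at or above the threshold, and rank
    them from most to least active."""
    compound_ids = predictions[models[0]].keys()
    scored = [(cid, mean(predictions[m][cid] for m in models)) for cid in compound_ids]
    hits = [(cid, score) for cid, score in scored if score >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Toy inputs, invented for illustration only (not results from the paper).
cv_scores = {"m1": 0.71, "m2": 0.69, "m3": 0.68, "m4": 0.66, "m5": 0.52}
predictions = {
    "m1": {"cpd1": 7.4, "cpd2": 5.1, "cpd3": 8.0},
    "m2": {"cpd1": 7.0, "cpd2": 5.5, "cpd3": 7.8},
    "m3": {"cpd1": 7.2, "cpd2": 4.9, "cpd3": 8.2},
    "m4": {"cpd1": 6.8, "cpd2": 5.3, "cpd3": 7.6},
    "m5": {"cpd1": 4.0, "cpd2": 9.0, "cpd3": 4.0},  # weakest model, excluded
}

top = select_top_models(cv_scores)
print(top)  # ['m1', 'm2', 'm3', 'm4']
hits = consensus_screen(predictions, top, threshold=7.0)
print(hits)
```

Here `cpd2` is not a hit even though the excluded model `m5` scores it highly, which is the point of restricting the consensus to the best-validated models. Averaging is simply the plainest consensus rule; a real screen of billions of compounds would also need batched or streamed prediction rather than in-memory dictionaries.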

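Source journal: Chemometrics and Intelligent Laboratory Systems
CiteScore: 7.50
Self-citation rate: 7.70%
Articles published: 169
Review time: 3.4 months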
About the journal: Chemometrics and Intelligent Laboratory Systems publishes original research papers, short communications, reviews, tutorials, and Original Software Publications reporting on the development of novel statistical, mathematical, or computer techniques in chemistry and related disciplines. Chemometrics is the chemical discipline that uses mathematical and statistical methods to design or select optimal procedures and experiments, and to extract maximum chemical information from chemical data. The journal covers the following topics:
1) Development of new statistical, mathematical, and chemometric methods for chemistry and related fields (environmental chemistry, biochemistry, toxicology, systems biology, -omics, etc.).
2) Novel applications of chemometrics to all branches of chemistry and related fields (typical domains of interest include process data analysis, experimental design, data mining, signal processing, supervised modelling, decision making, robust statistics, mixture analysis, and multivariate calibration). Routine applications of established chemometric techniques will not be considered.
3) Development of new software that provides novel tools or truly advances the use of chemometric methods.
4) Well-characterized data sets for testing the performance of new methods and software.
The journal complies with the International Committee of Medical Journal Editors' Uniform Requirements for Manuscripts.