Identifying 124 new anti-HIV drug candidates in a 37 billion-compound database: An integrated approach of machine learning (QSAR), molecular docking, and molecular dynamics simulation
Alexandre de Fátima Cobre, Anderson Ara, Alexessander Couto Alves, Moisés Maia Neto, Mariana Millan Fachi, Laize Sílvia dos Anjos Botas Beca, Fernanda Stumpf Tonin, Roberto Pontarolo
Chemometrics and Intelligent Laboratory Systems, vol. 250, Article 105145 (published 15 May 2024). DOI: 10.1016/j.chemolab.2024.105145. https://www.sciencedirect.com/science/article/pii/S0169743924000856
Recent data from the World Health Organization show that in 2023, 38.8 million people were living with HIV. In the same year, there were 1.5 million new infections and 650,000 deaths attributed to the disease. This study uses an integrated approach combining QSAR-based machine learning models, molecular docking, and molecular dynamics simulations to identify compounds that could inhibit the CC chemokine receptor type 5 (CCR5), a key entry point for HIV into host cells. Using non-redundant experimental data from the ChEMBL database, 40 different machine learning algorithms were trained, and the four best-performing models (XGBoost, Histogram-based Gradient Boosting, Light Gradient Boosting Machine, and Extra Trees Regression) were used to predict anti-HIV bioactivity for the 37 billion compounds in the ZINC-22 database. The screening identified 124 new anti-HIV drug candidates, which were confirmed in silico through molecular docking and molecular dynamics simulations. These results highlight the therapeutic potential of the compounds and motivate further in vitro and in vivo investigation. The convergence of machine learning and experimental findings offers a promising avenue for advances in pharmaceutical research, particularly in the treatment of viral diseases such as HIV. To support reproducibility, the Python code (Google Colab notebooks) and the associated database are available on GitHub: https://github.com/AlexandreCOBRE/code.
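The abstract does not state how the four models' predictions were combined or how candidates were filtered before docking. As a rough illustration only, the Python sketch below shows one way to run a consensus screen over a compound stream far too large to hold in memory. Each compound is scored by several models, kept only if every model predicts activity above a cut-off, and ranked by the mean prediction in a fixed-size heap. The pIC50 scale, the 7.0 threshold, the descriptor vectors, the function `consensus_screen`, and the toy models are all hypothetical stand-ins, not the authors' code (which is in the GitHub repository above).

```python
"""Hypothetical consensus-screening sketch (not the authors' code).

Assumptions: each model maps a descriptor vector to a predicted pIC50,
a compound must be predicted active by every model, and the consensus
score is the plain mean of the model predictions.
"""
import heapq
from statistics import mean
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]
Compound = Tuple[str, Sequence[float]]  # (compound ID, descriptor vector)


def consensus_screen(
    compounds: Iterable[Compound],
    models: List[Model],
    top_k: int,
    threshold: float = 7.0,  # hypothetical cut-off: pIC50 >= 7, i.e. IC50 <= 100 nM
) -> List[Tuple[float, str]]:
    """Stream compounds once and return the top_k (score, ID) pairs, best first."""
    if top_k <= 0:
        return []
    heap: List[Tuple[float, str]] = []  # min-heap: heap[0] is the weakest kept hit
    for compound_id, features in compounds:
        preds = [model(features) for model in models]
        if min(preds) < threshold:  # reject unless all models agree it is active
            continue
        score = mean(preds)
        if len(heap) < top_k:
            heapq.heappush(heap, (score, compound_id))
        elif score > heap[0][0]:  # strictly better than the weakest kept hit
            heapq.heapreplace(heap, (score, compound_id))
    return sorted(heap, reverse=True)


def toy_stream(n: int) -> Iterator[Compound]:
    """Stand-in for a lazily read library shard; IDs and features are made up."""
    for i in range(n):
        yield f"CMPD{i:04d}", (float(i % 10), float(i % 7))


# Four made-up linear "models" standing in for the trained regressors.
toy_models: List[Model] = [
    lambda x: 5.0 + 0.30 * x[0],
    lambda x: 5.2 + 0.25 * x[0] + 0.05 * x[1],
    lambda x: 4.9 + 0.32 * x[0],
    lambda x: 5.1 + 0.28 * x[0],
]

hits = consensus_screen(toy_stream(1000), toy_models, top_k=5)
print([cid for _, cid in hits])
# → ['CMPD0349', 'CMPD0279', 'CMPD0209', 'CMPD0139', 'CMPD0069']
```

Keeping only the best `top_k` hits in a min-heap keeps memory constant regardless of library size, and ties are resolved in favour of the compound seen first (which is why the later duplicates of the best toy score are dropped). A real run over ZINC-22 would more likely batch descriptors into arrays, call each trained model's vectorized prediction method, and distribute library shards across workers.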
About the journal:
Chemometrics and Intelligent Laboratory Systems publishes original research papers, short communications, reviews, tutorials and Original Software Publications reporting on development of novel statistical, mathematical, or computer techniques in Chemistry and related disciplines.
Chemometrics is the chemical discipline that uses mathematical and statistical methods to design or select optimal procedures and experiments, and to provide maximum chemical information by analysing chemical data.
The journal deals with the following topics:
1) Development of new statistical, mathematical and chemometrical methods for Chemistry and related fields (Environmental Chemistry, Biochemistry, Toxicology, Systems Biology, -Omics, etc.)
2) Novel applications of chemometrics to all branches of Chemistry and related fields (typical domains of interest include process data analysis, experimental design, data mining, signal processing, supervised modelling, decision making, robust statistics, mixture analysis, multivariate calibration, etc.). Routine applications of established chemometrical techniques will not be considered.
3) Development of new software that provides novel tools or truly advances the use of chemometrical methods.
4) Well-characterized data sets for testing the performance of new methods and software.
The journal complies with the International Committee of Medical Journal Editors' Uniform Requirements for Manuscripts.