Accelerating Molecular Docking using Machine Learning Methods.

IF 3.1 | Zone 4 (Medicine) | Q3 CHEMISTRY, MEDICINAL

Molecular Informatics, Pub Date: 2024-06-01, Epub Date: 2024-06-08, DOI: 10.1002/minf.202300167

Abdulsalam Y Bande, Sefer Baday


Citations: 0

Abstract

Virtual screening (VS) is a well-established approach in drug discovery that speeds up the search for bioactive molecules and reduces the cost and effort associated with experiments. VS narrows the search space within chemical space and allows fewer, more probable candidate compounds to be selected for experimental testing. Docking calculations are among the commonly used and highly valued structure-based drug discovery methods. Databases of small-molecule chemical structures have been growing rapidly; however, virtual screening of large libraries via docking is currently not very common. In this work, we aim to accelerate docking studies by predicting docking scores without explicitly performing docking calculations. We experimented with an attention-based long short-term memory (LSTM) neural network for efficient prediction of docking scores, as well as with other machine learning models such as XGBoost. Using the docking scores of a small number of ligands, we trained our models and predicted the docking scores of a few million molecules. Specifically, we tested our approaches on 11 datasets produced from in-house drug discovery studies. On average, by training models on only 7000 molecules, we predicted the docking scores of approximately 3.8 million molecules with an R² (coefficient of determination) of 0.77 and a Spearman rank correlation coefficient of 0.85. We designed the system with ease of use in mind: all the user needs to provide is a CSV file containing SMILES strings and their respective docking scores, and the system outputs a model that the user can apply to predict the docking score of a new molecule.
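The workflow described in the abstract (a CSV of SMILES and docking scores goes in, a trained score predictor comes out, and predictions are judged by R² and Spearman correlation) can be sketched roughly as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' system: it swaps the attention-based LSTM and XGBoost for a simple k-nearest-neighbours regressor over character-bigram counts of SMILES strings, compared with min/max (weighted Jaccard) similarity, a count-vector generalization of the Tanimoto coefficient. The column names `smiles` and `docking_score`, every function name, and the toy scores are hypothetical and made up for illustration only.

```python
import csv
import io
import math
from collections import Counter

# Hypothetical toy data: made-up scores for illustration, not results from the paper.
TOY_CSV = """smiles,docking_score
CCO,-4.1
CCCO,-4.5
CCCCO,-4.9
c1ccccc1,-5.8
c1ccccc1O,-6.2
c1ccccc1N,-6.1
CC(=O)O,-4.3
"""


def bigrams(smiles: str) -> Counter:
    """Hypothetical featurizer: counts of overlapping character bigrams."""
    padded = f"^{smiles}$"  # start/end markers so short strings still get features
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def tanimoto(a: Counter, b: Counter) -> float:
    """Min/max (weighted Jaccard) similarity between two count vectors."""
    keys = a.keys() | b.keys()
    num = sum(min(a[k], b[k]) for k in keys)
    den = sum(max(a[k], b[k]) for k in keys)
    return num / den if den else 0.0


def load_training_csv(handle) -> list[tuple[str, float]]:
    """Read rows from a CSV with assumed columns 'smiles' and 'docking_score'."""
    return [(row["smiles"], float(row["docking_score"])) for row in csv.DictReader(handle)]


def train(rows, k=3):
    """'Training' a kNN surrogate just stores featurized examples; returns a predictor."""
    data = [(bigrams(s), y) for s, y in rows]

    def predict(smiles: str) -> float:
        query = bigrams(smiles)
        top = sorted(((tanimoto(query, f), y) for f, y in data), reverse=True)[:k]
        total = sum(sim for sim, _ in top)
        if total == 0:  # no shared bigrams: fall back to the plain mean of neighbours
            return sum(y for _, y in top) / len(top)
        return sum(sim * y for sim, y in top) / total  # similarity-weighted mean

    return predict


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (can be negative)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            result[idx] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rank correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    model = train(load_training_csv(io.StringIO(TOY_CSV)), k=2)
    # Hypothetical held-out molecules with made-up "true" docking scores.
    held_out = [("CCCCCO", -5.2), ("c1ccccc1C", -6.0), ("CC(=O)OC", -4.6), ("CCN", -4.0)]
    y_true = [y for _, y in held_out]
    y_pred = [model(s) for s, _ in held_out]
    print(f"R2 = {r2_score(y_true, y_pred):.2f}, Spearman = {spearman(y_true, y_pred):.2f}")
```

The design keeps the same interface the abstract describes, a CSV in and a callable predictor out, so a stronger learner such as a gradient-boosted model on proper molecular fingerprints could replace `train` without changing the evaluation code. Reporting Spearman alongside R² matters for screening, since ranking compounds correctly is what decides which molecules are selected for actual docking or testing.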

Source journal

Molecular Informatics (CHEMISTRY, MEDICINAL; MATHEMATICAL & COMPUTATIONAL BIOLOGY)

CiteScore: 7.30

Self-citation rate: 2.80%

Review time: 3 months

Journal description: Molecular Informatics is a peer-reviewed international forum for the publication of high-quality, interdisciplinary research on all molecular aspects of bio/cheminformatics and computer-assisted molecular design. Molecular Informatics succeeded QSAR & Combinatorial Science in 2010. It presents methodological innovations that lead to a deeper understanding of ligand-receptor interactions, macromolecular complexes, and molecular networks, as well as design concepts and processes that demonstrate how ideas lead to molecules with a desired structure or function, preferably including experimental validation. The journal's scope includes, but is not limited to, drug discovery and chemical biology, protein and nucleic acid engineering and design, the design of nanomolecular structures, strategies for modeling macromolecular assemblies, molecular networks and systems, pharmaco- and chemogenomics, computer-assisted screening strategies, and novel technologies for the de novo design of biologically active molecules. As a unique feature, Molecular Informatics publishes "Methods Corner" review-type articles that cover important technological concepts and advances within the journal's scope.