Developing a SARS-CoV-2 main protease binding prediction random forest model for drug repurposing for COVID-19 treatment.

IF 2.8 | Zone 4, Medicine | JCR Q2, MEDICINE, RESEARCH & EXPERIMENTAL

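Experimental Biology and Medicine, pages 1927-1936. Pub Date: 2023-11-01; Epub Date: 2023-11-24; DOI: 10.1177/15353702231209413
Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10798185/pdf/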

Jie Liu, Liang Xu, Wenjing Guo, Zoe Li, Md Kamrul Hasan Khan, Weigong Ge, Tucker A Patterson, Huixiao Hong



Abstract

The coronavirus disease 2019 (COVID-19) global pandemic resulted in millions of people becoming infected with the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus and close to seven million deaths worldwide. It is essential to further explore and design effective COVID-19 treatments that target the SARS-CoV-2 main protease, a major target for COVID-19 drugs. In this study, machine learning was applied to predict the SARS-CoV-2 main protease binding of Food and Drug Administration (FDA)-approved drugs, to assist in identifying potential repurposing candidates for COVID-19 treatment. Ligands bound to the SARS-CoV-2 main protease in the Protein Data Bank and compounds experimentally tested in SARS-CoV-2 main protease binding assays in the literature were curated. These chemicals were divided into training (516 chemicals) and testing (360 chemicals) data sets. To identify SARS-CoV-2 main protease binders as potential candidates for repurposing to treat COVID-19, 1188 FDA-approved drugs were obtained from the Liver Toxicity Knowledge Base. A random forest algorithm was used to construct predictive models based on molecular descriptors calculated with the Mold2 software. Model performance was evaluated using 100 iterations of fivefold cross-validation, which yielded a balanced accuracy of 78.8%. The random forest model constructed from the whole training data set was then used to predict SARS-CoV-2 main protease binding for the testing set and the FDA-approved drugs. Applying the model's applicability domain and prediction confidence to the drugs predicted as main protease binders identified 10 FDA-approved drugs as potential candidates for repurposing to treat COVID-19. Our results demonstrate that machine learning is an efficient method for drug repurposing and, thus, may accelerate drug development targeting SARS-CoV-2.
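The abstract names the modelling pipeline without showing it. The following is a minimal pure-Python sketch, under stated assumptions, of two of its pieces: a random forest classifier (bootstrap samples, a random feature subset at each split, Gini impurity) and the evaluation protocol of repeated stratified fivefold cross-validation scored by balanced accuracy, i.e. (sensitivity + specificity) / 2. It is not the authors' implementation. The tree depth, number of trees, sqrt(p) feature subsampling, the 0.5 decision threshold, the confidence formula |P - 0.5| / 0.5, and all function names (`grow_tree`, `fit_forest`, `repeated_cv`, `confidence`, ...) are illustrative assumptions, and the toy matrix merely stands in for real Mold2 descriptors. The applicability-domain check is not shown.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def grow_tree(X, y, rng, n_feat, depth=0, max_depth=6):
    """Grow one CART-style tree. Leaves hold the fraction of binders (label 1)."""
    if len(set(y)) == 1 or depth == max_depth:
        return sum(y) / len(y)
    best = None  # (weighted Gini, feature index, threshold)
    for f in rng.sample(range(len(X[0])), n_feat):  # random feature subset at this split
        for t in sorted({row[f] for row in X})[:-1]:  # thresholds leaving both sides non-empty
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # sampled features are constant here: stop with a leaf
        return sum(y) / len(y)
    _, f, t = best
    go_left = [row[f] <= t for row in X]
    def child(side):
        rows = [i for i, g in enumerate(go_left) if g == side]
        return grow_tree([X[i] for i in rows], [y[i] for i in rows], rng, n_feat, depth + 1, max_depth)
    return (f, t, child(True), child(False))

def tree_prob(node, x):
    """Walk one tree to a leaf and return its binder fraction."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def fit_forest(X, y, n_trees=50, seed=0):
    """Bagged trees, each grown on a bootstrap sample with sqrt(p) features per split."""
    rng = random.Random(seed)
    n_feat = max(1, int(len(X[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample rows with replacement
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], rng, n_feat))
    return forest

def predict_proba(forest, x):
    """Average of per-tree binder fractions: the forest's P(binder)."""
    return sum(tree_prob(tree, x) for tree in forest) / len(forest)

def confidence(p):
    """Assumed confidence score: 0 at P = 0.5, 1 at P = 0 or P = 1."""
    return abs(p - 0.5) / 0.5

def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity (both classes must be present)."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)

def repeated_cv(X, y, k=5, repeats=100, n_trees=50, seed=0):
    """Mean balanced accuracy over `repeats` runs of stratified k-fold cross-validation."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        folds = [[] for _ in range(k)]
        for label in (0, 1):  # stratify: shuffle each class, then deal it round-robin into folds
            idx = [i for i, lab in enumerate(y) if lab == label]
            rng.shuffle(idx)
            for j, i in enumerate(idx):
                folds[j % k].append(i)
        preds = [0] * len(y)
        for fold in folds:
            held_out = set(fold)
            train = [i for i in range(len(y)) if i not in held_out]
            forest = fit_forest([X[i] for i in train], [y[i] for i in train],
                                n_trees=n_trees, seed=rng.randrange(2**31))
            for i in fold:
                preds[i] = 1 if predict_proba(forest, X[i]) >= 0.5 else 0
        scores.append(balanced_accuracy(y, preds))  # scored on pooled out-of-fold predictions
    return sum(scores) / len(scores)

if __name__ == "__main__":
    # Hypothetical toy data in place of Mold2 descriptors: 2 informative + 2 noise columns.
    gen = random.Random(42)
    X = [[gen.random() for _ in range(4)] for _ in range(30)]
    y = [1 if row[0] + row[1] > 1.0 else 0 for row in X]
    print("mean balanced accuracy:", repeated_cv(X, y, repeats=2, n_trees=10))
    forest = fit_forest(X, y, n_trees=25)
    p = predict_proba(forest, [0.9, 0.9, 0.5, 0.5])
    print("P(binder) =", round(p, 2), "confidence =", round(confidence(p), 2))
```

Running the script prints a mean balanced accuracy on the synthetic data and one P(binder) with its confidence; the numbers say nothing about the study's results. The study's protocol corresponds to `repeated_cv(X, y, k=5, repeats=100)`, which is slow in pure Python, so a maintained library implementation such as scikit-learn's `RandomForestClassifier` would be the usual choice in practice.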


Source journal

Experimental Biology and Medicine (Medicine: Research & Experimental)

CiteScore: 6.00
Self-citation rate: 0.00%
Articles published: 157
Review time: 1 month

Journal description: Experimental Biology and Medicine (EBM) is a global, peer-reviewed journal dedicated to the publication of multidisciplinary and interdisciplinary research in the biomedical sciences. EBM publishes research and review articles as well as meeting symposia and brief communications. Articles in EBM represent cutting-edge research at the overlapping junctions of the biological, physical and engineering sciences that impact the health and welfare of the world's population. Topics covered in EBM include: Anatomy/Pathology; Biochemistry and Molecular Biology; Bioimaging; Biomedical Engineering; Bionanoscience; Cell and Developmental Biology; Endocrinology and Nutrition; Environmental Health/Biomarkers/Precision Medicine; Genomics, Proteomics, and Bioinformatics; Immunology/Microbiology/Virology; Mechanisms of Aging; Neuroscience; Pharmacology and Toxicology; Physiology; Stem Cell Biology; Structural Biology; Systems Biology and Microphysiological Systems; and Translational Research.