An ensemble machine learning model generates a focused screening library for the identification of CDK8 inhibitors.

IF 4.5 · Zone 3 (Biology) · Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY

Protein Science Pub Date : 2024-06-01 DOI:10.1002/pro.5007

Tony Eight Lin, Dyan Yen, Wei-Chun HuangFu, Yi-Wen Wu, Jui-Yi Hsu, Shih-Chung Yen, Tzu-Ying Sung, Jui-Hua Hsieh, Shiow-Lin Pan, Chia-Ron Yang, Wei-Jan Huang, Kai-Cheng Hsu

Protein Science, volume 33, issue 6, e5007. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11081523/pdf/

Citations: 0

Abstract

The identification of an effective inhibitor is an important first step in drug development. Unfortunately, many issues, such as the characterization of protein binding sites, the screening library, and materials for assays, make drug screening difficult. As screening libraries grow, more resources are consumed inefficiently. Thus, new strategies are needed to preprocess a screening library and focus it on a target protein. Herein, we report an ensemble machine learning (ML) model to generate a CDK8-focused screening library. The ensemble model consists of six different algorithms optimized for CDK8 inhibitor classification. The models were trained using a CDK8-specific fragment library along with molecules with CDK8 activity. The optimized ensemble model processed a commercial library containing 1.6 million molecules. This resulted in a CDK8-focused screening library containing 1,672 molecules, a reduction of more than 99.90%. The CDK8-focused library was then subjected to molecular docking, and 25 candidate compounds were selected. Enzymatic assays confirmed six CDK8 inhibitors, with one compound producing an IC₅₀ value of ≤100 nM. Analysis of the ensemble ML model reveals the role of the CDK8 fragment library during training. Structural analysis reveals the hit compounds to be structurally novel CDK8 inhibitors. Together, the results highlight a pipeline for curating a focused library for a specific protein target, such as CDK8.
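The screening funnel described in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the stage sizes reported above; because the 1.6 million figure is rounded, the computed first-stage reduction (about 99.9%) is approximate and may differ slightly from the value obtained with the exact library size. The `consensus_active` helper and its `min_votes` threshold are hypothetical: the abstract does not state how the six models' outputs are combined, so a simple vote count is assumed purely for illustration.

```python
from typing import Sequence

# Stage sizes as reported in the abstract; the 1.6 million figure is rounded.
FUNNEL = [
    ("commercial library", 1_600_000),
    ("CDK8-focused library (ensemble ML)", 1_672),
    ("docking-selected candidates", 25),
    ("assay-confirmed inhibitors", 6),
]


def reduction_percent(before: int, after: int) -> float:
    """Percentage of molecules removed going from `before` to `after`."""
    return 100.0 * (1.0 - after / before)


def consensus_active(votes: Sequence[bool], min_votes: int) -> bool:
    """Hypothetical consensus rule: active if at least `min_votes` models agree.

    The abstract does not say how the six models are combined; this vote
    threshold is an assumption made purely for illustration.
    """
    return sum(votes) >= min_votes


if __name__ == "__main__":
    # Walk consecutive funnel stages and report how much each step removes.
    for (name_a, n_a), (name_b, n_b) in zip(FUNNEL, FUNNEL[1:]):
        print(f"{name_a} -> {name_b}: {n_a} -> {n_b} "
              f"({reduction_percent(n_a, n_b):.2f}% removed)")
    # Confirmed hits among the 25 compounds that were assayed.
    print(f"assay hit rate: {100.0 * 6 / 25:.0f}%")
    # Six hypothetical model votes for one molecule, requiring four in favor.
    print(consensus_active([True, True, False, True, True, False], min_votes=4))
```

Read this way, the ensemble step does most of the filtering, docking narrows the focused library to a hand-testable set, and six of the 25 assayed candidates (24%) were confirmed as inhibitors.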

Source journal

Protein Science (Biology: Biochemistry & Molecular Biology)

CiteScore: 12.40

Self-citation rate: 1.20%

Articles published: 246

Review time: 1 month

About the journal: Protein Science, the flagship journal of The Protein Society, is a publication that focuses on advancing fundamental knowledge in the field of protein molecules. The journal welcomes original reports and review articles that contribute to our understanding of protein function, structure, folding, design, and evolution. Additionally, Protein Science encourages papers that explore the applications of protein science in various areas such as therapeutics, protein-based biomaterials, bionanotechnology, synthetic biology, and bioelectronics. The journal accepts manuscript submissions in any suitable format for review, with the requirement of converting the manuscript to journal-style format only upon acceptance for publication. Protein Science is indexed and abstracted in numerous databases, including the Agricultural & Environmental Science Database (ProQuest), Biological Science Database (ProQuest), CAS: Chemical Abstracts Service (ACS), Embase (Elsevier), Health & Medical Collection (ProQuest), Health Research Premium Collection (ProQuest), Materials Science & Engineering Database (ProQuest), MEDLINE/PubMed (NLM), Natural Science Collection (ProQuest), and SciTech Premium Collection (ProQuest).