Classification of ULK1 inhibitors and SAR analysis by machine learning methods.

IF 2.3 · CAS Zone 3 (Environmental Science and Ecology) · JCR Q3 (Chemistry, Multidisciplinary)
X Wang, H Yin, A Yan
SAR and QSAR in Environmental Research, pp. 463-485. Published 2025-06-01 (Epub 2025-07-04). Journal Article.
DOI: 10.1080/1062936X.2025.2521295 (https://doi.org/10.1080/1062936X.2025.2521295)
Citations: 0

Abstract

Unc-51 like kinase 1 (ULK1), a key regulator of autophagy initiation, is a novel target for anticancer drug design. In this work, we collected 846 ULK1 inhibitors with IC50 values from 30 references. Using ECFP4 and MACCS fingerprints and Mordred descriptors, we built a series of classification models with Support Vector Machine (SVM), Random Forest (RF), eXtreme Gradient Boosting (XGBoost) and Deep Neural Network (DNN) algorithms. In addition, several Fingerprint and Graph Neural Network (FP-GNN) models were constructed from mixed molecular fingerprints and molecular graphs. In total, 39 classification models were developed. Model_1D_1, an ECFP4-based DNN model, performed best, achieving accuracies above 95% and Matthews correlation coefficients (MCC) above 0.9 on both the validation and test sets. The applicability domain, calculated by weighted Euclidean distance, indicated that Model_1D_1 could reliably predict the activity of over 84% of the compounds in both the training and test sets. We conducted structure-activity relationship (SAR) analysis using K-means clustering and SHAP. K-means clustering divided the molecular structures in the dataset into 7 subsets, and we identified three high-activity subsets sharing a common scaffold, 2-amino-4-(2-thienyl)-5-(trifluoromethyl)pyrimidine. SHAP analysis highlighted critical molecular fragments influencing activity, which improves our understanding of the model predictions and provides a theoretical basis for optimizing ULK1 inhibitors.
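
The abstract reports model quality as accuracy and Matthews correlation coefficient (MCC). As a minimal sketch of how these two metrics are computed for a binary active/inactive classifier, the following uses only the Python standard library. The function names and the toy labels are illustrative assumptions, not the authors' code or data.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = active, 0 = inactive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(y_true, y_pred):
    """Fraction of compounds whose class was predicted correctly."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(y_true, y_pred):
    """Matthews correlation coefficient, in [-1, 1].

    Returns 0.0 when the denominator is zero (for example, when every
    prediction falls in one class), a common convention.
    """
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Toy labels for illustration only (not data from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")  # accuracy = 0.75
print(f"MCC      = {mcc(y_true, y_pred):.2f}")       # MCC      = 0.50
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when actives and inactives are unevenly represented, which is one reason it is commonly reported alongside accuracy.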

Source journal
CiteScore: 5.20
Self-citation rate: 20.00%
Articles published: 78
Review time: >24 weeks
Journal description: SAR and QSAR in Environmental Research is an international journal welcoming papers on the fundamental and practical aspects of the structure-activity and structure-property relationships in the fields of environmental science, agrochemistry, toxicology, pharmacology and applied chemistry. A unique aspect of the journal is the focus on emerging techniques for the building of SAR and QSAR models in these widely varying fields. The scope of the journal includes, but is not limited to, the topics of topological and physicochemical descriptors, mathematical, statistical and graphical methods for data analysis, computer methods and programs, original applications and comparative studies. In addition to primary scientific papers, the journal contains reviews of books and software and news of conferences. Special issues on topics of current and widespread interest to the SAR and QSAR community will be published from time to time.