Classification of ULK1 inhibitors and SAR analysis by machine learning methods
X Wang, H Yin, A Yan
SAR and QSAR in Environmental Research, pp. 463-485. Published 2025-06-01 (Epub 2025-07-04).
DOI: 10.1080/1062936X.2025.2521295 (https://doi.org/10.1080/1062936X.2025.2521295)
Citations: 0
Abstract
Unc-51 like kinase 1 (ULK1), a key regulator of autophagy initiation, is a novel target for anticancer drug design. In this work, we collected 846 ULK1 inhibitors with IC50 values from 30 references. Using ECFP4 fingerprints, MACCS fingerprints, and Mordred descriptors, we built a set of classification models with Support Vector Machine (SVM), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), and Deep Neural Network (DNN) algorithms. Several Fingerprint and Graph Neural Network (FP-GNN) models were also constructed from mixed molecular fingerprints and molecular graphs. In total, 39 classification models were developed. Model_1D_1, an ECFP4-based DNN model, performed best, achieving accuracies over 95% and a Matthews correlation coefficient (MCC) over 0.9 on both the validation and test sets. The applicability domain, calculated by weighted Euclidean distance, indicated that Model_1D_1 could reliably predict the activity of over 84% of the compounds in both the training and test sets. We conducted structure-activity relationship (SAR) analysis using K-means clustering and SHAP. K-means clustering divided the dataset's molecular structures into 7 subsets. We identified three high-activity subsets sharing a common scaffold, 2-amino-4-(2-thienyl)-5-(trifluoromethyl)pyrimidine. SHAP analysis highlighted critical molecular fragments influencing activity, enhancing our understanding of model predictions and providing a theoretical basis for optimizing ULK1 inhibitors.
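The abstract reports accuracy, MCC, and an applicability domain based on weighted Euclidean distance, so a small worked example may help make these quantities concrete. The sketch below is illustrative only and is not the authors' implementation. The toy labels, the descriptor vectors, the weights, the threshold, and the nearest-neighbour rule in `in_domain` are all assumptions, since the abstract does not state how the weights or the domain cutoff were chosen.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = active, 0 = inactive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of compounds whose class was predicted correctly."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def weighted_euclidean(x, y, weights):
    """Weighted Euclidean distance: sqrt(sum_i w_i * (x_i - y_i)^2)."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def in_domain(query, training_set, weights, threshold):
    """Hypothetical rule: a compound is in-domain if its nearest training
    compound lies within `threshold` (the paper's actual rule may differ)."""
    nearest = min(weighted_euclidean(query, t, weights) for t in training_set)
    return nearest <= threshold


# Toy data (made up for illustration, not taken from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
train = [[3.0, 4.0], [10.0, 10.0]]
weights = [1.0, 1.0]

print("accuracy:", accuracy(y_true, y_pred))
print("MCC:", mcc(y_true, y_pred))
print("in domain:", in_domain([0.0, 0.0], train, weights, threshold=5.0))
```

MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy for datasets where actives and inactives are unevenly represented.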
About the journal:
SAR and QSAR in Environmental Research is an international journal welcoming papers on the fundamental and practical aspects of the structure-activity and structure-property relationships in the fields of environmental science, agrochemistry, toxicology, pharmacology and applied chemistry. A unique aspect of the journal is the focus on emerging techniques for the building of SAR and QSAR models in these widely varying fields. The scope of the journal includes, but is not limited to, the topics of topological and physicochemical descriptors, mathematical, statistical and graphical methods for data analysis, computer methods and programs, original applications and comparative studies. In addition to primary scientific papers, the journal contains reviews of books and software and news of conferences. Special issues on topics of current and widespread interest to the SAR and QSAR community will be published from time to time.