EGFRAP: a predictive machine learning model for assessing small molecule activity against the epidermal growth factor receptor
Ashish Gupta, Amarinder S. Thind and Rituraj Purohit
RSC Medicinal Chemistry, 9, 4415–4426. Published 10 July 2025. DOI: 10.1039/D5MD00361J
Epidermal growth factor receptor (EGFR) is a membrane-bound protein that binds epidermal growth factor. This binding triggers receptor dimerization and tyrosine autophosphorylation, which in turn promote cell proliferation. EGFR-associated pathways regulate housekeeping cellular functions such as growth, division, and apoptosis. Mutation or overexpression of EGFR, however, drives unrestrained cell proliferation, leading to tumorigenesis. This study proposes EGFRAP, a machine-learning-based tool that predicts the biological activity (pIC50) of novel molecules against EGFR. The tool is based on a robust quantitative structure–activity relationship (QSAR) model trained on a large dataset of known EGFR inhibitors using multiple machine learning algorithms. The extra trees regressor (ET) model performed well on the training set, with an R² of 0.99, an RMSE of 0.07 and an MAE of 0.009. The Pearson correlation between observed and predicted pIC50 values for the training-set inhibitors was also high (0.99). The model was then validated on a test set and performed satisfactorily, with an R² of 0.67, an RMSE of 0.89, an MAE of 0.61 and a Pearson correlation of 0.82 between observed and predicted pIC50 values. The model was checked for overfitting using 10-fold cross-validation. A series of structure-based drug design experiments was also performed to validate the tool's predictions, and these results supported the model's performance. EGFRAP is expected to help medicinal chemists identify promising EGFR inhibitors.
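The evaluation figures quoted above use standard definitions. The sketch below is not the authors' EGFRAP code. It is a minimal, standard-library-only illustration of how pIC50 is derived and how R², RMSE, MAE and the Pearson correlation are computed from paired observed and predicted pIC50 values. It also shows a simple 10-fold split and assumes that IC50 values are given in nanomolar. The function names `pic50_from_ic50_nm`, `regression_metrics` and `kfold_indices` are hypothetical. The extra trees regressor itself is not reimplemented here. A common off-the-shelf choice is scikit-learn's `sklearn.ensemble.ExtraTreesRegressor`, although the abstract does not say which library was used.

```python
import math
from statistics import mean


def pic50_from_ic50_nm(ic50_nm: float) -> float:
    """pIC50 = -log10(IC50 in mol/L); assumes the IC50 is given in nM (1e-9 mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)


def regression_metrics(observed, predicted):
    """R^2, RMSE, MAE and Pearson r for paired observed/predicted pIC50 values."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)                # residual sum of squares
    mean_obs, mean_pred = mean(observed), mean(predicted)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)    # total sum of squares
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    if ss_tot == 0 or ss_pred == 0:
        raise ValueError("observed and predicted values must not be constant")
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    return {
        "r2": 1.0 - ss_res / ss_tot,                 # coefficient of determination
        "rmse": math.sqrt(ss_res / n),               # root-mean-square error
        "mae": sum(abs(r) for r in residuals) / n,   # mean absolute error
        "pearson_r": cov / math.sqrt(ss_tot * ss_pred),
    }


def kfold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k contiguous, unshuffled folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    # The first n_samples % k folds receive one extra sample.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


if __name__ == "__main__":
    print(pic50_from_ic50_nm(10.0))  # 8.0
    m = regression_metrics([5.0, 6.0, 7.0, 8.0], [5.5, 6.0, 6.5, 8.0])
    print({name: round(value, 3) for name, value in m.items()})
    # {'r2': 0.9, 'rmse': 0.354, 'mae': 0.25, 'pearson_r': 0.956}
```

R² here is computed against the mean of the observed values. It is not in general equal to the square of the Pearson correlation. The two coincide only for an ordinary least-squares linear fit evaluated on its own training data, which is why both are usually reported. In practice, compounds are normally shuffled (or grouped by scaffold) before folding; the contiguous split above is a simplification.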