EGFRAP: a predictive machine learning model for assessing small molecule activity against the epidermal growth factor receptor

Journal metrics: IF 3.6; Zone 4 (Medicine); JCR Q2 (Biochemistry & Molecular Biology)
Ashish Gupta, Amarinder S. Thind and Rituraj Purohit
RSC Medicinal Chemistry, volume 9, pp. 4415-4426. Journal article, published 2025-07-10. DOI: 10.1039/D5MD00361J. https://pubs.rsc.org/en/content/articlelanding/2025/md/d5md00361j
Citations: 0

Abstract

The epidermal growth factor receptor (EGFR) is a membrane-bound receptor tyrosine kinase that binds epidermal growth factor, triggering receptor dimerization and tyrosine autophosphorylation and thereby promoting cell proliferation. EGFR-associated pathways regulate cellular housekeeping functions such as growth, division and apoptosis. However, EGFR mutation or overexpression drives uncontrolled cell proliferation, leading to tumorigenesis. This study proposes EGFRAP, a machine-learning tool that predicts the biological activity (pIC50) of novel molecules against EGFR. The tool is built on a quantitative structure–activity relationship (QSAR) model trained on a large dataset of known EGFR inhibitors using multiple machine-learning algorithms. The extra trees regressor (ET) model fitted the training set closely, with R² = 0.99, RMSE = 0.07 and MAE = 0.009, and a Pearson correlation of 0.99 between observed and predicted pIC50 values. On the held-out test set, the model achieved R² = 0.67, RMSE = 0.89 and MAE = 0.61, with a Pearson correlation of 0.82 between observed and predicted pIC50 values. Overfitting was assessed with 10-fold cross-validation, and a series of structure-based drug design experiments was performed to validate the tool's predictions; their results supported the model's performance. EGFRAP is intended to help medicinal chemists identify promising EGFR inhibitors.
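The abstract reports pIC50 targets, four evaluation statistics (R², RMSE, MAE and Pearson r) and 10-fold cross-validation without defining them. The sketch below is not the authors' code; it shows one common way to compute each quantity in plain Python. Assumptions: IC50 values are given in nanomolar, R² means the coefficient of determination (1 - SS_res/SS_tot) rather than the squared Pearson correlation, and folds come from a seeded shuffle. All function names are hypothetical. Fitting the ET model itself is not shown; scikit-learn's `ExtraTreesRegressor` is one widely used implementation, but the abstract does not say which library EGFRAP uses.

```python
import math
import random
from statistics import mean
from typing import Iterator, List, Sequence, Tuple


def pic50_from_ic50_nm(ic50_nm: float) -> float:
    """pIC50 = -log10(IC50 in mol/L); assumes the input IC50 is in nanomolar."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)  # -log10(x * 1e-9) = 9 - log10(x)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between observed and predicted values."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def kfold_indices(n: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; every sample is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k near-equal, disjoint folds
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


if __name__ == "__main__":
    # Toy data only: illustrative values, not results from the paper.
    observed = [6.0, 7.0, 8.0, 9.0]
    predicted = [6.5, 7.0, 7.5, 9.0]
    print(f"R2={r2_score(observed, predicted):.3f} "
          f"RMSE={rmse(observed, predicted):.3f} "
          f"MAE={mae(observed, predicted):.3f} "
          f"r={pearson_r(observed, predicted):.3f}")
    # prints: R2=0.900 RMSE=0.354 MAE=0.250 r=0.956
```

RMSE is always at least as large as MAE because squaring weights large errors more heavily. This is consistent with both the training (0.07 vs 0.009) and test (0.89 vs 0.61) figures in the abstract.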

Source journal metrics: CiteScore 5.80; self-citation rate 2.40%; articles published: 129