AI-driven prediction of drug activity against Toxoplasma gondii: Data augmentation and deep neural networks for limited datasets

Natalia V. Karimova, Ravithree D. Senanayake
Artificial Intelligence Chemistry, Volume 3, Issue 1, Article 100084. Published 2025-02-05. DOI: 10.1016/j.aichem.2025.100084
https://www.sciencedirect.com/science/article/pii/S2949747725000016

Abstract

Toxoplasmosis, caused by Toxoplasma gondii (T. gondii), is a serious global health concern, particularly in immunocompromised individuals. Inhibiting the enzyme TgDHFR is a promising strategy for developing treatments. This Artificial Intelligence (AI)-driven Quantitative Structure-Activity Relationship (QSAR) study applies deep neural networks (DNNs) to predict pIC50 values for potential inhibitors, using 2D and 3D molecular descriptors and fingerprints. To address training data limitations, we introduced a novel methodology combining targeted descriptor selection, Gaussian noise-based data augmentation, and an ensemble of DNNs. This approach enhanced model performance, raising R² from 0.75 with the original dataset to 0.85. The model was further validated using two FDA-approved drugs for T. gondii treatment, pyrimethamine and trimethoprim, yielding relative errors of 3.35% and 2.15%, respectively, in pIC50 predictions compared to experimental values. Finally, the model was applied to screen FDA-approved drugs after filtering out molecules that did not align with the characteristics of the training dataset. The predicted pIC50 values were then used to calculate ligand efficiency (LE), binding efficiency index (BEI), lipophilic ligand efficiency (LLE), and surface efficiency index (SEI), identifying the most promising TgDHFR inhibitors for further investigation. By leveraging AI and a data augmentation approach, this study provides a powerful tool for pIC50 prediction of TgDHFR inhibitors that can be adapted to other systems.
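The abstract does not say how the Gaussian noise is applied, so the following is only a minimal sketch of one common way to do Gaussian noise-based augmentation on a small descriptor table. It is not the authors' implementation. The function name `augment_with_gaussian_noise`, the parameters `copies` and `noise_scale`, the choice to scale noise by each descriptor column's standard deviation, and the choice to leave the pIC50 targets unchanged are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple


def augment_with_gaussian_noise(
    features: Sequence[Sequence[float]],
    targets: Sequence[float],
    copies: int = 5,
    noise_scale: float = 0.05,
    seed: int = 0,
) -> Tuple[List[List[float]], List[float]]:
    """Return the original rows plus `copies` noisy replicas of each row.

    Hypothetical helper: noise for column j is drawn from a zero-mean
    Gaussian with standard deviation noise_scale * std(column j).
    Targets (e.g. pIC50 values) are copied unchanged.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    n_rows = len(features)
    n_cols = len(features[0])

    # Per-column population standard deviation, so the perturbation is
    # proportional to the spread of each descriptor.
    col_std = []
    for j in range(n_cols):
        column = [row[j] for row in features]
        mean = sum(column) / n_rows
        variance = sum((v - mean) ** 2 for v in column) / n_rows
        col_std.append(variance ** 0.5)

    # Start from the untouched original data.
    aug_x = [list(row) for row in features]
    aug_y = list(targets)

    # Append noisy replicas of every original row.
    for row, y in zip(features, targets):
        for _ in range(copies):
            noisy = [v + rng.gauss(0.0, noise_scale * s)
                     for v, s in zip(row, col_std)]
            aug_x.append(noisy)
            aug_y.append(y)
    return aug_x, aug_y


# Toy data (made up): two descriptors per molecule and a pIC50 target.
X = [[1.0, 200.0], [2.0, 250.0], [3.0, 310.0]]
y = [6.1, 7.0, 7.8]
X_aug, y_aug = augment_with_gaussian_noise(X, y, copies=4)
print(len(X_aug), len(y_aug))  # 3 originals + 3 * 4 replicas each
```

In a setup like this, the augmented rows would typically be generated from the training split only, so that noisy copies of held-out molecules do not leak into training; whether the study follows that protocol is not stated in the abstract.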