Improving the accuracy of prediction models for small datasets of Cytochrome P450 inhibition with deep learning

IF 7.1, Zone 2 (Chemistry), JCR Q1 CHEMISTRY, MULTIDISCIPLINARY
Elpri Eka Permadi, Reiko Watanabe, Kenji Mizuguchi
Journal of Cheminformatics, vol. 17, no. 1. Published 2025-04-30.
DOI: 10.1186/s13321-025-01015-2
Article: https://link.springer.com/article/10.1186/s13321-025-01015-2
Citations: 0

Abstract

The cytochrome P450 (CYP) superfamily metabolises a wide range of compounds; however, drug-induced CYP inhibition can lead to adverse interactions. Identifying potential CYP inhibitors is therefore crucial for safe drug administration. This study investigated the application of deep learning to the prediction of CYP inhibition, focusing on the challenges posed by the limited datasets available for the CYP2B6 and CYP2C8 isoforms. To address these limitations, we leveraged larger datasets for related CYP isoforms, compiling data from public databases that contain IC50 values for 12,369 compounds tested against seven CYP isoforms. We constructed single-task, fine-tuned, and multitask models, as well as multitask models that impute the missing values. Notably, the multitask models with data imputation significantly improved CYP inhibition prediction compared with the single-task models. Using the most accurate models, we evaluated the inhibitory activity of approved drugs against CYP2B6 and CYP2C8. Among the 1,808 approved drugs analysed, our multitask models with data imputation identified 161 and 154 potential inhibitors of CYP2B6 and CYP2C8, respectively. This study underscores the significant potential of multitask deep learning, particularly a graph convolutional network combined with data imputation, to improve the accuracy of CYP inhibition prediction when data are limited.
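Multitask training on a merged compound-by-isoform table has to cope with missing labels, because most compounds were measured against only some of the seven isoforms. The sketch below illustrates one common way to handle this, a loss that simply skips unmeasured pairs; it is an illustrative assumption, not the authors' code. It assumes each measured IC50 has already been converted to a binary label (1 = inhibitor, 0 = non-inhibitor), uses `None` for missing entries, and the function name `masked_bce` is hypothetical.

```python
import math
from typing import List, Optional

# Hypothetical layout: one row per compound, one column per CYP isoform task.
# A label is 1 (inhibitor), 0 (non-inhibitor), or None (no measured IC50).
Labels = List[List[Optional[int]]]


def masked_bce(probs: List[List[float]], labels: Labels, eps: float = 1e-7) -> float:
    """Binary cross-entropy averaged over observed compound/isoform pairs only.

    Missing labels are skipped, so each compound contributes only to the
    tasks it was actually measured on. When this loss trains a network with
    shared layers, data-rich isoforms still shape the shared representation
    used by data-poor ones.
    """
    total, count = 0.0, 0
    for prob_row, label_row in zip(probs, labels):
        for p, y in zip(prob_row, label_row):
            if y is None:
                continue  # unmeasured pair: contributes nothing to the loss
            p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    if count == 0:
        raise ValueError("no observed labels in this batch")
    return total / count


if __name__ == "__main__":
    # Two compounds, two isoforms; each compound was measured on one isoform only.
    probs = [[0.9, 0.5], [0.5, 0.2]]
    labels = [[1, None], [None, 0]]
    print(round(masked_bce(probs, labels), 4))  # 0.1643
```

Only the two measured entries enter the average here, so the 0.5 predictions on unmeasured pairs have no effect on the loss.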

Scientific contribution

This study demonstrates that even with small datasets, accurate prediction models can be constructed by making effective use of related data. In addition, imputing the missing values significantly improved the prediction accuracy for CYP2B6 and CYP2C8 inhibition.
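The abstract does not describe the imputation procedure itself. As a purely illustrative assumption, the sketch below shows one simple scheme of this general kind: missing compound/isoform labels are filled with thresholded predictions from a first-pass model, measured labels are never overwritten, and a mask records which entries were imputed. The function `impute_labels` and the 0.5 threshold are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical label matrix: rows = compounds, columns = CYP isoforms,
# values 1 (inhibitor), 0 (non-inhibitor), or None (not measured).
Labels = List[List[Optional[int]]]


def impute_labels(
    labels: Labels, first_pass_probs: List[List[float]], threshold: float = 0.5
) -> Tuple[List[List[int]], List[List[bool]]]:
    """Fill missing binary labels with thresholded first-pass predictions.

    Returns the completed label matrix and a mask that is True where a value
    was imputed rather than measured.
    """
    completed: List[List[int]] = []
    imputed_mask: List[List[bool]] = []
    for label_row, prob_row in zip(labels, first_pass_probs):
        new_row: List[int] = []
        mask_row: List[bool] = []
        for y, p in zip(label_row, prob_row):
            if y is None:
                new_row.append(1 if p >= threshold else 0)  # pseudo-label
                mask_row.append(True)
            else:
                new_row.append(y)  # keep the measured label unchanged
                mask_row.append(False)
        completed.append(new_row)
        imputed_mask.append(mask_row)
    return completed, imputed_mask


if __name__ == "__main__":
    labels = [[1, None], [None, 0]]
    probs = [[0.9, 0.7], [0.2, 0.6]]
    completed, mask = impute_labels(labels, probs)
    print(completed)  # [[1, 1], [0, 0]]
    print(mask)       # [[False, True], [True, False]]
```

A key design choice in any scheme like this is how far to trust the pseudo-labels; keeping the mask makes it straightforward to down-weight or exclude imputed entries when the multitask model is retrained on the completed matrix.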

Source journal
Journal of Cheminformatics
Categories: CHEMISTRY, MULTIDISCIPLINARY; COMPUTER SCIENCE, INFORMATION SYSTEMS
CiteScore: 14.10
Self-citation rate: 7.00%
Articles published: 82
Review time: 3 months
About the journal: Journal of Cheminformatics is an open access journal publishing original peer-reviewed research in all aspects of cheminformatics and molecular modelling. Coverage includes, but is not limited to: chemical information systems, software and databases; molecular modelling; chemical structure representations and their use in structure, substructure, and similarity searching of chemical substance and chemical reaction databases; computer and molecular graphics; computer-aided molecular design; expert systems; QSAR; and data mining techniques.