BoostDILI：极端梯度boost驱动的药物性肝损伤预测和结构警报生成。

IF 3.7 3区医学 Q2 CHEMISTRY, MEDICINAL

Chemical Research in Toxicology Pub Date : 2025-05-19 Epub Date: 2025-04-16 DOI:10.1021/acs.chemrestox.4c00532

Hillul Chutia, Gori Sankar Borah, Hridoy Jyoti Mahanta, Selvaraman Nagamani

{"title":"BoostDILI：极端梯度boost驱动的药物性肝损伤预测和结构警报生成。","authors":"Hillul Chutia, Gori Sankar Borah, Hridoy Jyoti Mahanta, Selvaraman Nagamani","doi":"10.1021/acs.chemrestox.4c00532","DOIUrl":null,"url":null,"abstract":"Over the past 60 years, drug-induced liver injury (DILI) has played a key role in the withdrawal of marketed drugs due to safety concerns. Early prediction of DILI is crucial for developing safer pharmaceuticals, yet current in vitro and in vivo testing methods are complex and cumbersome. In this study, we developed an extreme gradient boosting (XGB)-powered machine learning (ML) model for DILI prediction. Comparing various DILI prediction models is challenging because they rely on different public data sets. We comprehensively evaluated the proposed BoostDILI model to address two crucial questions: 1. Can insights derived from public data sets help in DILI prediction for Food and Drug Administration (FDA) approved drugs? 2. Can we generate structural alerts to improve the model's explainability? To address the first question, we developed a DILI prediction model using four publicly available data sets. This effort led to the creation of the BoostDILI model, which achieved a 5-fold CV accuracy of 0.70. A sequential feature selection method was employed to identify relevant descriptors. This model integrates feature-level representations derived from RDKit (12 features) and Mordred (23 features) features. Bayesian statistics was applied to identify high-performance substructures iteratively, and a structural alerts model was developed to address the second question. The developed model was further validated with two FDA-approved drug data sets, DILIst and DILIRank. The BoostDILI model offers a trustable solution for evaluating the DILI risk in preclinical research. The structural alerts help in identifying the substructures that may be responsible for DILI. The data set and the source code are available at https://github.com/Naga270588/BoostDILI.","PeriodicalId":31,"journal":{"name":"Chemical Research in Toxicology","volume":" ","pages":"865-876"},"PeriodicalIF":3.7000,"publicationDate":"2025-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"BoostDILI: Extreme Gradient Boost-Powered Drug-Induced Liver Injury Prediction and Structural Alerts Generation.\",\"authors\":\"Hillul Chutia, Gori Sankar Borah, Hridoy Jyoti Mahanta, Selvaraman Nagamani\",\"doi\":\"10.1021/acs.chemrestox.4c00532\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Over the past 60 years, drug-induced liver injury (DILI) has played a key role in the withdrawal of marketed drugs due to safety concerns. Early prediction of DILI is crucial for developing safer pharmaceuticals, yet current in vitro and in vivo testing methods are complex and cumbersome. In this study, we developed an extreme gradient boosting (XGB)-powered machine learning (ML) model for DILI prediction. Comparing various DILI prediction models is challenging because they rely on different public data sets. We comprehensively evaluated the proposed BoostDILI model to address two crucial questions: 1. Can insights derived from public data sets help in DILI prediction for Food and Drug Administration (FDA) approved drugs? 2. Can we generate structural alerts to improve the model's explainability? To address the first question, we developed a DILI prediction model using four publicly available data sets. This effort led to the creation of the BoostDILI model, which achieved a 5-fold CV accuracy of 0.70. A sequential feature selection method was employed to identify relevant descriptors. This model integrates feature-level representations derived from RDKit (12 features) and Mordred (23 features) features. Bayesian statistics was applied to identify high-performance substructures iteratively, and a structural alerts model was developed to address the second question. The developed model was further validated with two FDA-approved drug data sets, DILIst and DILIRank. The BoostDILI model offers a trustable solution for evaluating the DILI risk in preclinical research. The structural alerts help in identifying the substructures that may be responsible for DILI. The data set and the source code are available at https://github.com/Naga270588/BoostDILI.\",\"PeriodicalId\":31,\"journal\":{\"name\":\"Chemical Research in Toxicology\",\"volume\":\" \",\"pages\":\"865-876\"},\"PeriodicalIF\":3.7000,\"publicationDate\":\"2025-05-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Chemical Research in Toxicology\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1021/acs.chemrestox.4c00532\",\"RegionNum\":3,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/4/16 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q2\",\"JCRName\":\"CHEMISTRY, MEDICINAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Chemical Research in Toxicology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1021/acs.chemrestox.4c00532","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/4/16 0:00:00","PubModel":"Epub","JCR":"Q2","JCRName":"CHEMISTRY, MEDICINAL","Score":null,"Total":0}

引用次数: 0

摘要

在过去的60年里，药物性肝损伤（DILI）在由于安全性问题而导致的上市药物下售中发挥了关键作用。DILI的早期预测对于开发更安全的药物至关重要，但目前的体外和体内测试方法复杂而繁琐。在这项研究中，我们开发了一个用于DILI预测的极端梯度增强（XGB）驱动的机器学习（ML）模型。比较各种DILI预测模型具有挑战性，因为它们依赖于不同的公共数据集。我们全面评估了提出的BoostDILI模型，以解决两个关键问题：1。从公共数据集获得的见解是否有助于FDA批准药物的DILI预测？2. 我们能否生成结构性警报来提高模型的可解释性？为了解决第一个问题，我们使用四个公开可用的数据集开发了一个DILI预测模型。这一努力导致了BoostDILI模型的创建，该模型实现了0.70的5倍CV精度。采用序列特征选择方法识别相关描述符。该模型集成了来自RDKit（12个特征）和Mordred（23个特征）特征的特征级表示。应用贝叶斯统计迭代识别高性能子结构，并建立了结构报警模型来解决第二个问题。开发的模型通过两个fda批准的药物数据集DILIst和DILIRank进一步验证。BoostDILI模型为临床前研究中的DILI风险评估提供了可靠的解决方案。结构警报有助于识别可能导致DILI的子结构。数据集和源代码可在https://github.com/Naga270588/BoostDILI上获得。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

BoostDILI: Extreme Gradient Boost-Powered Drug-Induced Liver Injury Prediction and Structural Alerts Generation.

Over the past 60 years, drug-induced liver injury (DILI) has played a key role in the withdrawal of marketed drugs due to safety concerns. Early prediction of DILI is crucial for developing safer pharmaceuticals, yet current in vitro and in vivo testing methods are complex and cumbersome. In this study, we developed an extreme gradient boosting (XGB)-powered machine learning (ML) model for DILI prediction. Comparing various DILI prediction models is challenging because they rely on different public data sets. We comprehensively evaluated the proposed BoostDILI model to address two crucial questions: 1. Can insights derived from public data sets help in DILI prediction for Food and Drug Administration (FDA) approved drugs? 2. Can we generate structural alerts to improve the model's explainability? To address the first question, we developed a DILI prediction model using four publicly available data sets. This effort led to the creation of the BoostDILI model, which achieved a 5-fold CV accuracy of 0.70. A sequential feature selection method was employed to identify relevant descriptors. This model integrates feature-level representations derived from RDKit (12 features) and Mordred (23 features) features. Bayesian statistics was applied to identify high-performance substructures iteratively, and a structural alerts model was developed to address the second question. The developed model was further validated with two FDA-approved drug data sets, DILIst and DILIRank. The BoostDILI model offers a trustable solution for evaluating the DILI risk in preclinical research. The structural alerts help in identifying the substructures that may be responsible for DILI. The data set and the source code are available at https://github.com/Naga270588/BoostDILI.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Chemical Research in Toxicology 医学-毒理学

CiteScore

7.90

自引率

7.30%

发文量

215

审稿时长

3.5 months

期刊介绍： Chemical Research in Toxicology publishes Articles, Rapid Reports, Chemical Profiles, Reviews, Perspectives, Letters to the Editor, and ToxWatch on a wide range of topics in Toxicology that inform a chemical and molecular understanding and capacity to predict biological outcomes on the basis of structures and processes. The overarching goal of activities reported in the Journal are to provide knowledge and innovative approaches needed to promote intelligent solutions for human safety and ecosystem preservation. The journal emphasizes insight concerning mechanisms of toxicity over phenomenological observations. It upholds rigorous chemical, physical and mathematical standards for characterization and application of modern techniques.