{"title":"BoostDILI:极端梯度boost驱动的药物性肝损伤预测和结构警报生成。","authors":"Hillul Chutia, Gori Sankar Borah, Hridoy Jyoti Mahanta, Selvaraman Nagamani","doi":"10.1021/acs.chemrestox.4c00532","DOIUrl":null,"url":null,"abstract":"<p><p>Over the past 60 years, drug-induced liver injury (DILI) has played a key role in the withdrawal of marketed drugs due to safety concerns. Early prediction of DILI is crucial for developing safer pharmaceuticals, yet current <i>in vitro</i> and <i>in vivo</i> testing methods are complex and cumbersome. In this study, we developed an extreme gradient boosting (XGB)-powered machine learning (ML) model for DILI prediction. Comparing various DILI prediction models is challenging because they rely on different public data sets. We comprehensively evaluated the proposed BoostDILI model to address two crucial questions: 1. Can insights derived from public data sets help in DILI prediction for Food and Drug Administration (FDA) approved drugs? 2. Can we generate structural alerts to improve the model's explainability? To address the first question, we developed a DILI prediction model using four publicly available data sets. This effort led to the creation of the BoostDILI model, which achieved a 5-fold CV accuracy of 0.70. A sequential feature selection method was employed to identify relevant descriptors. This model integrates feature-level representations derived from RDKit (12 features) and Mordred (23 features) features. Bayesian statistics was applied to identify high-performance substructures iteratively, and a structural alerts model was developed to address the second question. The developed model was further validated with two FDA-approved drug data sets, DILIst and DILIRank. The BoostDILI model offers a trustable solution for evaluating the DILI risk in preclinical research. The structural alerts help in identifying the substructures that may be responsible for DILI. The data set and the source code are available at https://github.com/Naga270588/BoostDILI.</p>","PeriodicalId":31,"journal":{"name":"Chemical Research in Toxicology","volume":" ","pages":"865-876"},"PeriodicalIF":3.7000,"publicationDate":"2025-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"BoostDILI: Extreme Gradient Boost-Powered Drug-Induced Liver Injury Prediction and Structural Alerts Generation.\",\"authors\":\"Hillul Chutia, Gori Sankar Borah, Hridoy Jyoti Mahanta, Selvaraman Nagamani\",\"doi\":\"10.1021/acs.chemrestox.4c00532\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Over the past 60 years, drug-induced liver injury (DILI) has played a key role in the withdrawal of marketed drugs due to safety concerns. Early prediction of DILI is crucial for developing safer pharmaceuticals, yet current <i>in vitro</i> and <i>in vivo</i> testing methods are complex and cumbersome. In this study, we developed an extreme gradient boosting (XGB)-powered machine learning (ML) model for DILI prediction. Comparing various DILI prediction models is challenging because they rely on different public data sets. We comprehensively evaluated the proposed BoostDILI model to address two crucial questions: 1. Can insights derived from public data sets help in DILI prediction for Food and Drug Administration (FDA) approved drugs? 2. Can we generate structural alerts to improve the model's explainability? To address the first question, we developed a DILI prediction model using four publicly available data sets. This effort led to the creation of the BoostDILI model, which achieved a 5-fold CV accuracy of 0.70. A sequential feature selection method was employed to identify relevant descriptors. This model integrates feature-level representations derived from RDKit (12 features) and Mordred (23 features) features. Bayesian statistics was applied to identify high-performance substructures iteratively, and a structural alerts model was developed to address the second question. The developed model was further validated with two FDA-approved drug data sets, DILIst and DILIRank. The BoostDILI model offers a trustable solution for evaluating the DILI risk in preclinical research. The structural alerts help in identifying the substructures that may be responsible for DILI. The data set and the source code are available at https://github.com/Naga270588/BoostDILI.</p>\",\"PeriodicalId\":31,\"journal\":{\"name\":\"Chemical Research in Toxicology\",\"volume\":\" \",\"pages\":\"865-876\"},\"PeriodicalIF\":3.7000,\"publicationDate\":\"2025-05-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Chemical Research in Toxicology\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1021/acs.chemrestox.4c00532\",\"RegionNum\":3,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/4/16 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q2\",\"JCRName\":\"CHEMISTRY, MEDICINAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Chemical Research in Toxicology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1021/acs.chemrestox.4c00532","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/4/16 0:00:00","PubModel":"Epub","JCR":"Q2","JCRName":"CHEMISTRY, MEDICINAL","Score":null,"Total":0}
Over the past 60 years, drug-induced liver injury (DILI) has played a key role in the withdrawal of marketed drugs due to safety concerns. Early prediction of DILI is crucial for developing safer pharmaceuticals, yet current in vitro and in vivo testing methods are complex and cumbersome. In this study, we developed an extreme gradient boosting (XGB)-powered machine learning (ML) model for DILI prediction. Comparing various DILI prediction models is challenging because they rely on different public data sets. We comprehensively evaluated the proposed BoostDILI model to address two crucial questions: 1. Can insights derived from public data sets help in DILI prediction for Food and Drug Administration (FDA) approved drugs? 2. Can we generate structural alerts to improve the model's explainability? To address the first question, we developed a DILI prediction model using four publicly available data sets. This effort led to the creation of the BoostDILI model, which achieved a 5-fold CV accuracy of 0.70. A sequential feature selection method was employed to identify relevant descriptors. This model integrates feature-level representations derived from RDKit (12 features) and Mordred (23 features) features. Bayesian statistics was applied to identify high-performance substructures iteratively, and a structural alerts model was developed to address the second question. The developed model was further validated with two FDA-approved drug data sets, DILIst and DILIRank. The BoostDILI model offers a trustable solution for evaluating the DILI risk in preclinical research. The structural alerts help in identifying the substructures that may be responsible for DILI. The data set and the source code are available at https://github.com/Naga270588/BoostDILI.
期刊介绍:
Chemical Research in Toxicology publishes Articles, Rapid Reports, Chemical Profiles, Reviews, Perspectives, Letters to the Editor, and ToxWatch on a wide range of topics in Toxicology that inform a chemical and molecular understanding and capacity to predict biological outcomes on the basis of structures and processes. The overarching goal of activities reported in the Journal are to provide knowledge and innovative approaches needed to promote intelligent solutions for human safety and ecosystem preservation. The journal emphasizes insight concerning mechanisms of toxicity over phenomenological observations. It upholds rigorous chemical, physical and mathematical standards for characterization and application of modern techniques.