DICTrank Is a Reliable Dataset for Cardiotoxicity Prediction Using Machine Learning Methods.

IF 3.7 | Zone 3 (Medicine) | Q2 CHEMISTRY, MEDICINAL

Chemical Research in Toxicology | Pub Date: 2025-03-27 | DOI: 10.1021/acs.chemrestox.4c00428

Yanyan Qu, Ting Li, Zhichao Liu, Weida Tong, Dongying Li


Citations: 0

Abstract

Drug-induced cardiotoxicity (DICT) is a significant challenge in drug development and public health. DICT can arise from various mechanisms, yet New Approach Methods (NAMs), including quantitative structure-activity relationships (QSARs), have mostly been developed to predict DICT from individual mechanisms (e.g., hERG-related cardiotoxicity), because the available datasets are limited to specific mechanisms. While these efforts have contributed significantly to our understanding of cardiotoxicity, DICT assessment remains challenging, suggesting that approaches focused on isolated mechanisms may not provide a comprehensive evaluation. To address this, we previously developed DICTrank, the largest dataset for assessing overall cardiotoxicity liability in humans based on FDA drug labels. In this study, we evaluated the utility of DICTrank for QSAR modeling using five machine learning methods that vary in algorithmic complexity and explainability: Logistic Regression (LR), K-Nearest Neighbors, Support Vector Machines, Random Forest (RF), and extreme gradient boosting (XGBoost). To reflect real-world scenarios, models were trained on drugs approved in or before 2005 and used to predict the DICT risk of drugs approved thereafter. We observed no clear association between prediction performance and model complexity; LR and XGBoost achieved the best results with DICTrank. Additionally, our significant-feature analyses with the RF and XGBoost models provided novel insights into DICT mechanisms, revealing that drug properties associated with descriptors such as "structural and topological", "polarizability", and "electronegativity" contributed significantly to DICT. Moreover, model performance varied by therapeutic category, suggesting the need to tailor models accordingly. In conclusion, our study demonstrated the robustness and reliability of DICTrank for cardiotoxicity prediction in humans using machine learning methods.
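The time-based evaluation described in the abstract (train on drugs approved up to 2005, test on later approvals, then compare performance across therapeutic categories) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `DrugRecord` fields, the binary label encoding, the placeholder drug records, and the majority-label stand-in for a model are hypothetical and are not taken from the paper or from DICTrank itself.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DrugRecord:
    # Hypothetical record layout; the real DICTrank columns may differ.
    name: str
    approval_year: int
    category: str    # therapeutic category label (illustrative)
    dict_label: int  # assumed binary: 1 = DICT concern, 0 = no concern


def temporal_split(records, cutoff_year=2005):
    """Train on drugs approved in or before cutoff_year, test on later ones."""
    train = [r for r in records if r.approval_year <= cutoff_year]
    test = [r for r in records if r.approval_year > cutoff_year]
    return train, test


def accuracy_by_category(test_records, predictions):
    """Fraction of correct predictions within each therapeutic category."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for record, predicted in zip(test_records, predictions):
        totals[record.category] += 1
        hits[record.category] += int(predicted == record.dict_label)
    return {cat: hits[cat] / totals[cat] for cat in totals}


# Made-up placeholder records, not real drugs and not DICTrank data.
records = [
    DrugRecord("drug_a", 1998, "oncology", 1),
    DrugRecord("drug_b", 2005, "cardiovascular", 0),
    DrugRecord("drug_c", 2010, "oncology", 1),
    DrugRecord("drug_d", 2012, "oncology", 0),
    DrugRecord("drug_e", 2015, "cardiovascular", 0),
]

train, test = temporal_split(records)

# Stand-in "model": always predict the majority training label (ties go to 1).
# A real study would fit LR, KNN, SVM, RF, or XGBoost on molecular descriptors.
majority = int(sum(r.dict_label for r in train) * 2 >= len(train))
predictions = [majority for _ in test]

per_category = accuracy_by_category(test, predictions)
```

Splitting by approval year rather than at random keeps later drugs out of training, which is closer to how a model would be used prospectively. Reporting a score per category instead of one pooled score is one simple way to surface the kind of category-dependent performance the abstract mentions.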




Source journal: Chemical Research in Toxicology (Medicine: Toxicology)

CiteScore: 7.90

Self-citation rate: 7.30%

Articles published: 215

Review time: 3.5 months

About the journal: Chemical Research in Toxicology publishes Articles, Rapid Reports, Chemical Profiles, Reviews, Perspectives, Letters to the Editor, and ToxWatch on a wide range of topics in Toxicology that inform a chemical and molecular understanding and capacity to predict biological outcomes on the basis of structures and processes. The overarching goal of activities reported in the Journal is to provide knowledge and innovative approaches needed to promote intelligent solutions for human safety and ecosystem preservation. The journal emphasizes insight concerning mechanisms of toxicity over phenomenological observations. It upholds rigorous chemical, physical and mathematical standards for characterization and application of modern techniques.