ToxSTK: A multi-target toxicity assessment utilizing molecular structure and stacking ensemble learning

IF 6.3, Zone 2 (Medicine), JCR Q1 (Biology)

Computers in Biology and Medicine, Pub Date: 2025-02-01, DOI: 10.1016/j.compbiomed.2024.109480

Surapong Boonsom, Panisara Chamnansil, Sarote Boonseng, Tarapong Srisongkram

Volume 185, Article 109480

Citations: 0

Abstract

Drug registration requires a risk assessment of new active pharmaceutical ingredients or excipients to ensure that they are safe for human health and the environment. However, traditional risk assessment is expensive and relies heavily on animal testing. Machine learning (ML) has been used as a risk assessment tool because it requires less time, less money, and fewer animals than in vivo experiments. Despite that, ML-based assessments often rely on a single model, which may introduce bias and produce unreliable predictions. Stacking ensemble learning is an ML framework that makes predictions by combining the outputs of multiple base models, and it performs well in quantitative structure-activity relationship (QSAR) studies. In this study, we developed ToxSTK, a multi-target toxicity assessment tool based on stacking ensemble learning. Our aim was to create an ML tool that makes toxicity assessment more affordable and less reliant on animal models. We focused on four targets commonly assessed in early-stage drug development: hERG toxicity, mTOR toxicity, PBMC toxicity, and mutagenicity. Our model combined 12 molecular fingerprints with 3 ML algorithms, generating 36 novel predictive features (PFs). These PFs were then combined to construct the final meta-decision model. Our results demonstrated that the ToxSTK model surpasses the thresholds of standard regression and classification metrics, indicating that it is highly reliable and accurate in predicting chemical toxicity within its applicability domain. The model passed the y-randomization test, confirming that the identified QSAR is robust and not the result of chance correlation. Additionally, the model outperforms existing ML methods for these endpoints, suggesting that it is effective for risk assessment applications. We recommend incorporating this stacking ensemble learning framework into chemical risk assessment pipelines to improve model generalization, accuracy, robustness, and reliability.
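The abstract describes the stacking design only at a high level. Each (fingerprint, algorithm) pair yields one predictive feature (PF), giving 12 × 3 = 36 in ToxSTK, and a meta-model is then trained on those PFs. Below is a minimal, standard-library-only Python sketch of that general stacking pattern for a binary toxic/non-toxic endpoint. It is not the authors' implementation.

Several parts are illustrative assumptions:
- The fingerprints are stand-in binary vectors, and `toy_fp` is a hypothetical data generator.
- The two base learners (Tanimoto k-NN and Bernoulli naive Bayes) and the logistic-regression meta-model are placeholder choices.
- PFs are produced out-of-fold, so the meta-model never sees a score from a base model that was trained on the same compound.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[int]                    # one binary fingerprint (0/1 bits)
Model = Callable[[Vector], float]         # fitted base model -> score in [0, 1]
Learner = Callable[[List[Vector], List[int]], Model]


def tanimoto(a: Vector, b: Vector) -> float:
    """Tanimoto (Jaccard) similarity of two binary fingerprints."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return both / either if either else 1.0


def knn_learner(k: int = 3) -> Learner:
    """Illustrative base learner: k-NN on Tanimoto similarity, score = toxic fraction."""
    def fit(X: List[Vector], y: List[int]) -> Model:
        def predict(x: Vector) -> float:
            top = sorted(zip(X, y), key=lambda p: -tanimoto(x, p[0]))[:k]
            return sum(label for _, label in top) / len(top)
        return predict
    return fit


def bernoulli_nb(X: List[Vector], y: List[int]) -> Model:
    """Illustrative base learner: Bernoulli naive Bayes (Laplace smoothing), P(toxic | x)."""
    stats = {}
    for c in (0, 1):
        rows = [x for x, t in zip(X, y) if t == c]
        prior = (len(rows) + 1) / (len(X) + 2)
        cols = zip(*rows) if rows else [[] for _ in X[0]]
        stats[c] = (prior, [(sum(col) + 1) / (len(rows) + 2) for col in cols])

    def predict(x: Vector) -> float:
        logp = {c: math.log(prior) + sum(math.log(p if bit else 1 - p)
                                         for bit, p in zip(x, probs))
                for c, (prior, probs) in stats.items()}
        m = max(logp.values())  # subtract the max for numerical stability
        e0, e1 = math.exp(logp[0] - m), math.exp(logp[1] - m)
        return e1 / (e0 + e1)
    return predict


def out_of_fold_pfs(views: Dict[str, List[Vector]], learners: Dict[str, Learner],
                    y: List[int], n_folds: int = 3) -> List[List[float]]:
    """One PF column per (fingerprint, algorithm) pair; each compound's PF comes
    from a base model trained on the other folds only."""
    n = len(y)
    folds = [list(range(i, n, n_folds)) for i in range(n_folds)]
    pfs: List[List[float]] = [[] for _ in range(n)]
    for X in views.values():
        for learner in learners.values():
            for test_idx in folds:
                held_out = set(test_idx)
                train = [i for i in range(n) if i not in held_out]
                model = learner([X[i] for i in train], [y[i] for i in train])
                for i in test_idx:
                    pfs[i].append(model(X[i]))
    return pfs


def sigmoid(z: float) -> float:
    z = max(-30.0, min(30.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(F: List[List[float]], y: List[int], lr: float = 0.5,
                 epochs: int = 500) -> Callable[[List[float]], float]:
    """Illustrative meta-model: logistic regression trained by plain SGD on the PFs."""
    w, b = [0.0] * len(F[0]), 0.0
    for _ in range(epochs):
        for f, t in zip(F, y):
            g = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) - t
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return lambda f: sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)


def fit_stack(views: Dict[str, List[Vector]], learners: Dict[str, Learner],
              y: List[int], n_folds: int = 3) -> Callable[[Dict[str, Vector]], float]:
    """Train the meta-model on out-of-fold PFs, then refit base models on all data."""
    meta = fit_logistic(out_of_fold_pfs(views, learners, y, n_folds), y)
    bases = [(fp, learner(X, y)) for fp, X in views.items()
             for learner in learners.values()]  # same column order as the PFs

    def predict(query: Dict[str, Vector]) -> float:
        return meta([model(query[fp]) for fp, model in bases])
    return predict


def toy_fp(label: int, i: int, width: int = 8) -> List[int]:
    """Hypothetical toy fingerprint: toxic compounds set the first half of the bits."""
    half = width // 2
    bits = [1] * half + [0] * half if label else [0] * half + [1] * half
    bits[i % width] ^= 1  # flip one bit so compounds are not identical
    return bits
```

For example, `fit_stack({"fp_a": Xa, "fp_b": Xb}, {"knn": knn_learner(3), "nb": bernoulli_nb}, y)` builds 2 × 2 = 4 PF columns. With 12 fingerprint types and 3 algorithms, the same loop would produce `len(views) * len(learners)` = 36 columns. This sketch covers classification only; the abstract indicates that ToxSTK is also evaluated with regression metrics.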


Source journal

Computers in Biology and Medicine (Engineering & Technology; Engineering: Biomedical)

CiteScore: 11.70
Self-citation rate: 10.40%
Articles published: 1086
Review time: 74 days

Journal description: Computers in Biology and Medicine is an international forum for sharing groundbreaking advances in the use of computers in bioscience and medicine. The journal serves as a medium for communicating essential research, instruction, ideas, and information about the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, it aims to facilitate progress and innovation in the use of computers in biology and medicine.