Machine Learning for Toxicity Prediction Using Chemical Structures: Pillars for Success in the Real World.

IF 3.8; Region 3 (Medicine); JCR Q2 (Chemistry, Medicinal)

Chemical Research in Toxicology Pub Date : 2025-05-19 Epub Date: 2025-05-02 DOI:10.1021/acs.chemrestox.5c00033

Srijit Seal, Manas Mahale, Miguel García-Ortegón, Chaitanya K Joshi, Layla Hosseini-Gerami, Alex Beatson, Matthew Greenig, Mrinal Shekhar, Arijit Patra, Caroline Weis, Arash Mehrjou, Adrien Badré, Brianna Paisley, Rhiannon Lowe, Shantanu Singh, Falgun Shah, Bjarki Johannesson, Dominic Williams, David Rouquie, Djork-Arné Clevert, Patrick Schwab, Nicola Richmond, Christos A Nicolaou, Raymond J Gonzalez, Russell Naven, Carolin Schramm, Lewis R Vidler, Kamel Mansouri, W Patrick Walters, Deidre Dalmas Wilk, Ola Spjuth, Anne E Carpenter, Andreas Bender

Pages: 759-807. Publication type: Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12093382/pdf/

Citation count: 0

Abstract

Machine learning (ML) is increasingly valuable for predicting molecular properties and toxicity in drug discovery. However, toxicity-related end points have always been challenging to evaluate experimentally with respect to in vivo translation due to the required resources for human and animal studies; this has impacted data availability in the field. ML can augment or even potentially replace traditional experimental processes depending on the project phase and specific goals of the prediction. For instance, models can be used to select promising compounds for on-target effects or to deselect those with undesirable characteristics (e.g., off-target or ineffective due to unfavorable pharmacokinetics). However, reliance on ML is not without risks, due to biases stemming from nonrepresentative training data, incompatible choice of algorithm to represent the underlying data, or poor model building and validation approaches. This might lead to inaccurate predictions, misinterpretation of the confidence in ML predictions, and ultimately suboptimal decision-making. Hence, understanding the predictive validity of ML models is of utmost importance to enable faster drug development timelines while improving the quality of decisions. This perspective emphasizes the need to enhance the understanding and application of machine learning models in drug discovery, focusing on well-defined data sets for toxicity prediction based on small molecule structures. We focus on five crucial pillars for success with ML-driven molecular property and toxicity prediction: (1) data set selection, (2) structural representations, (3) model algorithm, (4) model validation, and (5) translation of predictions to decision-making. Understanding these key pillars will foster collaboration and coordination between ML researchers and toxicologists, which will help to advance drug discovery and development.

Source journal

Chemical Research in Toxicology (Medicine: Toxicology)

CiteScore: 7.90

Self-citation rate: 7.30%

Articles published: 215

Review time: 3.5 months

Journal description: Chemical Research in Toxicology publishes Articles, Rapid Reports, Chemical Profiles, Reviews, Perspectives, Letters to the Editor, and ToxWatch on a wide range of topics in Toxicology that inform a chemical and molecular understanding and capacity to predict biological outcomes on the basis of structures and processes. The overarching goal of activities reported in the Journal is to provide knowledge and innovative approaches needed to promote intelligent solutions for human safety and ecosystem preservation. The journal emphasizes insight concerning mechanisms of toxicity over phenomenological observations. It upholds rigorous chemical, physical, and mathematical standards for characterization and application of modern techniques.