A comparison of machine learning approaches for predicting hepatotoxicity potential using chemical structure and targeted transcriptomic data
Authors: Tia Tate, Grace Patlewicz, Imran Shah
Journal: Computational Toxicology (IF 3.1, JCR Q2, Toxicology)
DOI: 10.1016/j.comtox.2024.100301
Publication date: 2024-02-09
URL: https://www.sciencedirect.com/science/article/pii/S2468111324000033
Citations: 0
Abstract
A comparison of machine learning approaches for predicting hepatotoxicity potential using chemical structure and targeted transcriptomic data
Animal toxicity testing is time and resource intensive, making it difficult to keep pace with the number of substances requiring assessment. Machine learning (ML) models that use chemical structure information and high-throughput experimental data can be helpful in predicting potential toxicity. However, much of the toxicity data used to train ML models is biased, with an unequal balance of positives and negatives, primarily because substances selected for in vivo testing are expected to elicit some toxicity effect. To investigate the impact of this bias on predictive performance, various sampling approaches were used to balance in vivo toxicity data as part of a supervised ML workflow to predict hepatotoxicity outcomes from chemical structure and/or targeted transcriptomic data. From the chronic, subchronic, developmental, multigenerational reproductive, and subacute repeat-dose testing toxicity outcomes with a minimum of 50 positive and 50 negative substances, 18 different study-toxicity outcome combinations were evaluated in up to 7 ML models. These included Artificial Neural Network, Random Forest, Bernoulli Naïve Bayes, Gradient Boosting, and Support Vector classification algorithms, which were compared with a local approach, Generalised Read-Across (GenRA), a similarity-weighted k-Nearest Neighbour (k-NN) method. The mean CV F1 performance for unbalanced data across all classifiers and descriptors for chronic liver effects was 0.735 (0.0395 SD). Mean CV F1 performance dropped to 0.639 (0.073 SD) with over-sampling approaches, though the poorer performance of k-NN approaches in some cases contributed to the observed decrease (mean CV F1 performance excluding k-NN was 0.697 (0.072 SD)). With under-sampling approaches, the mean CV F1 was 0.523 (0.083 SD). For developmental liver effects, the mean CV F1 performance was much lower: 0.089 (0.111 SD) for unbalanced approaches and 0.149 (0.084 SD) for under-sampling. Over-sampling approaches led to an increase in mean CV F1 performance (0.234 (0.107 SD)) for developmental liver toxicity. Model performance was found to depend on the dataset, model type, balancing approach and feature selection. Accordingly, ML workflows tailored for predicting toxicity should account for class imbalance and rely on simple classifiers first.
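The core comparison the abstract describes, training classifiers on unbalanced versus balanced toxicity labels and comparing mean cross-validated F1, can be sketched with scikit-learn. This is an illustrative reconstruction, not the study's actual pipeline: the synthetic descriptor matrix, the 9:1 class skew, the naive random over-sampling, and the Random Forest choice are all assumptions standing in for the paper's datasets and model suite.

```python
# Hypothetical sketch of the balancing comparison described in the abstract:
# synthetic data and classifier choices are illustrative, not the study's own.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.utils import resample

# Synthetic stand-in for a chemical-descriptor matrix with a 9:1 imbalance,
# mimicking the positive/negative skew typical of in vivo toxicity data.
X, y = make_classification(n_samples=500, n_features=20,
                           weights=[0.9, 0.1], random_state=0)

def oversample_to_parity(X, y):
    """Naive random over-sampling: duplicate minority-class rows (with
    replacement) until both classes have equal counts."""
    minority = y == 1
    n_extra = int((~minority).sum() - minority.sum())
    X_up, y_up = resample(X[minority], y[minority],
                          n_samples=n_extra, random_state=0)
    return np.vstack([X, X_up]), np.concatenate([y, y_up])

clf = RandomForestClassifier(random_state=0)

# Mean 5-fold CV F1 on the raw, unbalanced labels.
f1_unbalanced = cross_val_score(clf, X, y, cv=5, scoring="f1").mean()

# Mean 5-fold CV F1 after over-sampling the minority class to parity.
X_bal, y_bal = oversample_to_parity(X, y)
f1_oversampled = cross_val_score(clf, X_bal, y_bal, cv=5, scoring="f1").mean()

print(f"unbalanced  mean CV F1: {f1_unbalanced:.3f}")
print(f"over-sampled mean CV F1: {f1_oversampled:.3f}")
```

One caveat worth noting: over-sampling the full dataset before cross-validation, as in this sketch, places duplicated rows in both training and test folds and can inflate the balanced score; a production workflow would re-sample inside each training fold only (e.g. via an imbalanced-learn pipeline). Whether the study did so is not stated in the abstract.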
Journal description:
Computational Toxicology is an international journal publishing computational approaches that assist in the toxicological evaluation and safety assessment of new and existing chemical substances. Topics include:

- All effects relating to human health and environmental toxicity and fate
- Prediction of toxicity, metabolism, fate and physico-chemical properties
- The development of models from read-across, (Q)SARs, PBPK, QIVIVE, multi-scale models
- Big data in toxicology: integration, management, analysis
- Implementation of models through AOPs, IATA, TTC
- Regulatory acceptance of models: evaluation, verification and validation
- From metals, to small organic molecules, to nanoparticles
- Pharmaceuticals, pesticides, foods, cosmetics, fine chemicals
- Bringing together the views of industry, regulators, academia, NGOs