TabNet and TabTransformer: Novel Deep Learning Models for Chemical Toxicity Prediction in Comparison With Machine Learning

IF 2.8 | Zone 4 (Medicine) | JCR Q3 (Toxicology)

Journal of Applied Toxicology | Pub Date: 2025-05-01 | DOI: 10.1002/jat.4803

Firas Mahmood Mustafa, Ali Fawzi Al-Hussainy, Hardik Doshi, Anupam Yadav, M. M. Rekha, Mayank Kundlas, A. Sabarivani, Aziz Kubaev, Sada Ghalib Taher, Mariem Alwan, Mahmood Jawad, Hiba Mushtaq, Bagher Farhood

Journal of Applied Toxicology, Volume 45, Issue 9, pages 1730-1749. Journal article; not open access. Full text: https://onlinelibrary.wiley.com/doi/10.1002/jat.4803

Citations: 0

Abstract



The prediction of chemical toxicity is crucial for applications in drug discovery, environmental safety, and regulatory assessments. This study aims to evaluate the performance of advanced deep learning architectures, TabNet and TabTransformer, in comparison to traditional machine learning methods, for predicting the toxicity of chemical compounds across 12 toxicological endpoints. The dataset consisted of 12,228 training and 3,057 test samples, each characterized by 801 molecular descriptors representing chemical and structural features. Traditional machine learning models, including XGBoost, CatBoost, SVM, and a voting classifier, were paired with feature selection techniques such as principal component analysis (PCA), recursive feature elimination (RFE), and mutual information (MI). Advanced architectures, TabNet and TabTransformer, were trained directly on the full feature set without dimensionality reduction. Model performance was assessed using accuracy, F1-score, AUC-ROC, AUPR, and Matthews correlation coefficient (MCC), alongside SHAP analysis to interpret feature importance and enhance model transparency under class imbalance conditions. Cross-validation and test set evaluations ensured robust comparisons across all models and toxicological endpoints. TabNet and TabTransformer consistently outperformed traditional classifiers, achieving AUC-ROC values up to 96% for endpoints such as SR.ARE and SR.p53. TabTransformer showed the highest performance on complex labels, benefiting from self-attention mechanisms that captured intricate feature relationships, while TabNet achieved competitive outcomes with an efficient, dynamic feature selection. In addition to standard metrics, we reported AUPR and MCC to better evaluate model performance under class imbalance, with both models maintaining high scores across endpoints. Although traditional classifiers, particularly the voting classifier, performed well when combined with feature selection (achieving up to 94% AUC-ROC on SR.p53), they lagged behind the deep learning models in generalizability and feature interaction modeling. SHAP analysis further highlighted the interpretability of the proposed architectures by identifying influential descriptors such as VSAEstate6 and MoRSEE8. This study highlights the superiority of TabNet and TabTransformer in predicting chemical toxicity while ensuring interpretability through SHAP analysis. These models offer a promising alternative to traditional in vitro and in vivo approaches, paving the way for cost-effective and ethical toxicity assessments.
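
The abstract reports AUPR and MCC alongside accuracy because the toxicity endpoints are class-imbalanced. The sketch below is a minimal pure-Python illustration of why that matters. It is not the authors' code or data: the toy labels and scores are invented here, and every function name is a hypothetical helper. It assumes binary labels (1 = toxic, 0 = non-toxic) and, for the ranking metrics, no tied scores among positives.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    tp, tn, _, _ = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f1_score(y_true, y_pred):
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; defined as 0.0 when a margin is empty."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc_roc(y_true, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """AUPR estimated as average precision: mean precision at each positive's rank."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Toy imbalanced endpoint: 5 toxic compounds out of 100 (invented numbers).
y_true = [1] * 5 + [0] * 95
always_negative = [0] * 100  # a classifier that never predicts "toxic"

acc_trivial = accuracy(y_true, always_negative)  # 0.95, looks strong
f1_trivial = f1_score(y_true, always_negative)   # 0.0, no toxic compound found
mcc_trivial = mcc(y_true, always_negative)       # 0.0, no better than chance

# Toy ranking example for the threshold-free metrics (invented scores).
y_small = [1, 0, 1, 0, 0]
scores_small = [0.9, 0.8, 0.7, 0.3, 0.1]
auc_small = auc_roc(y_small, scores_small)
aupr_small = average_precision(y_small, scores_small)
```

On the toy endpoint, the always-negative classifier reaches 95% accuracy while its F1-score and MCC are both zero, which is the kind of gap that motivates reporting MCC and AUPR for imbalanced labels. In practice a maintained library implementation would normally replace these helpers.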


Source journal

Journal of Applied Toxicology (Medicine: Toxicology)

CiteScore: 7.00
Self-citation rate: 6.10%
Articles published: 145
Review time: 1 month

About the journal: Journal of Applied Toxicology publishes peer-reviewed original reviews and hypothesis-driven research articles on mechanistic, fundamental, and applied research relating to the toxicity of drugs and chemicals at the molecular, cellular, tissue, target organ, and whole-body level in vivo (by all relevant routes of exposure) and in vitro/ex vivo. All aspects of toxicology are covered, including but not limited to nanotoxicology, genomics and proteomics, teratogenesis, carcinogenesis, mutagenesis, reproductive and endocrine toxicology, toxicopathology, target organ toxicity, systems toxicity (e.g., immunotoxicity), neurobehavioral toxicology, mechanistic studies, biochemical and molecular toxicology, novel biomarkers, pharmacokinetics/PBPK, risk assessment, and environmental health studies. Emphasis is given to papers that have clear application to human health, advance mechanistic understanding, and/or provide significant contributions and impact to their field.