Firas Mahmood Mustafa, Ali Fawzi Al-Hussainy, Hardik Doshi, Anupam Yadav, M. M. Rekha, Mayank Kundlas, A. Sabarivani, Aziz Kubaev, Sada Ghalib Taher, Mariem Alwan, Mahmood Jawad, Hiba Mushtaq, Bagher Farhood
Journal of Applied Toxicology, vol. 45, no. 9, pp. 1730-1749. Published 2025-05-01. DOI: 10.1002/jat.4803. https://onlinelibrary.wiley.com/doi/10.1002/jat.4803
TabNet and TabTransformer: Novel Deep Learning Models for Chemical Toxicity Prediction in Comparison With Machine Learning
The prediction of chemical toxicity is crucial for applications in drug discovery, environmental safety, and regulatory assessments. This study aims to evaluate the performance of advanced deep learning architectures, TabNet and TabTransformer, in comparison with traditional machine learning methods, for predicting the toxicity of chemical compounds across 12 toxicological endpoints. The dataset consisted of 12,228 training and 3,057 test samples, each characterized by 801 molecular descriptors representing chemical and structural features. Traditional machine learning models, including XGBoost, CatBoost, SVM, and a voting classifier, were paired with feature selection and dimensionality reduction techniques such as principal component analysis (PCA), recursive feature elimination (RFE), and mutual information (MI). Advanced architectures, TabNet and TabTransformer, were trained directly on the full feature set without dimensionality reduction. Model performance was assessed using accuracy, F1-score, AUC-ROC, AUPR, and Matthews correlation coefficient (MCC), alongside SHAP analysis to interpret feature importance and enhance model transparency under class imbalance conditions. Cross-validation and test set evaluations ensured robust comparisons across all models and toxicological endpoints. TabNet and TabTransformer consistently outperformed traditional classifiers, achieving AUC-ROC values up to 96% for endpoints such as SR.ARE and SR.p53. TabTransformer showed the highest performance on complex labels, benefiting from self-attention mechanisms that captured intricate feature relationships, while TabNet achieved competitive outcomes with efficient, dynamic feature selection. In addition to standard metrics, we reported AUPR and MCC to better evaluate model performance under class imbalance, with both models maintaining high scores across endpoints. 
Although traditional classifiers, particularly the voting classifier, performed well when combined with feature selection—achieving up to 94% AUC-ROC on SR.p53—they lagged behind the deep learning models in generalizability and feature interaction modeling. SHAP analysis further highlighted the interpretability of the proposed architectures by identifying influential descriptors such as VSAEstate6 and MoRSEE8. This study highlights the superiority of TabNet and TabTransformer in predicting chemical toxicity while ensuring interpretability through SHAP analysis. These models offer a promising alternative to traditional in vitro and in vivo approaches, paving the way for cost-effective and ethical toxicity assessments.
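The abstract reports AUPR and MCC alongside accuracy because toxicity endpoints are class-imbalanced. The following is a minimal sketch of why that matters, written in plain Python. The labels and scores are made-up toy values, not data from the study, and the function names are illustrative rather than the authors' evaluation code. AUPR is approximated here by average precision, assuming no tied scores.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn

def accuracy(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)

def f1_score(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(y_true, y_pred):
    """Matthews correlation coefficient; defined as 0 when a margin is empty."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def auc_roc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(y_true, scores):
    """Mean of the precision values measured at the rank of each positive."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

# Toy imbalanced endpoint: 2 toxic compounds out of 10 (hypothetical values).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.25, 0.2, 0.15, 0.1, 0.05, 0.02]

baseline = [0] * len(y_true)                      # always predicts "non-toxic"
thresholded = [1 if s >= 0.5 else 0 for s in scores]

for name, pred in [("baseline", baseline), ("thresholded", thresholded)]:
    print(name, accuracy(y_true, pred), f1_score(y_true, pred), mcc(y_true, pred))
print("AUC-ROC:", auc_roc(y_true, scores))
print("AUPR (average precision):", average_precision(y_true, scores))
```

In this toy case both predictors reach the same accuracy, while F1 and MCC separate the trivial baseline from the model that actually finds a toxic compound. The ranking metrics (AUC-ROC, AUPR) are computed from the raw scores and do not depend on the 0.5 threshold.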
About the journal:
Journal of Applied Toxicology publishes peer-reviewed original reviews and hypothesis-driven research articles on mechanistic, fundamental and applied research relating to the toxicity of drugs and chemicals at the molecular, cellular, tissue, target organ and whole body level in vivo (by all relevant routes of exposure) and in vitro / ex vivo. All aspects of toxicology are covered, including but not limited to nanotoxicology, genomics and proteomics, teratogenesis, carcinogenesis, mutagenesis, reproductive and endocrine toxicology, toxicopathology, target organ toxicity, systems toxicity (e.g., immunotoxicity), neurobehavioral toxicology, mechanistic studies, biochemical and molecular toxicology, novel biomarkers, pharmacokinetics/PBPK, risk assessment and environmental health studies. Emphasis is given to papers that have clear application to human health, advance mechanistic understanding, and/or provide significant contributions and impact to their field.