Advancing ADMET prediction through multiscale fragment-aware pretraining with MSformer-ADMET.

IF 7.7 | Zone 2, Biology | Q1, BIOCHEMICAL RESEARCH METHODS

Briefings in Bioinformatics | Pub Date: 2025-08-31 | DOI: 10.1093/bib/bbaf506

Huihui Liu, Bingjie Zhu, Shuyang Nie, Haoran Li, Yugang Lin, Tianyi Ma, Xin Shao, Qian Chen, Minjie Shen, Yanrong Zheng, Xiaohui Fan, Jie Liao

Volume 26, Issue 5 | Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12478026/pdf/

Citations: 0

Abstract

Absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties are critical determinants of the pharmacokinetic and safety profiles of drug candidates. Accurate, early-stage prediction of ADMET characteristics is essential for reducing late-stage attrition, lowering development costs, and accelerating drug discovery. Recent advances in deep learning have shown great promise in molecular property prediction, especially with the emergence of Transformer-based architectures that can effectively model long-range dependencies in molecular representations. However, most existing methods rely heavily on atom-level encodings (e.g. SMILES strings or molecular graphs), which often lack structural interpretability and generalize poorly across heterogeneous tasks. We previously developed MSformer (available at https://github.com/ZJUFanLab/MSformer), a de novo and flexible molecular representation framework that proved successful in bioactivity prediction. Here we adapt and specialize this architecture for ADMET property prediction. The adapted implementation, MSformer-ADMET, extends the framework to pharmacokinetic and toxicity endpoints while retaining its flexible, fragment-based approach to molecular representation learning. MSformer-ADMET is fine-tuned on 22 tasks collected from the Therapeutics Data Commons (TDC), covering both classification and regression settings. The results show that MSformer-ADMET achieves superior performance across a wide range of ADMET endpoints, consistently outperforming conventional SMILES-based and graph-based models. We further conducted interpretability analyses that leverage the model's attention distributions and fragment-to-atom mappings to identify key structural fragments strongly associated with molecular properties. This post hoc interpretability offers more transparent insight into structure-property relationships. Collectively, these results demonstrate that MSformer-ADMET is a highly effective and broadly applicable model for ADMET prediction.
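The interpretability analysis described above combines attention distributions with fragment-to-atom mappings. The sketch below illustrates one plausible way to do this, assuming (hypothetically) that the model exposes one attention weight per fragment and that each fragment records the indices of the atoms it covers. The function names, the even-split attribution rule, and the toy molecule are illustrative choices, not MSformer-ADMET's published procedure or code.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: turns raw attention logits into a distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fragment_to_atom_scores(
    fragment_attention: Sequence[float],
    fragment_atoms: Sequence[Sequence[int]],
    num_atoms: int,
) -> List[float]:
    """Project per-fragment attention weights onto atoms.

    Assumption (illustrative, not the published method): each fragment's weight
    is split evenly over the atoms it covers, and atoms shared by overlapping
    fragments accumulate the shares of every fragment that contains them.
    """
    if len(fragment_attention) != len(fragment_atoms):
        raise ValueError("expected one attention weight per fragment")
    scores = [0.0] * num_atoms
    for weight, atoms in zip(fragment_attention, fragment_atoms):
        if not atoms:
            continue  # an empty fragment has nothing to attribute to
        share = weight / len(atoms)
        for idx in atoms:
            scores[idx] += share
    return scores


def top_fragments(
    fragment_attention: Sequence[float],
    fragment_labels: Sequence[str],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank fragments by attention weight and return the k highest."""
    ranked = sorted(
        zip(fragment_labels, fragment_attention),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical toy molecule: 6 atoms covered by 3 fragments (atom 2 is shared).
    fragments = [[0, 1, 2], [2, 3], [4, 5]]
    labels = ["fragment A", "fragment B", "fragment C"]
    # Hypothetical attention logits, e.g. from a summary token to each fragment.
    weights = softmax([1.2, 0.7, 0.3])
    atom_scores = fragment_to_atom_scores(weights, fragments, num_atoms=6)
    print("fragment weights:", [round(w, 3) for w in weights])
    print("atom scores:", [round(s, 3) for s in atom_scores])
    print("top fragments:", top_fragments(weights, labels, k=2))
```

Splitting each fragment's weight evenly keeps the total atom-level score equal to the total fragment attention, which makes atom maps comparable across molecules; summing full weights, or taking the maximum over covering fragments, are equally plausible alternatives under different assumptions.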




Source journal: Briefings in Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 13.20
Self-citation rate: 13.70%
Articles published: 549
Review time: 6 months

Journal description: Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular biology, and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computational methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, its papers address bacterial, plant, fungal, animal, and human data. The journal's subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.