Prediction of UGT-mediated phase II metabolism via ligand- and structure-based predictive models

IF 5.7 · Zone 2 (Chemistry) · JCR Q1 · CHEMISTRY, MULTIDISCIPLINARY

Journal of Cheminformatics Pub Date : 2025-10-15 DOI:10.1186/s13321-025-01097-y

Ludovica Bono, Filippo Lunghini, Emanuela Sabato, Akash Deep Biswas, Angelica Mazzolari, Alessandro Pedretti, Andrea R. Beccari, Giulio Vistoli, Serena Vittorio


Citations: 0

Abstract

In silico prediction of drug metabolism is attracting growing interest because large datasets can be processed, allowing the stability and safety of new drug candidates to be evaluated during the early stages of drug discovery. To date, in silico models for metabolism prediction have mainly exploited the ligand-based (LB) properties of the training molecules to predict the occurrence of a given metabolic reaction and/or the reactive site involved in the biotransformation. However, recent reports have highlighted that structure-based (SB) modeling can be conveniently integrated with LB methods for drug metabolism prediction, with the advantage of predicting whether a given molecule can fit the enzyme active site and which moiety approaches the catalytic residues. Herein, we developed machine learning models for UDP-glucuronosyltransferase (UGT)-mediated metabolism using both LB and SB methods. In particular, this study focused on the UGT2B7 and UGT2B15 isoforms, which are involved in the clearance of many drugs as well as in clinically relevant drug-drug interactions. First, molecular dynamics (MD) and docking simulations were combined to explore the binding mechanism of cofactor and substrate within the catalytic pocket of the studied UGT isoforms, exploiting their AlphaFold structures. Analysis of the MD trajectories allowed an appropriate conformation of both UGT isoforms to be identified for the development of binary classification models. For this purpose, the Random Forest algorithm and metabolic data extracted from the MetaQSAR database were used. SB models were trained on a set of scoring functions and protein–ligand interaction fingerprints derived from docking, while the LB models were built on a set of physicochemical and constitutional descriptors. When the individual models were evaluated, the LB classifiers outperformed the SB models. However, applying a consensus strategy improved prediction accuracy compared with the individual models, highlighting that LB and SB approaches convey complementary information whose aggregation yields better predictions than either approach alone. Metabolism prediction through in silico methods is a useful tool for assessing the pharmacokinetic profile of new drug candidates in the early stages of drug discovery. This study provides a new computational strategy that integrates ligand- and structure-based approaches for the prediction of UGT2B7- and UGT2B15-mediated metabolism, exploiting their AlphaFold structures. The combination of both methodologies yielded better performance than the individual ligand- and structure-based predictive models, and also confirmed the reliability of AlphaFold structures for developing structure-based models for metabolism prediction.
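
As a minimal sketch of the consensus idea described in the abstract, the Python snippet below combines the substrate probabilities of an LB and an SB classifier through a weighted mean and thresholds the result. The abstract does not state which consensus rule was used, so the weighted mean, the 0.5 threshold, the function names, and the toy numbers are all assumptions made for illustration; none of them are results from the study.

```python
from typing import Sequence


def consensus_probability(p_lb: float, p_sb: float, w_lb: float = 0.5) -> float:
    """Weighted mean of the LB and SB substrate probabilities.

    Assumption: the consensus rule is a simple weighted average; the
    abstract does not specify the rule actually used.
    """
    if not 0.0 <= w_lb <= 1.0:
        raise ValueError("w_lb must lie in [0, 1]")
    return w_lb * p_lb + (1.0 - w_lb) * p_sb


def classify(probability: float, threshold: float = 0.5) -> int:
    """Binary label: 1 = predicted substrate, 0 = predicted non-substrate."""
    return int(probability >= threshold)


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of labels predicted correctly."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data (NOT taken from the paper): six molecules with
# known labels and made-up probabilities from an LB and an SB model.
y_true = [1, 1, 0, 0, 1, 0]
p_lb = [0.90, 0.40, 0.20, 0.60, 0.70, 0.10]
p_sb = [0.45, 0.80, 0.55, 0.20, 0.40, 0.30]

lb_pred = [classify(p) for p in p_lb]
sb_pred = [classify(p) for p in p_sb]
consensus_pred = [
    classify(consensus_probability(a, b)) for a, b in zip(p_lb, p_sb)
]

print("LB accuracy:       ", accuracy(y_true, lb_pred))
print("SB accuracy:       ", accuracy(y_true, sb_pred))
print("Consensus accuracy:", accuracy(y_true, consensus_pred))
```

The toy numbers are chosen so that the two models err on different molecules, which is the situation in which averaging can help. With real models, the probabilities would come from the trained classifiers, and the weight `w_lb` would need to be tuned on held-out data.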


Source journal: Journal of Cheminformatics (CHEMISTRY, MULTIDISCIPLINARY; COMPUTER SCIENCE, INFORMATION SYSTEMS)

CiteScore: 14.10

Self-citation rate: 7.00%

Review time: 3 months

About the journal: Journal of Cheminformatics is an open access journal publishing original peer-reviewed research in all aspects of cheminformatics and molecular modelling. Coverage includes, but is not limited to: chemical information systems, software and databases, and molecular modelling; chemical structure representations and their use in structure, substructure, and similarity searching of chemical substance and chemical reaction databases; computer and molecular graphics; computer-aided molecular design; expert systems; QSAR; and data mining techniques.