The state-of-the-art machine learning model for Plasma Protein Binding Prediction: computational modeling with OCHEM and experimental validation

bioRxiv, Pub Date: 2024-07-16, DOI: 10.1101/2024.07.12.603170

Zunsheng Han, Zhonghua Xia, Jie Xia, Igor V Tetko, Song Wu

Abstract

Plasma protein binding (PPB) is closely related to pharmacokinetics, pharmacodynamics and drug toxicity. Computational prediction of PPB offers an alternative to experimental approaches, which are known to be time-consuming and costly. Although various models and web servers for PPB prediction are already available, they suffer from low prediction accuracy and poor interpretability, in particular for molecules with high PPB values, and most have not been properly validated in prospective studies. Here, we carried out strict data curation and applied consensus modeling to obtain a model with a coefficient of determination (R²) of 0.90 on the training set and 0.91 on the test set. This model was further validated in a prospective study predicting 63 poly-fluorinated compounds and another 25 highly diverse compounds, and its performance on both sets was superior to that of previously reported models. To identify structural features related to PPB, we analyzed a model based on Morgan2 fingerprints and found that features such as aromatic rings, halogen atoms and heterocyclic rings can discriminate between high- and low-PPB molecules. In conclusion, we have established a PPB prediction model that showed state-of-the-art performance in prospective screening and have made it publicly available on the OCHEM platform (https://ochem.eu/article/29).
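The abstract reports the performance of a consensus model as a coefficient of determination (R²). The sketch below illustrates both ideas under stated assumptions: it treats the consensus as the unweighted mean of several individual models' predictions (the abstract does not say how the OCHEM models are combined) and computes R² as 1 − SS_res/SS_tot. The helper names `consensus_predict` and `r_squared`, and all numbers (fraction-bound values for four compounds from two models), are hypothetical and chosen only for illustration.

```python
from statistics import fmean


def consensus_predict(per_model_predictions):
    """Average the predictions of several models for each compound.

    per_model_predictions: one list per model, each holding one predicted
    value per compound, in the same compound order (hypothetical layout).
    """
    n_compounds = len(per_model_predictions[0])
    if any(len(preds) != n_compounds for preds in per_model_predictions):
        raise ValueError("every model must predict the same compounds")
    # zip(*...) groups all models' predictions for the same compound.
    return [fmean(preds) for preds in zip(*per_model_predictions)]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative, made-up fraction-bound values for four compounds.
y_true = [0.95, 0.80, 0.40, 0.99]
model_a = [0.90, 0.85, 0.50, 0.97]
model_b = [0.96, 0.70, 0.36, 0.99]

consensus = consensus_predict([model_a, model_b])
for name, preds in [("model_a", model_a), ("model_b", model_b), ("consensus", consensus)]:
    print(f"{name}: R^2 = {r_squared(y_true, preds):.3f}")
```

With these toy values the averaged predictions score a higher R² than either model alone because the two models' errors partly cancel; in practice, any consensus gain depends on how accurate and how diverse the component models are.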