Online OCHEM multi-task model for solubility and lipophilicity prediction of platinum complexes

Impact factor: 3.8; Zone 2 (Chemistry); JCR Q2 (BIOCHEMISTRY & MOLECULAR BIOLOGY)
Nesma Mousa, Hristo P. Varbanov, Vidya Kaipanchery, Elisabetta Gabano, Mauro Ravera, Andrey A. Toropov, Larisa Charochkina, Filipe Menezes, Guillaume Godin, Igor V. Tetko
Journal of Inorganic Biochemistry, Vol. 269, Article 112890 (journal article, published 2025-03-10)
DOI: 10.1016/j.jinorgbio.2025.112890
Publisher page: https://www.sciencedirect.com/science/article/pii/S0162013425000704
Citations: 0

Abstract

Predicting the solubility and lipophilicity of platinum(II, IV) complexes is essential for prioritizing potential anticancer candidates in drug discovery. This study introduces the first publicly available online model for predicting the solubility of platinum complexes, addressing the scarcity of literature and models on this topic. Using a time-split dataset, we developed a consensus model that achieved a root mean squared error (RMSE) of 0.62 in 5-fold cross-validation on a training set of 284 historical compounds (solubility data reported before 2017). However, the RMSE increased to 0.86 on a prospective test set of 108 compounds reported after 2017. Analysis of the largest prediction errors showed that they are primarily attributable to the underrepresentation of novel chemical scaffolds, particularly Pt(IV) derivatives, in the training set. For instance, a series of eight phenanthroline-containing compounds lying outside the chemical space of the training set had an RMSE of 1.3. When the model was redeveloped using the combined dataset, the RMSE for this series decreased substantially, to 0.34, under the same validation protocol. Additionally, we developed an interpretable linear model to identify the structural features and functional groups that influence the solubility of platinum complexes. We further confirmed a correlation between solubility and lipophilicity that is consistent with the Yalkowsky General Solubility Equation. Building on these insights, we developed a final multitask model that simultaneously predicts solubility and lipophilicity as two endpoints, with RMSE = 0.62 and 0.44, respectively. The data and the final model are available at https://ochem.eu/article/31.
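The abstract relies on three quantitative ideas: RMSE as the error metric, a time split into historical and prospective sets with a consensus of several models, and the Yalkowsky General Solubility Equation (GSE) linking solubility to lipophilicity. The Python sketch below illustrates each one under stated assumptions; it is not the OCHEM implementation, and every function name in it is hypothetical. The GSE is written in its commonly cited form, log S = 0.5 − 0.01(MP − 25) − log P (S in mol/L, MP in °C), which was derived for organic nonelectrolytes; the abstract reports only that the platinum-complex data are consistent with it, not any fitted coefficients. Because the abstract does not say where compounds reported in 2017 itself fall, the sketch assumes they belong to the prospective test set.

```python
import math
from statistics import mean


def gse_log_s(log_p, melting_point_c):
    """General Solubility Equation in its commonly cited form:
    log S = 0.5 - 0.01 * (MP - 25) - log P, with S in mol/L and MP in deg C.
    The melting-point term is dropped for compounds liquid at 25 deg C."""
    mp_term = 0.01 * max(melting_point_c - 25.0, 0.0)
    return 0.5 - mp_term - log_p


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def time_split(records, cutoff_year=2017):
    """Split (year, value) records into a historical training set and a
    prospective test set. Assumption: records from cutoff_year go to test."""
    train = [r for r in records if r[0] < cutoff_year]
    test = [r for r in records if r[0] >= cutoff_year]
    return train, test


def consensus(per_model_predictions):
    """Consensus prediction: per-compound mean over several models' outputs.
    per_model_predictions is a list of equal-length prediction lists."""
    return [mean(preds) for preds in zip(*per_model_predictions)]


if __name__ == "__main__":
    # Hypothetical compound: log P = 1.0, melting point 225 deg C.
    print(gse_log_s(log_p=1.0, melting_point_c=225.0))  # -2.5
```

At a fixed melting point, the GSE predicts that each one-unit increase in log P lowers log S by one unit. That inverse coupling between the two endpoints is the kind of relationship that makes predicting them jointly, as in a multitask model, a natural design choice.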


Source journal: Journal of Inorganic Biochemistry (Biology: Biochemistry & Molecular Biology)
CiteScore: 7.00
Self-citation rate: 10.30%
Articles published: 336
Review time: 41 days
About the journal: The Journal of Inorganic Biochemistry is an established international forum for research in all aspects of Biological Inorganic Chemistry. Original papers of a high scientific level are published in the form of Articles (full-length papers), Short Communications, Focused Reviews and Bioinorganic Methods. Topics include: the chemistry, structure and function of metalloenzymes; the interaction of inorganic ions and molecules with proteins and nucleic acids; the synthesis and properties of coordination complexes of biological interest, including both structural and functional model systems; the function of metal-containing systems in the regulation of gene expression; the role of metals in medicine; the application of spectroscopic methods to determine the structure of metallobiomolecules; the preparation and characterization of metal-based biomaterials; and related systems. The emphasis of the Journal is on the structure and mechanism of action of metallobiomolecules.