A methodology to correctly assess the applicability domain of cell membrane permeability predictors for cyclic peptides†

IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY
Gökçe Geylan, Leonardo De Maria, Ola Engkvist, Florian David and Ulf Norinder
{"title":"A methodology to correctly assess the applicability domain of cell membrane permeability predictors for cyclic peptides†","authors":"Gökçe Geylan, Leonardo De Maria, Ola Engkvist, Florian David and Ulf Norinder","doi":"10.1039/D4DD00056K","DOIUrl":null,"url":null,"abstract":"<p >Being able to predict the cell permeability of cyclic peptides is essential for unlocking their potential as a drug modality for intracellular targets. With a wide range of studies of cell permeability but a limited number of data points, the reliability of the machine learning (ML) models to predict previously unexplored chemical spaces becomes a challenge. In this work, we systemically investigate the predictive capability of ML models from the perspective of their extrapolation to never-before-seen applicability domains, with a particular focus on the permeability task. Four predictive algorithms, namely Support-Vector Machine, Random Forest, LightGBM and XGBoost, jointly with a conformal prediction framework were employed to characterize and evaluate the applicability through uncertainty quantification. Efficiency and validity of the models' predictions with multiple calibration strategies were assessed with respect to several external datasets from different parts of the chemical space through a set of experiments. The experiments showed that the predictors generalizing well to the applicability domain defined by the training data, can fail to achieve similar model performance on other parts of the chemical spaces. Our study proposes an approach to overcome such limitations by the means of improving the efficiency of models without sacrificing the validity. The trade-off between the reliability and informativeness was balanced when the models were calibrated with a subset of the data from the new targeted domain. This study outlines an approach to enable the extrapolation of predictive power and restore the models' reliability <em>via</em> a recalibration strategy without the need for retraining the underlying model.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 9","pages":" 1761-1775"},"PeriodicalIF":6.2000,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00056k?page=search","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Digital discovery","FirstCategoryId":"1085","ListUrlMain":"https://pubs.rsc.org/en/content/articlelanding/2024/dd/d4dd00056k","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"CHEMISTRY, MULTIDISCIPLINARY","Score":null,"Total":0}
引用次数: 0

Abstract

Being able to predict the cell permeability of cyclic peptides is essential for unlocking their potential as a drug modality for intracellular targets. With a wide range of studies of cell permeability but a limited number of data points, the reliability of the machine learning (ML) models to predict previously unexplored chemical spaces becomes a challenge. In this work, we systemically investigate the predictive capability of ML models from the perspective of their extrapolation to never-before-seen applicability domains, with a particular focus on the permeability task. Four predictive algorithms, namely Support-Vector Machine, Random Forest, LightGBM and XGBoost, jointly with a conformal prediction framework were employed to characterize and evaluate the applicability through uncertainty quantification. Efficiency and validity of the models' predictions with multiple calibration strategies were assessed with respect to several external datasets from different parts of the chemical space through a set of experiments. The experiments showed that the predictors generalizing well to the applicability domain defined by the training data, can fail to achieve similar model performance on other parts of the chemical spaces. Our study proposes an approach to overcome such limitations by the means of improving the efficiency of models without sacrificing the validity. The trade-off between the reliability and informativeness was balanced when the models were calibrated with a subset of the data from the new targeted domain. This study outlines an approach to enable the extrapolation of predictive power and restore the models' reliability via a recalibration strategy without the need for retraining the underlying model.

Abstract Image

Abstract Image

正确评估细胞膜渗透性预测环肽适用范围的方法
要挖掘环肽作为细胞内靶点药物模式的潜力,预测环肽的细胞渗透性至关重要。由于对细胞渗透性的研究范围广泛,但数据点数量有限,因此机器学习(ML)模型预测以前未探索过的化学空间的可靠性就成了一个挑战。在这项工作中,我们从外推法的角度系统地研究了 ML 模型对前所未见的应用领域的预测能力,并特别关注渗透性任务。我们采用了四种预测算法,即支持向量机、随机森林、LightGBM 和 XGBoost,并结合保形预测框架,通过不确定性量化来描述和评估其适用性。通过一系列实验,针对来自化学空间不同部分的多个外部数据集,评估了采用多种校准策略的模型预测的效率和有效性。实验结果表明,对训练数据所定义的适用性领域具有良好普适性的预测器,在化学空间的其他部分可能无法实现类似的模型性能。我们的研究提出了一种在不牺牲有效性的前提下提高模型效率的方法来克服这种局限性。当使用新目标领域的数据子集校准模型时,可靠性和信息量之间的权衡得到了平衡。本研究概述了一种通过重新校准策略实现预测能力外推并恢复模型可靠性的方法,而无需重新训练基础模型。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
CiteScore
2.80
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信