Interpret Gaussian Process Models by Using Integrated Gradients.

IF 2.8 4区 医学 Q3 CHEMISTRY, MEDICINAL
Molecular Informatics Pub Date : 2025-01-01 Epub Date: 2024-11-26 DOI:10.1002/minf.202400051
Fan Zhang, Naoaki Ono, Shigehiko Kanaya
{"title":"Interpret Gaussian Process Models by Using Integrated Gradients.","authors":"Fan Zhang, Naoaki Ono, Shigehiko Kanaya","doi":"10.1002/minf.202400051","DOIUrl":null,"url":null,"abstract":"<p><p>Gaussian process regression (GPR) is a nonparametric probabilistic model capable of computing not only the predicted mean but also the predicted standard deviation, which represents the confidence level of predictions. It offers great flexibility as it can be non-linearized by designing the kernel function, made robust against outliers by altering the likelihood function, and extended to classification models. Recently, models combining deep learning with GPR, such as Deep Kernel Learning GPR, have been proposed and reported to achieve higher accuracy than GPR. However, due to its nonparametric nature, GPR is challenging to interpret. While Explainable AI (XAI) methods like LIME or kernel SHAP can interpret the predicted mean, interpreting the predicted standard deviation remains difficult. In this study, we propose a novel method to interpret the prediction of GPR by evaluating the importance of explanatory variables. We have incorporated the GPR model with the Integrated Gradients (IG) method to assess the contribution of each feature to the prediction. By evaluating the standard deviation of the posterior distribution, we show that the IG approach provides a detailed decomposition of the predictive uncertainty, attributing it to the uncertainty in individual feature contributions. This methodology not only highlights the variables that are most influential in the prediction but also provides insights into the reliability of the model by quantifying the uncertainty associated with each feature. Through this, we can obtain a deeper understanding of the model's behavior and foster trust in its predictions, especially in domains where interpretability is as crucial as accuracy.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":" ","pages":"e202400051"},"PeriodicalIF":2.8000,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11695984/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Molecular Informatics","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1002/minf.202400051","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/11/26 0:00:00","PubModel":"Epub","JCR":"Q3","JCRName":"CHEMISTRY, MEDICINAL","Score":null,"Total":0}
引用次数: 0

Abstract

Gaussian process regression (GPR) is a nonparametric probabilistic model capable of computing not only the predicted mean but also the predicted standard deviation, which represents the confidence level of predictions. It offers great flexibility as it can be non-linearized by designing the kernel function, made robust against outliers by altering the likelihood function, and extended to classification models. Recently, models combining deep learning with GPR, such as Deep Kernel Learning GPR, have been proposed and reported to achieve higher accuracy than GPR. However, due to its nonparametric nature, GPR is challenging to interpret. While Explainable AI (XAI) methods like LIME or kernel SHAP can interpret the predicted mean, interpreting the predicted standard deviation remains difficult. In this study, we propose a novel method to interpret the prediction of GPR by evaluating the importance of explanatory variables. We have incorporated the GPR model with the Integrated Gradients (IG) method to assess the contribution of each feature to the prediction. By evaluating the standard deviation of the posterior distribution, we show that the IG approach provides a detailed decomposition of the predictive uncertainty, attributing it to the uncertainty in individual feature contributions. This methodology not only highlights the variables that are most influential in the prediction but also provides insights into the reliability of the model by quantifying the uncertainty associated with each feature. Through this, we can obtain a deeper understanding of the model's behavior and foster trust in its predictions, especially in domains where interpretability is as crucial as accuracy.

利用综合梯度解释高斯过程模型
高斯过程回归(GPR)是一种非参数概率模型,不仅能计算预测的平均值,还能计算预测的标准偏差,标准偏差代表预测的置信度。它具有极大的灵活性,可以通过设计核函数使其非线性化,通过改变似然函数使其对异常值具有鲁棒性,并扩展到分类模型。最近,有人提出了将深度学习与 GPR 相结合的模型,如深度核学习 GPR,据报道,其准确率高于 GPR。然而,由于其非参数性质,GPR 的解释具有挑战性。虽然像 LIME 或核 SHAP 这样的可解释人工智能(XAI)方法可以解释预测的平均值,但解释预测的标准偏差仍然很困难。在本研究中,我们提出了一种通过评估解释变量的重要性来解释 GPR 预测的新方法。我们将 GPR 模型与综合梯度(IG)方法相结合,以评估每个特征对预测的贡献。通过评估后验分布的标准偏差,我们发现综合梯度法对预测的不确定性进行了详细分解,将其归因于各个特征贡献的不确定性。这种方法不仅能突出对预测影响最大的变量,还能通过量化与每个特征相关的不确定性,深入了解模型的可靠性。通过这种方法,我们可以更深入地了解模型的行为,并增强对其预测结果的信任,尤其是在可解释性与准确性同等重要的领域。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Molecular Informatics
Molecular Informatics CHEMISTRY, MEDICINAL-MATHEMATICAL & COMPUTATIONAL BIOLOGY
CiteScore
7.30
自引率
2.80%
发文量
70
审稿时长
3 months
期刊介绍: Molecular Informatics is a peer-reviewed, international forum for publication of high-quality, interdisciplinary research on all molecular aspects of bio/cheminformatics and computer-assisted molecular design. Molecular Informatics succeeded QSAR & Combinatorial Science in 2010. Molecular Informatics presents methodological innovations that will lead to a deeper understanding of ligand-receptor interactions, macromolecular complexes, molecular networks, design concepts and processes that demonstrate how ideas and design concepts lead to molecules with a desired structure or function, preferably including experimental validation. The journal''s scope includes but is not limited to the fields of drug discovery and chemical biology, protein and nucleic acid engineering and design, the design of nanomolecular structures, strategies for modeling of macromolecular assemblies, molecular networks and systems, pharmaco- and chemogenomics, computer-assisted screening strategies, as well as novel technologies for the de novo design of biologically active molecules. As a unique feature Molecular Informatics publishes so-called "Methods Corner" review-type articles which feature important technological concepts and advances within the scope of the journal.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信