Formal concept views for explainable boosting: A lattice-theoretic framework for Extreme Gradient Boosting and Gradient Boosting Models

Sherif Eneye Shuaib, Pakwan Riyapan, Jirapond Muangprathub
{"title":"Formal concept views for explainable boosting: A lattice-theoretic framework for Extreme Gradient Boosting and Gradient Boosting Models","authors":"Sherif Eneye Shuaib ,&nbsp;Pakwan Riyapan ,&nbsp;Jirapond Muangprathub","doi":"10.1016/j.iswa.2025.200569","DOIUrl":null,"url":null,"abstract":"<div><div>Tree-based ensemble methods, such as Extreme Gradient Boosting (XGBoost) and Gradient Boosting models (GBM), are widely used for supervised learning due to their strong predictive capabilities. However, their complex architectures often hinder interpretability. This paper extends a lattice-theoretic framework originally developed for Random Forests to boosting algorithms, enabling a structured analysis of their internal logic via formal concept analysis (FCA).</div><div>We formally adapt four conceptual views: leaf, tree, tree predicate, and interordinal predicate to account for the sequential learning and optimization processes unique to boosting. Using the binary-class version of the car evaluation dataset from the OpenML CC18 benchmark suite, we conduct a systematic parameter study to examine how hyperparameters, such as tree depth and the number of trees, affect both model performance and conceptual complexity. Random Forest results from prior literature are used as a comparative baseline.</div><div>The results show that XGBoost yields the highest test accuracy, while GBM demonstrates greater stability in generalization error. Conceptually, boosting methods generate more compact and interpretable leaf views but preserve rich structural information in higher-level views. In contrast, Random Forests tend to produce denser and more redundant concept lattices. These trade-offs highlight how boosting methods, when interpreted through FCA, can strike a balance between performance and transparency.</div><div>Overall, this work contributes to explainable AI by demonstrating how lattice-based conceptual views can be systematically extended to complex boosting models, offering interpretable insights without sacrificing predictive power.</div></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"27 ","pages":"Article 200569"},"PeriodicalIF":4.3000,"publicationDate":"2025-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Intelligent Systems with Applications","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S266730532500095X","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Tree-based ensemble methods, such as Extreme Gradient Boosting (XGBoost) and Gradient Boosting models (GBM), are widely used for supervised learning due to their strong predictive capabilities. However, their complex architectures often hinder interpretability. This paper extends a lattice-theoretic framework originally developed for Random Forests to boosting algorithms, enabling a structured analysis of their internal logic via formal concept analysis (FCA).
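For readers unfamiliar with FCA, the definitions the framework builds on are the standard ones following Ganter and Wille (textbook material, not notation introduced by this paper). A formal context is a triple (G, M, I), where G is a set of objects, M a set of attributes, and I a subset of G x M recording which objects carry which attributes. The derivation operators

    A' = \{ m \in M \mid (g, m) \in I \text{ for all } g \in A \}, \qquad
    B' = \{ g \in G \mid (g, m) \in I \text{ for all } m \in B \}

relate object sets A to attribute sets B, and a formal concept is a pair (A, B) with A' = B and B' = A. Ordered by inclusion of extents, the concepts of a context form a complete lattice, which is the structure each conceptual view gives rise to.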
We formally adapt four conceptual views (leaf, tree, tree predicate, and interordinal predicate) to account for the sequential learning and optimization processes unique to boosting. Using the binary-class version of the car evaluation dataset from the OpenML CC18 benchmark suite, we conduct a systematic parameter study to examine how hyperparameters, such as tree depth and the number of trees, affect both model performance and conceptual complexity. Random Forest results from prior literature are used as a comparative baseline.
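To make the leaf view concrete, the sketch below shows one plausible construction: samples are objects, the leaves of the ensemble are attributes, and a sample carries a leaf attribute exactly when it falls into that leaf. The synthetic data, the GBM configuration, and the naive intent enumeration are illustrative assumptions, not the paper's actual code or dataset.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Stand-in data; the paper uses the binary-class car evaluation task
# from the OpenML CC18 suite.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

gbm = GradientBoostingClassifier(n_estimators=5, max_depth=2, random_state=0)
gbm.fit(X, y)

# apply() gives, per sample and per boosting stage, the index of the leaf
# reached; for a binary GBM the trailing class axis has length one.
leaves = gbm.apply(X)[:, :, 0].astype(int)

# Binary formal context: objects = samples, attributes = (tree, leaf) pairs.
attrs = sorted({(t, leaf) for row in leaves for t, leaf in enumerate(row)})
col = {a: j for j, a in enumerate(attrs)}
context = np.zeros((len(X), len(attrs)), dtype=bool)
for i, row in enumerate(leaves):
    for t, leaf in enumerate(row):
        context[i, col[(t, leaf)]] = True

# All concept intents arise as intersections of object intents (plus the
# full attribute set); fine for small contexts, not an optimized FCA algorithm.
intents = {frozenset(range(len(attrs)))}
for i in range(len(X)):
    row_intent = frozenset(np.flatnonzero(context[i]).tolist())
    intents |= {intent & row_intent for intent in intents}

print(f"context: {context.shape[0]} objects x {context.shape[1]} leaf attributes")
print(f"leaf-view concepts: {len(intents)}")

The number of intents is the size of the leaf-view concept lattice, one natural way to quantify the conceptual complexity tracked alongside predictive accuracy.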
The results show that XGBoost yields the highest test accuracy, while GBM demonstrates greater stability in generalization error. Conceptually, boosting methods generate more compact and interpretable leaf views but preserve rich structural information in higher-level views. In contrast, Random Forests tend to produce denser and more redundant concept lattices. These trade-offs highlight how boosting methods, when interpreted through FCA, can strike a balance between performance and transparency.
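A minimal sketch of the kind of parameter study described above, under stated assumptions: synthetic stand-in data, an ad hoc grid over depth and ensemble size, and total leaf count (the number of attributes in the leaf-view context) as a cheap proxy for conceptual complexity.

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (2, 3, 4):
    for n_trees in (10, 50, 100):
        model = GradientBoostingClassifier(
            n_estimators=n_trees, max_depth=depth, random_state=0
        ).fit(X_tr, y_tr)
        acc = model.score(X_te, y_te)
        # Each entry of estimators_ holds the regression tree(s) of one stage.
        n_leaves = sum(stage[0].get_n_leaves() for stage in model.estimators_)
        print(f"depth={depth}  trees={n_trees:3d}  test acc={acc:.3f}  leaves={n_leaves}")

Plotting accuracy against the leaf count makes the performance versus transparency trade-off visible for each hyperparameter setting.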
Overall, this work contributes to explainable AI by demonstrating how lattice-based conceptual views can be systematically extended to complex boosting models, offering interpretable insights without sacrificing predictive power.