Formal concept views for explainable boosting: A lattice-theoretic framework for Extreme Gradient Boosting and Gradient Boosting Models

Sherif Eneye Shuaib, Pakwan Riyapan, Jirapond Muangprathub
{"title":"Formal concept views for explainable boosting: A lattice-theoretic framework for Extreme Gradient Boosting and Gradient Boosting Models","authors":"Sherif Eneye Shuaib ,&nbsp;Pakwan Riyapan ,&nbsp;Jirapond Muangprathub","doi":"10.1016/j.iswa.2025.200569","DOIUrl":null,"url":null,"abstract":"<div><div>Tree-based ensemble methods, such as Extreme Gradient Boosting (XGBoost) and Gradient Boosting models (GBM), are widely used for supervised learning due to their strong predictive capabilities. However, their complex architectures often hinder interpretability. This paper extends a lattice-theoretic framework originally developed for Random Forests to boosting algorithms, enabling a structured analysis of their internal logic via formal concept analysis (FCA).</div><div>We formally adapt four conceptual views: leaf, tree, tree predicate, and interordinal predicate to account for the sequential learning and optimization processes unique to boosting. Using the binary-class version of the car evaluation dataset from the OpenML CC18 benchmark suite, we conduct a systematic parameter study to examine how hyperparameters, such as tree depth and the number of trees, affect both model performance and conceptual complexity. Random Forest results from prior literature are used as a comparative baseline.</div><div>The results show that XGBoost yields the highest test accuracy, while GBM demonstrates greater stability in generalization error. Conceptually, boosting methods generate more compact and interpretable leaf views but preserve rich structural information in higher-level views. In contrast, Random Forests tend to produce denser and more redundant concept lattices. These trade-offs highlight how boosting methods, when interpreted through FCA, can strike a balance between performance and transparency.</div><div>Overall, this work contributes to explainable AI by demonstrating how lattice-based conceptual views can be systematically extended to complex boosting models, offering interpretable insights without sacrificing predictive power.</div></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"27 ","pages":"Article 200569"},"PeriodicalIF":4.3000,"publicationDate":"2025-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Intelligent Systems with Applications","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S266730532500095X","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Tree-based ensemble methods, such as Extreme Gradient Boosting (XGBoost) and Gradient Boosting models (GBM), are widely used for supervised learning due to their strong predictive capabilities. However, their complex architectures often hinder interpretability. This paper extends a lattice-theoretic framework originally developed for Random Forests to boosting algorithms, enabling a structured analysis of their internal logic via formal concept analysis (FCA).
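For readers unfamiliar with FCA, the definitions the framework builds on are the standard ones following Ganter and Wille (textbook material, not notation introduced by this paper). A formal context is a triple (G, M, I), where G is a set of objects, M a set of attributes, and I a subset of G x M recording which objects carry which attributes. The derivation operators

    A' = \{ m \in M \mid (g, m) \in I \text{ for all } g \in A \}, \qquad
    B' = \{ g \in G \mid (g, m) \in I \text{ for all } m \in B \}

relate object sets A to attribute sets B, and a formal concept is a pair (A, B) with A' = B and B' = A. Ordered by inclusion of extents, the concepts of a context form a complete lattice, which is the structure each conceptual view gives rise to.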
We formally adapt four conceptual views (leaf, tree, tree predicate, and interordinal predicate) to account for the sequential learning and optimization processes unique to boosting. Using the binary-class version of the car evaluation dataset from the OpenML CC18 benchmark suite, we conduct a systematic parameter study to examine how hyperparameters, such as tree depth and the number of trees, affect both model performance and conceptual complexity. Random Forest results from prior literature are used as a comparative baseline.
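To make the leaf view concrete, the sketch below shows one plausible construction: samples are objects, the leaves of the ensemble are attributes, and a sample carries a leaf attribute exactly when it falls into that leaf. The synthetic data, the GBM configuration, and the naive intent enumeration are illustrative assumptions, not the paper's actual code or dataset.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Stand-in data; the paper uses the binary-class car evaluation task
# from the OpenML CC18 suite.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

gbm = GradientBoostingClassifier(n_estimators=5, max_depth=2, random_state=0)
gbm.fit(X, y)

# apply() gives, per sample and per boosting stage, the index of the leaf
# reached; for a binary GBM the trailing class axis has length one.
leaves = gbm.apply(X)[:, :, 0].astype(int)

# Binary formal context: objects = samples, attributes = (tree, leaf) pairs.
attrs = sorted({(t, leaf) for row in leaves for t, leaf in enumerate(row)})
col = {a: j for j, a in enumerate(attrs)}
context = np.zeros((len(X), len(attrs)), dtype=bool)
for i, row in enumerate(leaves):
    for t, leaf in enumerate(row):
        context[i, col[(t, leaf)]] = True

# All concept intents arise as intersections of object intents (plus the
# full attribute set); fine for small contexts, not an optimized FCA algorithm.
intents = {frozenset(range(len(attrs)))}
for i in range(len(X)):
    row_intent = frozenset(np.flatnonzero(context[i]).tolist())
    intents |= {intent & row_intent for intent in intents}

print(f"context: {context.shape[0]} objects x {context.shape[1]} leaf attributes")
print(f"leaf-view concepts: {len(intents)}")

The number of intents is the size of the leaf-view concept lattice, one natural way to quantify the conceptual complexity tracked alongside predictive accuracy.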
The results show that XGBoost yields the highest test accuracy, while GBM demonstrates greater stability in generalization error. Conceptually, boosting methods generate more compact and interpretable leaf views but preserve rich structural information in higher-level views. In contrast, Random Forests tend to produce denser and more redundant concept lattices. These trade-offs highlight how boosting methods, when interpreted through FCA, can strike a balance between performance and transparency.
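A minimal sketch of the kind of parameter study described above, under stated assumptions: synthetic stand-in data, an ad hoc grid over depth and ensemble size, and total leaf count (the number of attributes in the leaf-view context) as a cheap proxy for conceptual complexity.

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (2, 3, 4):
    for n_trees in (10, 50, 100):
        model = GradientBoostingClassifier(
            n_estimators=n_trees, max_depth=depth, random_state=0
        ).fit(X_tr, y_tr)
        acc = model.score(X_te, y_te)
        # Each entry of estimators_ holds the regression tree(s) of one stage.
        n_leaves = sum(stage[0].get_n_leaves() for stage in model.estimators_)
        print(f"depth={depth}  trees={n_trees:3d}  test acc={acc:.3f}  leaves={n_leaves}")

Plotting accuracy against the leaf count makes the performance versus transparency trade-off visible for each hyperparameter setting.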
Overall, this work contributes to explainable AI by demonstrating how lattice-based conceptual views can be systematically extended to complex boosting models, offering interpretable insights without sacrificing predictive power.