Investigating the Evolution of Tree Boosting Models with Visual Analytics

2021 IEEE 14th Pacific Visualization Symposium (PacificVis). Pub Date: 2021-04-01. DOI: 10.1109/PacificVis52677.2021.00032

Junpeng Wang, Wei Zhang, Liang Wang, Hao Yang

Citations: 6

Abstract

Tree boosting models are widely adopted predictive models that have outperformed other conventional models and even deep learning models, especially since the release of parallel and distributed implementations such as XGBoost, LightGBM, and CatBoost. Tree boosting uses a group of sequentially generated weak learners (i.e., decision trees), each of which learns from the mistakes of its predecessor, to push the model’s decision boundary towards the true boundary. As the number of trees increases during training, it is important to reveal how newly added trees change the predictions of individual data instances and how the impacts of different data features evolve. To accomplish these goals, we introduce a new design of the temporal confusion matrix, which gives users an effective interface for tracking data instances’ predictions across the tree boosting process. We also present an improved visualization to better illustrate and compare the impacts of individual data features (based on their SHAP values) across training iterations. Integrating these components with a tree structure visualization component, we propose a visual analytics system for tree boosting models. Through case studies with domain experts using real-world datasets, we validated the system’s effectiveness.
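
As a rough illustration of the mechanism the abstract describes (each tree fitting the mistakes of its predecessors, with instance predictions tracked per iteration), the following is a minimal sketch and not the authors' system or any of the named libraries. It assumes a squared-error loss, depth-1 regression trees (stumps), a single numeric feature, and a 0.5 decision threshold. All function names (`fit_stump`, `confusion`, `boost`) are hypothetical. The per-round confusion counts are only a crude stand-in for the temporal confusion matrix idea, and SHAP-based feature impacts are not covered.

```python
# Toy gradient boosting on one numeric feature, using stumps and squared error.
# Illustrative only: names and hyperparameters are assumptions, not from the paper.

def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree; return (threshold, left_value, right_value)."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue  # a split must leave instances on both sides
        lv = sum(left) / len(left)
        rv = sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    if best is None:
        # No valid split (all feature values equal): predict the mean residual.
        mean = sum(residuals) / len(residuals)
        return xs[0], mean, mean
    return best[1], best[2], best[3]


def confusion(labels, preds):
    """Count TP/FP/TN/FN for binary labels and predictions (1 = positive)."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for y, p in zip(labels, preds):
        if p == 1:
            counts["TP" if y == 1 else "FP"] += 1
        else:
            counts["TN" if y == 0 else "FN"] += 1
    return counts


def boost(xs, labels, n_trees=5, learning_rate=0.5):
    """Add stumps one at a time; record predictions and confusion counts per round."""
    base = sum(labels) / len(labels)      # initial constant score
    scores = [base] * len(xs)
    history = []
    for _ in range(n_trees):
        # Residuals are the current "mistakes" the next tree is fitted to.
        residuals = [y - s for y, s in zip(labels, scores)]
        t, lv, rv = fit_stump(xs, residuals)
        scores = [s + learning_rate * (lv if x <= t else rv)
                  for x, s in zip(xs, scores)]
        preds = [1 if s >= 0.5 else 0 for s in scores]
        history.append((preds, confusion(labels, preds)))
    return history


# Usage: watch how each instance's predicted class changes as trees are added.
demo_xs = [1, 2, 3, 4, 5, 6]
demo_labels = [0, 0, 1, 0, 1, 1]
for round_no, (preds, counts) in enumerate(boost(demo_xs, demo_labels), start=1):
    print(round_no, preds, counts)
```

Each row of the printed output corresponds to one boosting round, so reading down a column shows how a single instance's prediction evolves, which is the kind of per-instance trajectory the temporal confusion matrix is meant to expose.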
