Investigating the Evolution of Tree Boosting Models with Visual Analytics
Junpeng Wang, Wei Zhang, Liang Wang, Hao Yang
2021 IEEE 14th Pacific Visualization Symposium (PacificVis), April 2021
DOI: 10.1109/PacificVis52677.2021.00032
Tree boosting models are widely adopted predictive models that have demonstrated performance superior to other conventional and even deep learning models, especially since the recent release of parallel and distributed implementations such as XGBoost, LightGBM, and CatBoost. Tree boosting uses a group of sequentially generated weak learners (i.e., decision trees), each of which learns from the mistakes of its predecessors, to push the model’s decision boundary toward the true boundary. As the number of trees grows over training, it is important to reveal how newly added trees change the predictions of individual data instances and how the impacts of different data features evolve. To accomplish these goals, we introduce a new design of the temporal confusion matrix, which gives users an effective interface for tracking data instances’ predictions across the tree boosting process. We also present an improved visualization that better illustrates and compares the impacts of individual data features (based on their SHAP values) across training iterations. Integrating these components with a tree structure visualization component, we propose a visual analytics system for tree boosting models. Through case studies with domain experts on real-world datasets, we validated the system’s effectiveness.
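To make the two quantities above concrete (per-instance predictions across boosting stages, and per-feature SHAP values across iterations), the following is a minimal pure-Python sketch. It is not the authors' system and not how XGBoost, LightGBM, or CatBoost work internally. It assumes binary labels and a toy first-order gradient-boosting loop over depth-1 trees (stumps) whose leaf values are mean residuals. The names `fit_stump` and `train_and_track` are hypothetical. At every stage it records the predicted labels, confusion counts keyed by (true, predicted), and the indices whose label flipped since the previous stage, which is the kind of raw data a temporal confusion matrix could be drawn from. Because each stump reads a single feature, interventional SHAP values with the training set as background reduce to each stump's output minus its mean output, accumulated per feature.

```python
import math
from collections import Counter


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Least-squares depth-1 tree: try every (feature, threshold) pair and
    use the mean residual on each side as the leaf value."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lv = sum(left) / len(left) if left else 0.0
            rv = sum(right) / len(right) if right else 0.0
            sse = (sum((r - lv) ** 2 for r in left)
                   + sum((r - rv) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    _, j, t, lv, rv = best
    return j, t, lv, rv


def train_and_track(X, y, n_trees=5, lr=0.5):
    """Boost stumps on the logistic loss and snapshot every stage.

    Stage 0 is the constant (prior log-odds) model; stage k has k trees.
    """
    n, d = len(X), len(X[0])
    base = sum(y) / n
    if not 0 < base < 1:
        raise ValueError("both classes must be present")
    scores = [math.log(base / (1.0 - base))] * n  # raw log-odds per instance
    shap = [[0.0] * d for _ in range(n)]           # constant model: all zero
    snapshots, prev = [], None
    for stage in range(n_trees + 1):
        if stage > 0:
            # Negative gradient of the logistic loss: y - p
            residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
            j, t, lv, rv = fit_stump(X, residuals)
            out = [lr * (lv if row[j] <= t else rv) for row in X]
            mean_out = sum(out) / n
            for i in range(n):
                scores[i] += out[i]
                # This stump reads only feature j, so only phi_j moves
                shap[i][j] += out[i] - mean_out
        preds = [1 if s > 0 else 0 for s in scores]  # p > 0.5 <=> score > 0
        flipped = [] if prev is None else [
            i for i in range(n) if preds[i] != prev[i]]
        snapshots.append({
            "scores": scores[:],
            "preds": preds,
            "confusion": Counter(zip(y, preds)),  # keys: (true, predicted)
            "flipped": flipped,                   # label changed this stage
            "shap": [row[:] for row in shap],     # per-instance, per-feature
        })
        prev = preds
    return snapshots


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 is noise
    X = [(1, 0), (2, 1), (3, 0), (4, 1), (5, 0), (6, 1), (7, 0), (8, 1)]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    for k, snap in enumerate(train_and_track(X, y, n_trees=3)):
        # Mean |SHAP| per feature as a simple global importance summary
        mean_abs = [sum(abs(r[j]) for r in snap["shap"]) / len(X)
                    for j in range(2)]
        print(k, dict(snap["confusion"]), snap["flipped"], mean_abs)
```

On the toy data, the first stump moves instances 4 to 7 from the negative to the positive class, and later stumps only strengthen scores without flipping labels. In each snapshot the per-instance SHAP values sum to that instance's raw score minus the mean score (the efficiency property). Real libraries compute SHAP for deeper trees with dedicated algorithms, so treat this only as an illustration of what is being tracked.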