A unified approach to extract interpretable rules from tree ensembles via Integer Programming

IF 4.3 · Zone 2 (Engineering & Technology) · Q2 · Computer Science, Interdisciplinary Applications

Computers & Operations Research Pub Date : 2025-09-19 DOI:10.1016/j.cor.2025.107283

Lorenzo Bonasera, Emilio Carrizosa

Computers & Operations Research, Volume 185, Article 107283. https://www.sciencedirect.com/science/article/pii/S0305054825003120

Citations: 0

Abstract

Tree ensembles are widely used machine learning models, known for their effectiveness in supervised classification and regression tasks. Their performance derives from aggregating predictions of multiple decision trees, which are renowned for their interpretability properties. However, tree ensemble models do not reliably exhibit interpretable output. Our work aims to extract an optimized list of rules from a trained tree ensemble, providing the user with a condensed, interpretable model that retains most of the predictive power of the full model. Our approach consists of solving a set partitioning problem formulated through Integer Programming. The extracted list of rules is unweighted and defines a partition of the training data, assigning each instance to exactly one rule, and thereby simplifying the explanation process. The proposed method works with tabular or time series data, for both classification and regression tasks, and its flexible formulation can include any arbitrary loss or regularization functions. Our computational experiments offer statistically significant evidence that our method performs comparably to several rule extraction methods in terms of predictive performance and fidelity towards the tree ensemble. Moreover, we empirically show that the proposed method effectively extracts interpretable rules from tree ensembles that are designed for time series data.
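To make the set partitioning idea concrete, the following is a minimal sketch and not the authors' formulation. It assumes a hypothetical toy dataset of six labelled instances and seven hypothetical candidate rules, each described by the instances it covers and the class it predicts. The cost of a rule is assumed to be its number of misclassified instances plus a fixed per-rule penalty acting as a simple regularizer. Brute-force enumeration stands in for an Integer Programming solver here, which is only viable at this toy scale.

```python
from itertools import combinations

# Hypothetical toy data: 6 training instances with binary labels.
labels = [0, 0, 1, 1, 1, 0]

# Hypothetical candidate rules (e.g. harvested from the trees of an ensemble):
# name -> (set of covered instance indices, predicted class).
rules = {
    "r1": ({0, 1}, 0),
    "r2": ({2, 3, 4}, 1),
    "r3": ({5}, 0),
    "r4": ({0, 1, 2}, 0),
    "r5": ({3, 4, 5}, 1),
    "r6": ({2, 3}, 1),
    "r7": ({4, 5}, 1),
}


def rule_cost(covered, prediction, penalty):
    """Misclassified instances under the rule plus a per-rule penalty."""
    errors = sum(labels[i] != prediction for i in covered)
    return errors + penalty


def best_partition(rules, n_instances, penalty=0.5):
    """Enumerate rule subsets and keep the cheapest exact partition."""
    universe = set(range(n_instances))
    best_cost, best_subset = float("inf"), None
    names = sorted(rules)
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            covered_sets = [rules[r][0] for r in subset]
            # Exact partition: coverage sizes sum to n and the union is
            # the full training set, so no instance is covered twice.
            if sum(len(s) for s in covered_sets) != n_instances:
                continue
            if set().union(*covered_sets) != universe:
                continue
            cost = sum(rule_cost(*rules[r], penalty) for r in subset)
            if cost < best_cost:
                best_cost, best_subset = cost, subset
    return best_cost, best_subset


cost, chosen = best_partition(rules, len(labels))

# Each instance falls under exactly one selected rule, so a prediction is
# explained by a single unweighted rule.
assignment = {i: r for r in chosen for i in rules[r][0]}
print(chosen, cost)
```

In this toy instance, three candidate subsets form exact partitions, and the per-rule penalty trades off the number of rules against their training error. A realistic candidate pool would be far too large to enumerate, which is where an Integer Programming formulation becomes necessary.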


Source journal

Computers & Operations Research (Engineering & Technology - Engineering: Industrial)

CiteScore: 8.60

Self-citation rate: 8.70%

Articles published: 292

Review time: 8.5 months

Journal description: Operations research and computers meet in a large number of scientific fields, many of which are of vital current concern to our troubled society. These include, among others, ecology, transportation, safety, reliability, urban planning, economics, inventory control, investment strategy and logistics (including reverse logistics). Computers & Operations Research provides an international forum for the application of computers and operations research techniques to problems in these and related fields.