Optimized hybrid imbalanced data sampling for decision tree training

Proceedings of the Companion Conference on Genetic and Evolutionary Computation. Pub Date: 2023-07-15. DOI: 10.1145/3583133.3590702

Weronika Węgier, Michał Koziarski, Michal Wozniak


Abstract

For many real-world decision-making tasks, explainability of decisions is a key requirement. Hence, so-called glass-box models, which offer full explainability, remain prevalent. An important area of application is the classification of imbalanced data, where we require that classifiers avoid errors on the minority class while minimizing errors on the majority class. This paper proposes a method for preprocessing imbalanced data by generating minority class objects. We use a multi-criteria optimization method (NSGA-II) to avoid optimizing a single aggregate criterion. The method returns a set of non-dominated solutions from which the end user can choose the one that best suits their needs. An automatic procedure for selecting a solution from the Pareto front is also proposed for comparison purposes. The proposed method yields good-quality classifiers, often surpassing baseline single-objective methods, and is additionally characterized by full interpretability.
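The idea of returning non-dominated solutions and then picking one automatically can be illustrated with a minimal sketch. It is not the authors' implementation: the candidate values are made up, each candidate is assumed to be a pair (minority-class error, majority-class error) to be minimized, and the selection rule (closest to the ideal point (0, 0)) is an assumption, since the abstract does not state which rule the paper uses. The helper names `dominates`, `pareto_front`, and `closest_to_ideal` are hypothetical.

```python
from math import hypot


def dominates(a, b):
    """Return True if a is no worse than b on every objective and
    strictly better on at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def closest_to_ideal(front):
    """Assumed selection rule: pick the point nearest to the ideal (0, 0)."""
    return min(front, key=lambda p: hypot(p[0], p[1]))


# Hypothetical (minority error, majority error) pairs, one per candidate
# resampling configuration; the numbers are illustrative only.
candidates = [(0.10, 0.40), (0.20, 0.25), (0.30, 0.30), (0.05, 0.60), (0.20, 0.50)]

front = pareto_front(candidates)
chosen = closest_to_ideal(front)

print("Pareto front:", front)
print("Automatically selected:", chosen)
```

In this toy data, (0.30, 0.30) and (0.20, 0.50) are dominated by (0.20, 0.25) and drop out; an end user could pick any of the three remaining trade-offs instead of the automatically selected one.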


