Optimized hybrid imbalanced data sampling for decision tree training

Weronika Węgier, Michał Koziarski, Michal Wozniak
Published in: Proceedings of the Companion Conference on Genetic and Evolutionary Computation, 2023-07-15. DOI: 10.1145/3583133.3590702 (https://doi.org/10.1145/3583133.3590702)

Abstract

For many real-world decision-making tasks, decision explainability is a key requirement. Hence, so-called glass-box models, which offer full explainability, remain prevalent. An important application area is the classification of imbalanced data, where classifiers are required to avoid errors on the minority class while minimizing errors on the majority class. This paper proposes a method for preprocessing imbalanced data by generating minority class objects. We use a multi-criteria optimization method (NSGA-II) to avoid optimizing a single aggregate criterion. The method returns a set of non-dominated solutions from which the end user can choose the one that best suits their needs. An automatic procedure for selecting a solution from the Pareto front is also proposed for comparison purposes. The proposed method yields good-quality classifiers, often surpassing baseline single-objective methods, and is additionally characterized by full interpretability.
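
To make the idea of a non-dominated set concrete, the following is a minimal sketch and not the authors' implementation. It assumes each candidate sampling configuration has already been scored on two objectives to be maximized, minority-class recall and majority-class recall. The candidate scores are invented for illustration, and the geometric-mean selection rule is a hypothetical stand-in, since the abstract does not specify how the automatic selection is performed.

```python
from math import sqrt
from typing import List, Tuple

# Each candidate is (minority_recall, majority_recall); both are maximized.
# These objectives are an assumption made for this illustration.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the candidates that no other candidate dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def select_by_gmean(front: List[Point]) -> Point:
    """Hypothetical automatic choice: the front member with the highest geometric mean."""
    return max(front, key=lambda p: sqrt(p[0] * p[1]))


# Invented scores for five candidate sampling configurations.
candidates = [(0.95, 0.60), (0.80, 0.85), (0.70, 0.80), (0.60, 0.95), (0.80, 0.70)]

front = pareto_front(candidates)   # trade-offs offered to the end user
chosen = select_by_gmean(front)    # one possible automatic pick

print("Pareto front:", front)
print("Automatic choice:", chosen)
```

In a full pipeline, NSGA-II would evolve the sampling configurations and compute these scores by training a decision tree on each resampled dataset; the filtering step shown here only illustrates what "non-dominated" means for the returned solutions.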