Robust instance-dependent cost-sensitive classification

IF 1.4 4区 计算机科学 Q2 STATISTICS & PROBABILITY
Simon De Vos, Toon Vanderschueren, Tim Verdonck, Wouter Verbeke
{"title":"Robust instance-dependent cost-sensitive classification","authors":"Simon De Vos,&nbsp;Toon Vanderschueren,&nbsp;Tim Verdonck,&nbsp;Wouter Verbeke","doi":"10.1007/s11634-022-00533-3","DOIUrl":null,"url":null,"abstract":"<div><p>Instance-dependent cost-sensitive (IDCS) learning methods have proven useful for binary classification tasks where individual instances are associated with variable misclassification costs. However, we demonstrate in this paper by means of a series of experiments that IDCS methods are sensitive to noise and outliers in relation to instance-dependent misclassification costs and their performance strongly depends on the cost distribution of the data sample. Therefore, we propose a generic three-step framework to make IDCS methods more robust: (i) detect outliers automatically, (ii) correct outlying cost information in a data-driven way, and (iii) construct an IDCS learning method using the adjusted cost information. We apply this framework to cslogit, a logistic regression-based IDCS method, to obtain its robust version, which we name r-cslogit. The robustness of this approach is introduced in steps (i) and (ii), where we make use of robust estimators to detect and impute outlying costs of individual instances. The newly proposed r-cslogit method is tested on synthetic and semi-synthetic data and proven to be superior in terms of savings compared to its non-robust counterpart for variable levels of noise and outliers. All our code is made available online at https://github.com/SimonDeVos/Robust-IDCS.</p></div>","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"17 4","pages":"1057 - 1079"},"PeriodicalIF":1.4000,"publicationDate":"2023-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Advances in Data Analysis and Classification","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s11634-022-00533-3","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}
引用次数: 0

Abstract

Instance-dependent cost-sensitive (IDCS) learning methods have proven useful for binary classification tasks where individual instances are associated with variable misclassification costs. However, we demonstrate in this paper by means of a series of experiments that IDCS methods are sensitive to noise and outliers in relation to instance-dependent misclassification costs and their performance strongly depends on the cost distribution of the data sample. Therefore, we propose a generic three-step framework to make IDCS methods more robust: (i) detect outliers automatically, (ii) correct outlying cost information in a data-driven way, and (iii) construct an IDCS learning method using the adjusted cost information. We apply this framework to cslogit, a logistic regression-based IDCS method, to obtain its robust version, which we name r-cslogit. The robustness of this approach is introduced in steps (i) and (ii), where we make use of robust estimators to detect and impute outlying costs of individual instances. The newly proposed r-cslogit method is tested on synthetic and semi-synthetic data and proven to be superior in terms of savings compared to its non-robust counterpart for variable levels of noise and outliers. All our code is made available online at https://github.com/SimonDeVos/Robust-IDCS.

Abstract Image

健壮的依赖实例的成本敏感分类
实例相关成本敏感(IDCS)学习方法已被证明可用于二进制分类任务,其中单个实例与可变的错误分类成本相关。然而,我们在本文中通过一系列实验证明,IDCS方法对与实例相关的错误分类成本的噪声和异常值敏感,并且它们的性能在很大程度上取决于数据样本的成本分布。因此,我们提出了一个通用的三步框架,使IDCS方法更加稳健:(i)自动检测异常值,(ii)以数据驱动的方式校正异常成本信息,以及(iii)使用调整后的成本信息构建IDCS学习方法。我们将该框架应用于cslogit,一种基于逻辑回归的IDCS方法,以获得其稳健版本,我们将其命名为r-cslogit。在步骤(i)和(ii)中介绍了这种方法的稳健性,其中我们使用稳健估计量来检测和估算个别实例的异常成本。新提出的r-cslogit方法在合成和半合成数据上进行了测试,并被证明在可变噪声水平和异常值的情况下,与非鲁棒方法相比,在节省方面是优越的。我们的所有代码都可在线获取,网址为https://github.com/SimonDeVos/Robust-IDCS.
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
CiteScore
3.40
自引率
6.20%
发文量
45
审稿时长
>12 weeks
期刊介绍: The international journal Advances in Data Analysis and Classification (ADAC) is designed as a forum for high standard publications on research and applications concerning the extraction of knowable aspects from many types of data. It publishes articles on such topics as structural, quantitative, or statistical approaches for the analysis of data; advances in classification, clustering, and pattern recognition methods; strategies for modeling complex data and mining large data sets; methods for the extraction of knowledge from data, and applications of advanced methods in specific domains of practice. Articles illustrate how new domain-specific knowledge can be made available from data by skillful use of data analysis methods. The journal also publishes survey papers that outline, and illuminate the basic ideas and techniques of special approaches.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信