A Cost-Sensitive Feature Selection Method for High-Dimensional Data

Chaojie An, Qifeng Zhou
DOI: 10.1109/ICCSE.2019.8845414
Published in: 2019 14th International Conference on Computer Science & Education (ICCSE)
Publication date: 2019-08-01
Citations: 3

Abstract

With the increase of data dimensionality in many application fields, feature selection, as an essential step to avoid the curse of dimensionality and enhance the generalization of the model, is attracting more and more research attention. However, most existing feature selection methods assume that all features have the same cost. These research efforts mainly focus on features' relevance to learning performance while neglecting the cost of obtaining them. Feature cost is a crucial factor that needs to be considered in the feature selection problem, especially in real-world applications. For example, in medical diagnosis, each feature may have a very different testing cost. To select low-cost subsets of informative features, in this paper we propose a stratified random-forest-based cost-sensitive feature selection method. Unlike commonly used two-step cost-sensitive feature selection approaches, our model incorporates the cost of features into the construction process of the base decision trees; that is, the cost and the performance of each feature are optimized simultaneously. Moreover, we adopt a stratified sampling method to enhance the performance of the selected feature subset on high-dimensional data. A series of experimental results shows that, compared with state-of-the-art methods, the proposed approach can lower the cost of the selected feature subset while maintaining comparable learning performance.
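The abstract only sketches how cost enters the tree-building step. One common way to fold acquisition cost into split selection (as in EG2-style cost-sensitive trees, not necessarily the exact criterion of this paper) is to divide each feature's impurity gain by a power of its cost, so that cheap informative features are preferred over equally informative but expensive ones. The sketch below is illustrative only; the function names, the `(1 + cost)^lam` penalty, and the toy data are assumptions, not the authors' implementation.

```python
import numpy as np

def gini(y):
    # Gini impurity of an integer label vector
    if len(y) == 0:
        return 0.0
    p = np.bincount(y) / len(y)
    return 1.0 - np.sum(p ** 2)

def cost_sensitive_gain(x, y, cost, lam=1.0):
    # Best Gini gain over all thresholds of one feature,
    # penalized by that feature's acquisition cost (EG2-style).
    base = gini(y)
    best = 0.0
    for t in np.unique(x)[:-1]:
        left, right = y[x <= t], y[x > t]
        w = len(left) / len(y)
        gain = base - w * gini(left) - (1 - w) * gini(right)
        best = max(best, gain)
    return best / (1.0 + cost) ** lam

def select_features(X, y, costs, k, lam=1.0):
    # Rank features by cost-adjusted gain and keep the top k.
    scores = [cost_sensitive_gain(X[:, j], y, costs[j], lam)
              for j in range(X.shape[1])]
    return np.argsort(scores)[::-1][:k]

# Toy data: features 0 and 1 are equally informative,
# but feature 0 is 20x more expensive to acquire.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)
X = np.column_stack([
    y + rng.normal(0, 0.3, 200),   # informative, expensive
    y + rng.normal(0, 0.3, 200),   # informative, cheap
    rng.normal(0, 1, 200),         # pure noise
])
costs = np.array([10.0, 0.5, 0.5])
selected = select_features(X, y, costs, k=1)
print(selected)  # the cheap informative feature is chosen
```

In a random-forest setting this criterion would be applied at every node of every base tree, so cost and discriminative power are traded off jointly during construction rather than in a separate post-hoc filtering step, which is the distinction the abstract draws against two-step approaches.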