Structured Iterative Hard Thresholding for Categorical and Mixed Data Types

Thy Nguyen, Tayo Obafemi-Ajayi
2019 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 2541-2547. Published 2019-12-01.
DOI: 10.1109/SSCI44817.2019.9002948 (https://doi.org/10.1109/SSCI44817.2019.9002948)

Abstract

In many applications, data come in a mixed format that combines nominal (categorical) and numerical features. A common way to handle categorical features is to use an encoding method that transforms the discrete values into a numeric representation. However, a numeric representation often neglects the innate structure of categorical features, which can degrade the performance of learning algorithms. It can also limit interpretation of the learned model, for example when finding the most discriminative categorical features or filtering out irrelevant attributes. In this work, we extend the iterative hard thresholding (IHT) algorithm to quantify the structure of categorical features. We evaluate the proposed structured hard thresholding algorithm empirically on both real and synthetic data sets, comparing it with the original hard thresholding algorithm, LASSO, and Random Forest. The results demonstrate improved performance over the original IHT.
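The abstract does not spell out the structured thresholding step, so the following is only an illustrative sketch and not the authors' algorithm. It shows standard IHT for least squares (a gradient step followed by keeping the k largest-magnitude coefficients) next to one plausible structured variant, assumed here for illustration, in which the one-hot columns of each categorical feature form a group that is kept or zeroed as a whole. All function names, the `groups` layout, and the step size are hypothetical.

```python
def hard_threshold(w, k):
    """Standard IHT projection: keep the k largest-magnitude entries of w."""
    order = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
    keep = set(order[:k])
    return [w[j] if j in keep else 0.0 for j in range(len(w))]


def group_hard_threshold(w, groups, k):
    """Assumed structured variant: keep the k groups with the largest
    Euclidean norm and zero every coefficient in the remaining groups.
    `groups` is a list of index lists, e.g. one list per categorical
    feature (its one-hot columns) and a singleton per numerical feature."""
    norms = [sum(w[j] ** 2 for j in g) ** 0.5 for g in groups]
    order = sorted(range(len(groups)), key=lambda g: norms[g], reverse=True)
    out = [0.0] * len(w)
    for g in order[:k]:
        for j in groups[g]:
            out[j] = w[j]
    return out


def iht(X, y, project, step, n_iter=100):
    """Iterative hard thresholding for least squares.
    Each iteration takes a gradient step on 0.5 * ||y - Xw||^2 and then
    applies `project`, a function mapping a coefficient list to a sparse one."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        resid = [y[i] - sum(X[i][j] * w[j] for j in range(d)) for i in range(n)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(d)]
        w = project([w[j] + step * grad[j] for j in range(d)])
    return w


# Columns 0-2: one-hot encoding of a single categorical feature;
# columns 3 and 4: two numerical features.
groups = [[0, 1, 2], [3], [4]]
w = [1.2, 1.2, 1.2, 0.0, 1.5]
plain = hard_threshold(w, 1)                     # selects a single column
structured = group_hard_threshold(w, groups, 1)  # selects a whole feature
```

In this toy vector, plain thresholding keeps only the single largest coefficient, while the grouped version keeps the whole categorical feature because its three moderate coefficients have the larger joint norm. Selecting at the feature level in this way is one route to the interpretability the abstract asks for, such as ranking or filtering entire categorical attributes.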