Variable Threshold based Feature Selection using Spatial Distribution of Data

C. Son, A. Shin, Young-Dong Lee, Hee-Joon Park, Hyoung-Seob Park, Y. Kim
DOI: 10.4258/JKSMI.2009.15.4.475 (https://doi.org/10.4258/JKSMI.2009.15.4.475). Published 2009-12-01 in the Journal of Korean Society of Medical Informatics.
Citations: 1

Abstract

Objective: In processing high-dimensional clinical data, choosing the optimal subset of features is important, not only to reduce the computational complexity but also to improve the value of the model constructed from the given data. This study proposes an efficient feature selection method with a variable threshold. Methods: In the proposed method, the spatial distribution of labeled data, which has non-redundant attribute values in the overlapping regions, was used to evaluate the degree of intra-class separation, and the weighted average of the redundant attribute values was used to select the cut-off value of each feature. Results: The effectiveness of the proposed method was demonstrated on a dyspnea patients' dataset, with 11 features selected from 55 features by clinical experts, by comparing its experimental results with those obtained using seven other classification methods. Conclusion: The proposed method can work well for clinical data mining and pattern classification applications. (Journal of Korean Society of Medical Informatics 15-4, 475-481, 2009)
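The abstract gives no formulas, so the following Python sketch is only one possible reading of the Methods sentence, not the authors' algorithm. It assumes two classes, takes the "overlapping region" of a feature to be the interval where the two classes' value ranges intersect, scores separation as the fraction of samples lying outside that interval, and sets the per-feature cut-off to the mean of the values inside it (equivalent to a count-weighted average of the per-class means there). The function names, the `min_separation` parameter, and the toy data are all hypothetical.

```python
from typing import Dict, List, Tuple


def overlap_interval(a: List[float], b: List[float]) -> Tuple[float, float]:
    """Interval where the value ranges of the two classes intersect.

    If the classes do not overlap, the returned lower bound exceeds the
    upper bound and the interval is the gap between the classes.
    """
    return max(min(a), min(b)), min(max(a), max(b))


def score_feature(a: List[float], b: List[float]) -> Tuple[float, float]:
    """Return (separation, cutoff) for one feature and two classes.

    separation: fraction of all samples lying outside the overlap interval
                (assumed measure; 1.0 means the classes do not overlap).
    cutoff:     mean of the values inside the overlap interval, or the
                midpoint of the gap when there is no overlap (assumed rule).
    """
    lo, hi = overlap_interval(a, b)
    in_a = [x for x in a if lo <= x <= hi]
    in_b = [x for x in b if lo <= x <= hi]
    n_in = len(in_a) + len(in_b)
    separation = 1.0 - n_in / (len(a) + len(b))
    if n_in == 0:
        cutoff = (lo + hi) / 2.0  # no overlap: split the gap in the middle
    else:
        cutoff = (sum(in_a) + sum(in_b)) / n_in
    return separation, cutoff


def select_features(
    class_a: Dict[str, List[float]],
    class_b: Dict[str, List[float]],
    min_separation: float = 0.5,
) -> Dict[str, float]:
    """Keep features whose separation reaches min_separation.

    Returns a mapping from each kept feature name to its own cut-off,
    so the threshold varies from feature to feature.
    """
    selected = {}
    for name in class_a:
        separation, cutoff = score_feature(class_a[name], class_b[name])
        if separation >= min_separation:
            selected[name] = cutoff
    return selected


# Toy data (made up for illustration): f1 separates the classes well, f2 does not.
class_a = {"f1": [1.0, 2.0, 3.0, 4.0], "f2": [1.0, 2.0, 3.0, 4.0]}
class_b = {"f1": [3.5, 5.0, 6.0, 7.0], "f2": [1.5, 2.5, 3.5, 4.5]}
print(select_features(class_a, class_b))  # {'f1': 3.75}
```

In this toy case only two of the eight `f1` values fall in the overlap, so `f1` is kept with a cut-off of 3.75, while six of the eight `f2` values overlap and `f2` is dropped. The actual separation measure and weighting used in the paper may differ.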