C. Son, A. Shin, Young-Dong Lee, Hee-Joon Park, Hyoung-Seob Park, Y. Kim
Variable Threshold based Feature Selection using Spatial Distribution of Data
Journal of Korean Society of Medical Informatics, published 2009-12-01
DOI: 10.4258/JKSMI.2009.15.4.475
Citations: 1
Abstract
Objective: In processing high-dimensional clinical data, choosing the optimal subset of features is important, not only to reduce the computational complexity but also to improve the value of the model constructed from the given data. This study proposes an efficient feature selection method with a variable threshold. Methods: In the proposed method, the spatial distribution of labeled data, which has non-redundant attribute values in the overlapping regions, was used to evaluate the degree of intra-class separation, and the weighted average of the redundant attribute values was used to select the cut-off value of each feature. Results: The effectiveness of the proposed method was demonstrated on a dyspnea patients' dataset, with 11 features selected from 55 features by clinical experts, by comparing its experimental results with those obtained using seven other classification methods. Conclusion: The proposed method can work well for clinical data mining and pattern classification applications. (Journal of Korean Society of Medical Informatics 15-4, 475-481, 2009)
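The abstract does not give formulas, so the following is only a minimal sketch of how a per-feature cut-off and separation score of this kind might be computed, not the authors' algorithm. It assumes two classes labeled 0 and 1, treats the overlapping region as the intersection of the two class value ranges, scores separation as the fraction of samples outside that region, and takes the cut-off as the plain mean of the values inside it (equivalent to averaging the two class means weighted by their overlap counts). The names `score_feature`, `select_features`, and `min_separation` are hypothetical.

```python
def score_feature(values, labels):
    """Return (separation, cutoff) for one feature with binary labels 0/1.

    Assumed definitions (not taken from the paper):
    - overlap region: intersection of the two class value ranges
    - separation: fraction of samples lying outside the overlap region
    - cutoff: mean of the values that fall inside the overlap region
    """
    a = [v for v, c in zip(values, labels) if c == 0]
    b = [v for v, c in zip(values, labels) if c == 1]
    lo = max(min(a), min(b))  # lower edge of the overlap region
    hi = min(max(a), max(b))  # upper edge of the overlap region
    if lo > hi:
        # Class ranges are disjoint: fully separated, cut in the middle of the gap.
        return 1.0, (lo + hi) / 2.0
    overlap = [v for v in values if lo <= v <= hi]
    separation = 1.0 - len(overlap) / len(values)
    cutoff = sum(overlap) / len(overlap)
    return separation, cutoff


def select_features(rows, labels, min_separation=0.5):
    """Keep (feature index, cutoff) pairs whose separation meets the minimum."""
    selected = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        separation, cutoff = score_feature(column, labels)
        if separation >= min_separation:
            selected.append((j, cutoff))
    return selected


# Toy data: feature 0 is disjoint, feature 1 fully overlaps, feature 2 partly overlaps.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
rows = [
    [1, 1, 1], [2, 5, 2], [3, 9, 3], [4, 5, 4],
    [7, 1, 3], [8, 5, 4], [9, 9, 5], [10, 5, 6],
]
print(select_features(rows, labels))
```

On this toy data the fully overlapping feature is dropped, while each retained feature carries its own cut-off, which is the sense in which the threshold is "variable" in this sketch.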