Feature Selection for Medical Data Mining: Comparisons of Expert Judgment and Automatic Approaches
T. Cheng, Chih-Ping Wei, V. Tseng
19th IEEE Symposium on Computer-Based Medical Systems (CBMS'06), published 2006-06-22
DOI: 10.1109/CBMS.2006.87 (https://doi.org/10.1109/CBMS.2006.87)
Citations: 109
Abstract
Data mining refers to the process of automatically extracting previously unknown, valid, and actionable patterns or knowledge from large databases for crucial decision support. Among the different data mining techniques, classification analysis is widely adopted in healthcare applications to support medical diagnostic decisions, improve the quality of patient care, etc. If a training dataset contains irrelevant features (i.e., attributes), classification analysis may produce less accurate and less understandable results. Two commonly employed feature selection approaches are the use of automatic feature selection mechanisms (i.e., data-driven) and expert judgment (i.e., knowledge-driven). Because their underlying processes differ, these two prevailing approaches may each have unique biases that lead to dissimilar classification effectiveness. In this study, we empirically evaluate the classification effectiveness resulting from the two feature selection approaches on a cardiovascular disease risk prediction dataset. Our evaluation results suggest that the feature subsets selected by domain experts improve the sensitivity of a classifier, while the feature subsets selected by an automatic feature selection mechanism improve the predictive power of a classifier on the majority class (i.e., the specificity in this study).
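For readers less familiar with the two measures contrasted above, the following is a minimal sketch of how sensitivity and specificity are conventionally computed from true and predicted labels. It is not the authors' evaluation code: the function name `sensitivity_specificity`, the label encoding (1 = at risk, 0 = not at risk), and the sample labels are all illustrative assumptions, with the negative class standing in for the majority class.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    Hypothetical helper for illustration; not taken from the paper.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # positive case correctly flagged
            else:
                fn += 1  # positive case missed
        else:
            if pred == positive:
                fp += 1  # negative case wrongly flagged
            else:
                tn += 1  # negative case correctly cleared
    # Sensitivity: share of actual positives that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: share of actual negatives that were cleared.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Made-up labels: 1 = at risk (minority class), 0 = not at risk (majority class).
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

When the negative class dominates, as assumed here, a classifier can score well on specificity while still missing many positive cases, which is why the two measures are reported separately.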