Feature Selection for Medical Data Mining: Comparisons of Expert Judgment and Automatic Approaches

19th IEEE Symposium on Computer-Based Medical Systems (CBMS'06), Pub Date: 2006-06-22, DOI: 10.1109/CBMS.2006.87

T. Cheng, Chih-Ping Wei, V. Tseng


Citations: 109

Abstract

Data mining refers to the process of automatically extracting previously unknown, valid, and actionable patterns or knowledge from large databases for crucial decision support. Among the various data mining techniques, classification analysis is widely adopted in healthcare applications, for example to support medical diagnostic decisions and to improve the quality of patient care. If a training dataset contains irrelevant features (i.e., attributes), classification analysis may produce less accurate and less understandable results. Two commonly employed feature selection approaches are automatic feature selection mechanisms (i.e., data-driven) and expert judgment (i.e., knowledge-driven). Because their underlying processes differ, each approach may carry its own biases, which can lead to different classification effectiveness. In this study, we empirically evaluate the classification effectiveness resulting from the two feature selection approaches on a cardiovascular disease risk prediction dataset. Our evaluation results suggest that the feature subsets selected by domain experts improve the sensitivity of a classifier, while the feature subsets selected by an automatic feature selection mechanism improve the predictive power of a classifier on the majority class (i.e., the specificity in this study).
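The abstract compares feature subsets by the sensitivity and specificity of the classifier trained on them. The sketch below, in plain Python, shows one common data-driven filter (ranking discrete features by information gain) together with those two metrics. The abstract does not say which automatic mechanism or classifier the study used, so the information-gain filter, every function name, and the toy feature names (`hypertension`, `smoker`, `shoe_size_bucket`) are assumptions for illustration only; the toy rows and predictions are invented and say nothing about the study's results.

```python
import math
from collections import Counter, defaultdict
from typing import Hashable, List, Sequence, Tuple


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column: Sequence[Hashable], labels: Sequence[int]) -> float:
    """Drop in label entropy after partitioning rows by a discrete feature's values."""
    groups = defaultdict(list)
    for value, label in zip(column, labels):
        groups[value].append(label)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def select_top_k(rows, feature_names, labels, k) -> List[str]:
    """Hypothetical data-driven filter (an assumption, not the paper's stated
    method): keep the k features with the highest information gain."""
    scores = {
        name: information_gain([row[i] for row in rows], labels)
        for i, name in enumerate(feature_names)
    }
    # sorted() is stable even with reverse=True, so ties keep column order.
    return sorted(scores, key=scores.get, reverse=True)[:k]


def sensitivity_specificity(y_true, y_pred, positive=1) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN), i.e. recall on the positive class;
    specificity = TN / (TN + FP), i.e. recall on the negative (here majority) class.
    Assumes both classes occur in y_true, otherwise a denominator is zero."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Invented toy data: 3 positive (at-risk) vs 5 negative patients, binary features.
FEATURES = ["hypertension", "smoker", "shoe_size_bucket"]  # hypothetical names
ROWS = [
    (1, 1, 0), (1, 1, 1), (1, 0, 0), (0, 1, 1),
    (0, 0, 0), (0, 0, 1), (0, 0, 0), (0, 0, 1),
]
LABELS = [1, 1, 1, 0, 0, 0, 0, 0]

if __name__ == "__main__":
    print(select_top_k(ROWS, FEATURES, LABELS, k=2))  # ['hypertension', 'smoker']
    # Made-up predictions standing in for a trained classifier's output.
    print(sensitivity_specificity(LABELS, [1, 1, 0, 0, 0, 0, 0, 1]))  # (0.6666666666666666, 0.8)
```

Swapping the filter for a fixed, expert-chosen list of feature names only changes which columns reach the classifier; the evaluation step stays the same, which is what lets the two approaches be compared on sensitivity (minority, at-risk class) and specificity (majority class).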
