Feature Selection for Medical Data Mining: Comparisons of Expert Judgment and Automatic Approaches

T. Cheng, Chih-Ping Wei, V. Tseng
Published in: 19th IEEE Symposium on Computer-Based Medical Systems (CBMS'06), 2006-06-22
DOI: 10.1109/CBMS.2006.87 (https://doi.org/10.1109/CBMS.2006.87)
Citations: 109

Abstract

Data mining refers to the process of automatically extracting previously unknown, valid, and actionable patterns or knowledge from large databases for crucial decision support. Among data mining techniques, classification analysis is widely adopted in healthcare applications to support medical diagnostic decisions, improve the quality of patient care, etc. If a training dataset contains irrelevant features (i.e., attributes), classification analysis may produce less accurate and less understandable results. Two commonly employed feature selection approaches are automatic feature selection mechanisms (i.e., data-driven) and expert judgment (i.e., knowledge-driven). Because their underlying processes differ, the two approaches may each carry unique biases that lead to dissimilar classification effectiveness. In this study, we empirically evaluate the classification effectiveness resulting from the two feature selection approaches on a cardiovascular disease risk prediction dataset. Our evaluation results suggest that the feature subsets selected by domain experts improve the sensitivity of a classifier, while the feature subsets selected by an automatic feature selection mechanism improve the predictive power of a classifier on the majority class (i.e., the specificity in this study).
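The two effectiveness measures named in the abstract, sensitivity and specificity, can be illustrated with a minimal sketch. It assumes binary labels where 1 marks the positive (minority) class and 0 the negative (majority) class, which matches the abstract's remark that specificity reflects performance on the majority class. The function name and the toy labels are hypothetical and are not taken from the study's dataset or results.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels, with 1 = positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must appear in y_true")
    # Sensitivity: share of actual positives that were predicted positive.
    # Specificity: share of actual negatives that were predicted negative.
    return tp / (tp + fn), tn / (tn + fp)


# Made-up labels for illustration only: 3 positive cases, 7 negative cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Running the same function on predictions from classifiers trained on different feature subsets would be one way to compare them on these two measures; the abstract does not state how the authors computed them.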