Simultaneous Feature Selection and Tuple Selection for Efficient Classification

M. Dash, V. Gopalkrishnan
Published in: Complex Data Warehousing and Knowledge Discovery for Advanced Retrieval Development
DOI: 10.4018/978-1-60566-748-5.CH012 (https://doi.org/10.4018/978-1-60566-748-5.CH012)

Abstract

It is no longer news that data are growing rapidly day by day. With the Internet now prevalent everywhere, the sources of data have become numerous. Data are growing in two directions: in dimensions (features) and in instances (examples, or tuples). Not all of these data are relevant, though. When gathering data on a particular subject, one usually tends to collect as much information as may be required for various tasks, without necessarily having any particular task, such as classification, in mind. It therefore behooves a data mining expert to remove noisy, irrelevant, and redundant data before proceeding with classification, because many traditional algorithms fail in the presence of such data (Blum and Langley 1997). As an example, consider microarray gene expression data, where there are thousands of features (genes) but only tens of tuples (sample tests). The Leukemia cancer data (Alon, Barkai et al. 1999), for instance, has 7129 genes and 72 sample tests. It has been shown that even with very few genes one can achieve the same or even better prediction accuracy …
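To make the two directions of reduction concrete, the following is a minimal sketch in Python on synthetic data shaped like the Leukemia example (72 tuples, 7129 features). It is not the authors' method, which the abstract does not describe. The signal-to-noise filter used to rank features and the stratified random sample used to pick tuples are stand-ins chosen for illustration, and every function name, the planted informative features, and all parameter values are assumptions.

```python
import math
import random


def make_data(n_tuples=72, n_features=7129, informative=(0, 1, 2),
              shift=4.0, seed=0):
    """Synthetic two-class data: Gaussian noise everywhere, plus a mean
    shift on a few planted 'informative' features (an assumption)."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_tuples)]
    rows = []
    for y in labels:
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += shift * y
        rows.append(row)
    return rows, labels


def mean_std(values):
    """Sample mean and sample standard deviation (n - 1 denominator)."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / (len(values) - 1)
    return m, math.sqrt(var)


def signal_to_noise(rows, labels, j):
    """|mean0 - mean1| / (std0 + std1) for feature j."""
    m0, s0 = mean_std([r[j] for r, y in zip(rows, labels) if y == 0])
    m1, s1 = mean_std([r[j] for r, y in zip(rows, labels) if y == 1])
    return abs(m0 - m1) / (s0 + s1)


def select_features(rows, labels, k):
    """Feature selection: keep the k highest-scoring feature indices."""
    n_features = len(rows[0])
    scores = [signal_to_noise(rows, labels, j) for j in range(n_features)]
    ranked = sorted(range(n_features), key=scores.__getitem__, reverse=True)
    return ranked[:k]


def select_tuples(labels, per_class, seed=0):
    """Tuple selection placeholder: a stratified random sample that keeps
    per_class tuple indices from every class."""
    rng = random.Random(seed)
    chosen = []
    for c in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == c]
        chosen.extend(rng.sample(idx, per_class))
    return sorted(chosen)


rows, labels = make_data()
features = select_features(rows, labels, k=3)
tuples = select_tuples(labels, per_class=10)

# Reduced data set: selected tuples (rows) restricted to selected features.
reduced = [[rows[i][j] for j in features] for i in tuples]
print(sorted(features), len(reduced), len(reduced[0]))
```

The sketch reduces the data along both axes before any classifier sees it, which is the general idea the abstract motivates. A real pipeline would replace the random tuple sample with a criterion that actually targets noisy or redundant tuples.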