Feature selection and classification in bioscience/medical datasets: study of parameters and multi-objective approach in Two-Phase EA/k-NN method

2010 UK Workshop on Computational Intelligence (UKCI) Pub Date : 2010-11-09 DOI:10.1109/UKCI.2010.5625581

M. Dissanayake, D. Corne


Citations: 2

Abstract

Feature selection continues to grow in importance in many areas of science and engineering as large datasets become increasingly common. In particular, bioscience and medical datasets routinely contain several thousand features. Effective data mining in such datasets requires tools that can reliably identify the most relevant features. Identifying these features is a useful goal in itself (e.g. they may be putative drug targets), and it can also improve, perhaps drastically, both the speed of machine learning algorithms on the dataset and the quality of the resulting predictive models. Among the many feature selection methods studied, previous work has shown promise for an evolutionary algorithm/classifier combination (EA/k-NN) which, in successive phases of the same algorithm, serves first as the feature selection mechanism and second as the machine learning method yielding an accurate classifier. Here, we follow up that work by investigating the configuration and parametrisation of the two phases, including multi-objective approaches for one or both phases. Following tests on three datasets, we find: further evidence that the two-phase approach is effective, with results on the most difficult dataset highly competitive with the literature; inconclusive results concerning the ideal way to configure the two phases; and evidence in support of using a multi-objective approach in one or both phases.
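The abstract does not spell out the algorithm, so the following Python sketch only illustrates the general idea of a two-phase EA/k-NN scheme and should not be read as the authors' implementation. Everything specific in it is an assumption made for illustration: the function names (`knn_accuracy`, `evolve`, `two_phase`, `make_toy_data`), the leave-one-out fitness, the single-bit mutation, the rule that a feature survives phase 1 if it appears in at least half of the best masks, the parameter values, and the toy data. The multi-objective variant mentioned in the abstract is not shown.

```python
import random


def knn_accuracy(X, y, mask, k=3):
    """Leave-one-out k-NN accuracy using only features whose mask bit is 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        # Squared Euclidean distance from sample i to every other sample.
        neighbours = sorted(
            (sum((X[i][j] - X[m][j]) ** 2 for j in cols), y[m])
            for m in range(len(X)) if m != i
        )
        votes = [label for _, label in neighbours[:k]]
        predicted = max(sorted(set(votes)), key=votes.count)
        correct += predicted == y[i]
    return correct / len(X)


def evolve(X, y, allowed, rng, generations=30, pop_size=20, k=3):
    """Steady-state EA over feature bit masks restricted to `allowed` indices."""
    n = len(X[0])
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = knn_accuracy(X, y, mask, k)
        return cache[key]

    pop = [[1 if j in allowed and rng.random() < 0.5 else 0 for j in range(n)]
           for _ in range(pop_size)]
    choices = sorted(allowed)
    for _ in range(generations):
        # Binary tournament picks a parent; mutation flips one allowed bit.
        a, b = rng.sample(pop, 2)
        parent = a if fitness(a) >= fitness(b) else b
        child = parent[:]
        j = rng.choice(choices)
        child[j] = 1 - child[j]
        # The child replaces the current worst individual if it is no worse.
        worst = min(range(pop_size), key=lambda i: fitness(pop[i]))
        if fitness(child) >= fitness(pop[worst]):
            pop[worst] = child
    return max(pop, key=fitness)


def two_phase(X, y, rng, runs=5):
    """Assumed linkage of the two phases; not taken from the paper."""
    n = len(X[0])
    # Phase 1 (feature selection): repeat the EA over all features and count
    # how often each feature appears in the best mask of a run.
    counts = [0] * n
    for _ in range(runs):
        best = evolve(X, y, set(range(n)), rng)
        counts = [c + bit for c, bit in zip(counts, best)]
    selected = {j for j, c in enumerate(counts) if 2 * c >= runs}
    if not selected:  # degenerate case: fall back to every feature
        selected = set(range(n))
    # Phase 2 (classifier): rerun the EA restricted to the selected features;
    # the best mask together with k-NN is the final classifier.
    best = evolve(X, y, selected, rng)
    return selected, best, knn_accuracy(X, y, best)


def make_toy_data(rng, n_samples=20, n_noise=5):
    """Toy data: feature 0 separates the two classes, the rest is noise."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        X.append([10 * label + rng.random()]
                 + [rng.random() for _ in range(n_noise)])
        y.append(label)
    return X, y
```

Calling `two_phase(*make_toy_data(rng), rng)` with `rng = random.Random(0)` returns the set of features kept after phase 1, the final bit mask, and its leave-one-out accuracy. On real data the fitness would normally be estimated on held-out folds instead of the full set, to avoid selecting features that merely fit the training samples.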

