An Adaptive Cooperative Coevolutionary Algorithm for Parallel Feature Selection in High-Dimensional Datasets

2022 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)
Pub Date: 2022-03-01, DOI: 10.1109/pdp55904.2022.00040

Marjan Firouznia, G. Trunfio


Citations: 2

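Abstract

Nowadays, in many disciplines and application fields it is common to collect large volumes of data characterized by a high number of features. Such datasets are the basis of modern applications of supervised machine learning, where the goal is to build a classifier for newly presented data. However, it is well known that irrelevant features in the dataset can make the learning phase harder and, above all, can produce suboptimal classifiers. For this reason, the ability to select an appropriate subset of the available features is becoming increasingly important. Optimization metaheuristics have traditionally been applied successfully to feature selection. However, many of the approaches presented in the literature are not applicable to datasets with thousands of features, since common optimization algorithms often scale poorly with the size of the search space. In this paper, the feature subset optimization problem is addressed with a cooperative coevolutionary algorithm based on Differential Evolution. The proposed algorithm is parallelized for multi-threaded execution on shared-memory architectures and adopts a strategy that reduces the dimensionality of the search space and adapts the population size during the optimization, which yields a significant performance improvement. A numerical investigation on several high-dimensional datasets shows that, in most cases, the proposed approach achieves smaller feature subsets and higher classification performance than other state-of-the-art methods.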
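The abstract describes the method only at a high level. As a hedged illustration of the general technique it builds on, here is a minimal sketch of cooperative coevolution (CC) with Differential Evolution for feature selection. Everything specific here is an assumption, not the authors' method: each feature is encoded as a real gene in [0, 1] and counts as selected when the gene exceeds 0.5; features are split into subcomponents by random grouping; each subcomponent runs DE/rand/1/bin and is evaluated by inserting it into a shared context vector; and a hypothetical `toy_fitness` stands in for a classifier's error, since no dataset is given. Trial vectors are evaluated on a `ThreadPoolExecutor` only to mirror the multi-threaded setting. The paper's search-space reduction and population-size adaptation strategies are not detailed in the abstract and are omitted.

```python
import random
from concurrent.futures import ThreadPoolExecutor

THRESHOLD = 0.5  # assumed encoding: a gene above 0.5 means "feature selected"


def decode(vector):
    """Return the indices of the features whose gene exceeds THRESHOLD."""
    return [i for i, g in enumerate(vector) if g > THRESHOLD]


def toy_fitness(selected, relevant, n_features, alpha=0.9):
    """Hypothetical stand-in for a classifier's error rate (lower is better).

    The first term is the fraction of truly relevant features that were missed;
    the second term penalizes subset size, mimicking the accuracy/size trade-off.
    """
    missed = len(relevant - set(selected)) / len(relevant)
    return alpha * missed + (1 - alpha) * len(selected) / n_features


def cc_de(n_features, relevant, n_groups=4, pop_size=10, generations=40,
          F=0.5, CR=0.9, seed=0, workers=4):
    """Cooperative coevolution with DE/rand/1/bin in each subcomponent."""
    rng = random.Random(seed)
    # Random grouping: split the feature indices into disjoint subcomponents.
    idx = list(range(n_features))
    rng.shuffle(idx)
    groups = [idx[g::n_groups] for g in range(n_groups)]
    # One DE population per subcomponent; an individual covers only its group.
    pops = [[[rng.random() for _ in grp] for _ in range(pop_size)] for grp in groups]
    # Context vector: best full solution so far, used to complete partial ones.
    context = [rng.random() for _ in range(n_features)]

    def evaluate(group, partial):
        full = list(context)
        for i, g in zip(group, partial):
            full[i] = g
        return toy_fitness(decode(full), relevant, n_features)

    best_fit = evaluate([], [])
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(generations):
            for grp, pop in zip(groups, pops):
                # Re-evaluate against the current context, which other groups may have changed.
                fits = list(pool.map(lambda p: evaluate(grp, p), pop))
                trials = []
                for i in range(pop_size):
                    a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
                    j_rand = rng.randrange(len(grp))
                    # DE/rand/1 mutation with binomial crossover, clipped to [0, 1].
                    trial = [
                        min(1.0, max(0.0, pop[a][d] + F * (pop[b][d] - pop[c][d])))
                        if rng.random() < CR or d == j_rand else pop[i][d]
                        for d in range(len(grp))
                    ]
                    trials.append(trial)
                # Evaluate trials on worker threads (CPython's GIL limits the speedup
                # for a pure-Python fitness; this only illustrates the structure).
                trial_fits = list(pool.map(lambda t: evaluate(grp, t), trials))
                for i, f in enumerate(trial_fits):
                    if f <= fits[i]:  # greedy one-to-one DE selection
                        pop[i], fits[i] = trials[i], f
                # Write this group's best individual into the context if it is no worse.
                k = min(range(pop_size), key=fits.__getitem__)
                if fits[k] <= best_fit:
                    best_fit = fits[k]
                    for i, g in zip(grp, pop[k]):
                        context[i] = g
    return decode(context), best_fit
```

For example, `cc_de(n_features=20, relevant={0, 1, 2, 3})` should return a subset that contains features 0 to 3, together with its fitness. Re-evaluating each population against the current context at the start of its turn keeps the stored fitness values consistent after other subcomponents update the context, at the cost of extra evaluations per cycle.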



