An Adaptive Cooperative Coevolutionary Algorithm for Parallel Feature Selection in High-Dimensional Datasets
Marjan Firouznia, G. Trunfio
2022 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP), 2022-03-01
DOI: https://doi.org/10.1109/pdp55904.2022.00040
In many disciplines and application fields it is now common to collect large volumes of data characterized by a high number of features. Such datasets are the basis of modern applications of supervised Machine Learning, where the goal is to create a classifier for newly presented data. However, it is well known that irrelevant features in the dataset can make the learning phase harder and, above all, can produce suboptimal classifiers. For this reason, the ability to select an appropriate subset of the available features is becoming increasingly important. Traditionally, optimization metaheuristics have been used with success for feature selection. However, many of the approaches presented in the literature are not applicable to datasets with thousands of features, since common optimization algorithms often scale poorly with the size of the search space. In this paper, the problem of feature subset optimization is addressed by a cooperative coevolutionary algorithm based on Differential Evolution. The proposed algorithm is parallelized for multi-threaded execution on shared-memory architectures, and it uses a suitable strategy for reducing the dimensionality of the search space and adapting the population size during the optimization, which results in a significant performance improvement. A numerical investigation on some high-dimensional datasets shows that, in most cases, the proposed approach can achieve smaller feature subsets and higher classification performance than other state-of-the-art methods.
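To make the general idea of cooperative coevolution with Differential Evolution more concrete, the following is a minimal serial sketch in Python. It is not the authors' implementation: it omits the multi-threaded execution, the dimensionality reduction strategy, and the adaptive population size mentioned in the abstract. All names (`toy_fitness`, `decode`, `cc_de_select`) and parameter values are hypothetical, the grouping of features is a simple random static split, each feature is encoded as a real value in [0, 1] that counts as selected when above 0.5, and the fitness is a toy stand-in for classifier accuracy that rewards a known set of relevant features and lightly penalizes subset size.

```python
import random


def toy_fitness(mask, relevant):
    """Hypothetical stand-in for classifier accuracy (assumption, not from the paper):
    reward selected relevant features, lightly penalize the subset size."""
    hits = sum(1 for i in relevant if mask[i])
    return hits / len(relevant) - 0.01 * sum(mask) / len(mask)


def decode(vector):
    """Turn a real-valued vector in [0, 1] into a boolean feature mask."""
    return [x > 0.5 for x in vector]


def cc_de_select(n_features, relevant, n_groups=4, pop_size=10,
                 cycles=30, f=0.5, cr=0.9, seed=0):
    """Serial cooperative coevolution with DE/rand/1/bin on each feature group."""
    rng = random.Random(seed)
    indices = list(range(n_features))
    rng.shuffle(indices)
    # Static random decomposition of the features into subcomponents.
    groups = [indices[g::n_groups] for g in range(n_groups)]
    # Context vector: the current best full solution shared by all groups.
    context = [rng.random() for _ in range(n_features)]
    pops = [[[rng.random() for _ in group] for _ in range(pop_size)]
            for group in groups]

    def evaluate(group, sub):
        # Plug the sub-solution into the context vector and score the whole mask.
        full = context[:]
        for idx, val in zip(group, sub):
            full[idx] = val
        return toy_fitness(decode(full), relevant)

    best_fit = toy_fitness(decode(context), relevant)
    for _ in range(cycles):
        for group, pop in zip(groups, pops):
            fits = [evaluate(group, ind) for ind in pop]
            for i in range(pop_size):
                others = [p for j, p in enumerate(pop) if j != i]
                a, b, c = rng.sample(others, 3)
                jrand = rng.randrange(len(group))
                # Mutation plus binomial crossover, clipped to [0, 1].
                trial = [
                    min(1.0, max(0.0, a[k] + f * (b[k] - c[k])))
                    if (rng.random() < cr or k == jrand) else pop[i][k]
                    for k in range(len(group))
                ]
                trial_fit = evaluate(group, trial)
                if trial_fit >= fits[i]:
                    pop[i], fits[i] = trial, trial_fit
            # Cooperation step: write this group's best member into the context.
            bi = max(range(pop_size), key=fits.__getitem__)
            if fits[bi] >= best_fit:
                best_fit = fits[bi]
                for idx, val in zip(group, pop[bi]):
                    context[idx] = val
    return decode(context), best_fit


if __name__ == "__main__":
    relevant = {0, 7, 13, 21, 34}
    mask, fit = cc_de_select(40, relevant)
    print("selected features:", [i for i, m in enumerate(mask) if m])
    print("toy fitness:", fit)
```

In a realistic setting the toy fitness would be replaced by a cross-validated classifier score on the selected columns, which is the expensive step that motivates evaluating individuals in parallel.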