A Parallel Hybrid Feature Selection Approach Based on Multi-Correlation and Evolutionary Multitasking
Authors: Mohamed Amine Azaiz, Djamel Amar Bensaber
Journal: International Journal of Grid and High Performance Computing, pp. 1-23
Published: 2023-03-24
DOI: https://doi.org/10.4018/ijghpc.320475
Particle swarm optimization (PSO) has been successfully applied to feature selection (FS) due to its efficiency and ease of implementation. Like most evolutionary algorithms, however, it still suffers from a high computational burden and poor generalization ability. Multifactorial optimization (MFO), an effective evolutionary multitasking paradigm, has been widely used to solve complex problems through implicit knowledge transfer between related tasks. Based on MFO, this study proposes a PSO-based FS method for high-dimensional classification that shares information between two related tasks generated from the same dataset using two different correlation measures. Specifically, two subsets of relevant features are generated, one using the symmetric uncertainty measure and the other using the Pearson correlation coefficient, and each subset is assigned to one task. To improve runtime, the authors propose a parallel fitness evaluation of particles under Apache Spark. The results show that the proposed FS method can achieve higher classification accuracy with a smaller feature subset in a reasonable time.
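The task-generation step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the helper names (`entropy`, `symmetric_uncertainty`, `abs_pearson`, `top_k_features`), the top-k selection rule, the value of `k`, and the toy data are all assumptions made for illustration, since the abstract does not state how the relevant features are chosen. The sketch only shows how two correlation measures can yield two different feature subsets from the same dataset, one per task.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)) for discrete X and Y."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def abs_pearson(x, y):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return abs(cov / (sx * sy))


def top_k_features(columns, labels, score, k):
    """Indices of the k features scoring highest against the labels.

    Top-k is an assumed selection rule, not one taken from the paper.
    """
    ranked = sorted(range(len(columns)),
                    key=lambda j: score(columns[j], labels), reverse=True)
    return sorted(ranked[:k])


# Hypothetical toy data: four feature columns and a binary class label.
# Values are treated as discrete for SU; continuous features would need
# discretization first.
labels = [0, 1, 0, 0, 1, 0]
columns = [
    [0, 1, 2, 0, 1, 2],  # nonlinear relation: label is 1 only when value is 1
    [1, 5, 2, 0, 6, 3],  # roughly linear relation with the label
    [0, 1, 0, 0, 1, 0],  # copy of the label
    [1, 1, 1, 1, 1, 1],  # constant, carries no information
]

# One feature subset per task, each from a different correlation measure.
task_su = top_k_features(columns, labels, symmetric_uncertainty, k=2)
task_pearson = top_k_features(columns, labels, abs_pearson, k=2)
print(task_su, task_pearson)  # → [0, 2] [1, 2]
```

In this toy case the two measures agree on the feature that copies the label but disagree on the second pick: symmetric uncertainty favors the nonlinear feature, while the Pearson coefficient favors the linear one. Overlapping but distinct subsets of this kind are what make the two tasks related without being identical.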