Feature selection using differential evolution for microarray data classification
Authors: Sanjay Prajapati, Himansu Das, Mahendra Kumar Gourisaria
Journal: Discover Internet of Things
Publication date: 2023-10-05
DOI: 10.1007/s43926-023-00042-5 (https://doi.org/10.1007/s43926-023-00042-5)
Citation count: 0
Abstract
Microarray datasets are high-dimensional and contain noise and redundancy. Their main difficulty is that features outnumber samples (the data matrix has more columns than rows), which adversely affects algorithm performance. Because microarray datasets play a critical role in detecting various diseases, including cancer and tumors, a robust technique is required to extract precise information from them, and this is where feature selection comes into play. Feature selection (FS) has recently gained significant importance as a data preparation method, particularly for high-dimensional data. Since not all features are needed to achieve high accuracy, it is preferable to address classification problems with fewer features while maintaining accuracy. The primary objective of FS is to identify the optimal subset of features. For this purpose we employ the Differential Evolution (DE) algorithm, a population-based stochastic search approach that is widely used in scientific and technical domains to solve optimization problems in continuous spaces. We combine DE with three classification algorithms: Random Forest (RF), Decision Tree (DT), and Logistic Regression (LR). Our analysis compares the accuracy achieved by each model on each dataset, as well as the fitness error of each model. The results indicate that the models performed better with feature selection than without it.
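The wrapper idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the common DE/rand/1/bin variant, a 0.5 threshold to turn each continuous vector into a feature mask, a fitness equal to classification error plus a small feature-count penalty (`alpha`), and a simple nearest-centroid classifier standing in for RF, DT, and LR so that the example needs only the standard library. All function names, parameter values, and the synthetic data are hypothetical.

```python
import random

def nearest_centroid_error(X, y, mask):
    """Leave-one-out error of a nearest-centroid classifier on the masked features.
    Stand-in classifier (assumption); the paper uses RF, DT and LR instead."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 1.0  # an empty feature subset is treated as the worst case
    errors = 0
    for i in range(len(X)):
        sums, counts = {}, {}
        for k in range(len(X)):
            if k == i:
                continue  # hold out sample i
            s = sums.setdefault(y[k], [0.0] * len(cols))
            for c, j in enumerate(cols):
                s[c] += X[k][j]
            counts[y[k]] = counts.get(y[k], 0) + 1

        def dist(label):
            # squared distance from sample i to the class centroid
            return sum((X[i][j] - sums[label][c] / counts[label]) ** 2
                       for c, j in enumerate(cols))

        errors += min(sums, key=dist) != y[i]
    return errors / len(X)

def de_feature_selection(X, y, pop_size=10, generations=10,
                         F=0.5, CR=0.9, alpha=0.01, seed=0):
    """DE/rand/1/bin over vectors in [0, 1]^n; a feature is kept when its gene > 0.5."""
    rng = random.Random(seed)
    n = len(X[0])

    def decode(vec):
        return [v > 0.5 for v in vec]

    def fitness(vec):
        mask = decode(vec)
        # error plus a small penalty on subset size (assumed form of the fitness)
        return nearest_centroid_error(X, y, mask) + alpha * sum(mask) / n

    pop = [[rng.random() for _ in range(n)] for _ in range(pop_size)]
    fit = [fitness(v) for v in pop]
    for _ in range(generations):
        for i in range(pop_size):
            r1, r2, r3 = rng.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = rng.randrange(n)  # guarantees at least one mutated gene
            trial = []
            for j in range(n):
                if rng.random() < CR or j == j_rand:
                    v = pop[r1][j] + F * (pop[r2][j] - pop[r3][j])  # mutation
                    trial.append(min(1.0, max(0.0, v)))             # clip to [0, 1]
                else:
                    trial.append(pop[i][j])                         # inherit from target
            f = fitness(trial)
            if f <= fit[i]:  # greedy selection: keep the trial if it is no worse
                pop[i], fit[i] = trial, f
    best = min(range(pop_size), key=fit.__getitem__)
    return decode(pop[best]), fit[best]

def make_toy_data(n_samples=20, n_features=30, informative=(0, 1), seed=1):
    """Synthetic two-class data with more columns than rows (not microarray data)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 4.0 * label  # only these features carry class signal
        X.append(row)
        y.append(label)
    return X, y

X, y = make_toy_data()
mask, best_fitness = de_feature_selection(X, y)
print("selected features:", [j for j, keep in enumerate(mask) if keep])
print("best fitness:", best_fitness)
```

Swapping `nearest_centroid_error` for a cross-validated error from another classifier would leave the DE loop unchanged, since the search only sees the fitness value.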
About the journal:
Discover Internet of Things is part of the Discover journal series, which is committed to providing a streamlined submission process, rapid review and publication, and a high level of author service at every stage. It is an open access, community-focussed journal publishing research from across all fields relevant to the Internet of Things (IoT), providing cutting-edge, state-of-the-art research findings to researchers, academics, students, and engineers.
The journal covers concepts at the component, hardware, and system level as well as programming, operating systems, software, applications, and other technology-oriented research topics. It is uniquely interdisciplinary because its scope spans several research communities, ranging from computer systems to communication, optimisation, big data analytics, and applications. Articles published in Discover Internet of Things are also intended to help support and accelerate Sustainable Development Goal 9: ‘Build resilient infrastructure, promote inclusive and sustainable industrialization and foster innovation’.
Discover Internet of Things welcomes all observational, experimental, theoretical, analytical, mathematical modelling, data-driven, and applied approaches that advance the study of all aspects of IoT research.