Gene Selection by Hybrid Feature Selection Approaches and Classification Techniques in Microarray Dataset for Cancer Prediction
W. Jaisingh, Subash Chandra Bose Jaganathan, Akanksha Verma
2022 IEEE 2nd International Symposium on Sustainable Energy, Signal Processing and Cyber Security (iSSSC), published 2022-12-15
DOI: 10.1109/iSSSC56467.2022.10051247 (https://doi.org/10.1109/iSSSC56467.2022.10051247)
Abstract
Feature (gene) selection and the classification of microarray data are two of the most actively studied problems in machine learning. In this paper, three existing feature selection methods, Information Gain (IG), Gain Ratio (GR), and Relief, are combined into two hybrid methods, Information Gain with Relief and Gain Ratio with Relief, to develop a new selection and extraction method. The primary objective of this study is to use hybrid feature extraction to select the independent components of DNA microarray data, with the aim of improving the performance of support vector machine (SVM) and neural network (NN) classifiers while reducing the computational resources required for the analysis. To provide evidence that the methodology is reliable, it is applied to reduce the number of genes in four DNA microarray datasets: Breast GSE22820, Breast GSE38959, Breast GSE42568, and Breast. SVM and NN classifiers are used to classify these datasets. The experimental results on these four microarray gene expression datasets show that the genes selected by the proposed methodology improve the classification accuracy of both SVM and NN classifiers. Compared with standard existing extraction algorithms, the new method achieves higher classification accuracy with SVM and NN classifiers while using fewer identified genes. Receiver operating characteristic (ROC) curves show the best gene subset for each classifier and selection method on each dataset.

Keywords: Information Gain, Gain Ratio, Relief, support vector machine, Neural Networks, Feature selection, Microarray data.
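The abstract does not say how the filters are chained, so the following is only a minimal sketch of one plausible reading: rank genes by IG (or GR), keep the top few, then re-rank the survivors with Relief. The median binarization, the two-stage ordering, the function names, the `prefilter` and `keep` parameters, and the synthetic data are all assumptions made for illustration; they are not taken from the paper.

```python
import math
import random
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def info_gain_and_ratio(column, labels):
    """Return (IG, GR) for one gene, binarized at its median (an assumption)."""
    cut = sorted(column)[len(column) // 2]
    bins = [v >= cut for v in column]
    conditional = 0.0
    for b in set(bins):
        subset = [lab for x, lab in zip(bins, labels) if x == b]
        conditional += len(subset) / len(labels) * entropy(subset)
    gain = entropy(labels) - conditional
    split_info = entropy(bins)
    return gain, (gain / split_info if split_info > 0 else 0.0)


def relief(X, y, features):
    """Basic two-class Relief weights, restricted to the given gene indices.

    Assumes every class has at least two samples.
    """
    span = {j: (max(r[j] for r in X) - min(r[j] for r in X)) or 1.0
            for j in features}

    def diff(a, b, j):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in features)

    weights = {j: 0.0 for j in features}
    for i, row in enumerate(X):
        hits = [r for k, r in enumerate(X) if k != i and y[k] == y[i]]
        misses = [r for k, r in enumerate(X) if y[k] != y[i]]
        hit = min(hits, key=lambda r: dist(row, r))    # nearest same-class sample
        miss = min(misses, key=lambda r: dist(row, r))  # nearest other-class sample
        for j in features:
            weights[j] += (diff(row, miss, j) - diff(row, hit, j)) / len(X)
    return weights


def hybrid_select(X, y, prefilter=10, keep=3, use_gain_ratio=False):
    """Stage 1: top `prefilter` genes by IG or GR. Stage 2: top `keep` by Relief."""
    n_genes = len(X[0])
    idx = 1 if use_gain_ratio else 0
    scores = [info_gain_and_ratio([r[j] for r in X], y)[idx]
              for j in range(n_genes)]
    stage1 = sorted(range(n_genes), key=lambda j: scores[j], reverse=True)[:prefilter]
    w = relief(X, y, stage1)
    return sorted(stage1, key=lambda j: w[j], reverse=True)[:keep]


# Synthetic stand-in for a microarray matrix: 40 samples x 50 genes,
# where only genes 0, 1 and 2 depend on the class label.
rng = random.Random(0)
y = [i % 2 for i in range(40)]
X = [[3 * label + rng.uniform(-0.25, 0.25) if j < 3 else rng.uniform(0, 1)
      for j in range(50)] for label in y]
selected = hybrid_select(X, y, prefilter=10, keep=3)
print(sorted(selected))
```

The reduced gene list would then be passed to an SVM or NN classifier. Note that with a median split the split information is close to 1 bit, so IG and GR nearly coincide here; a multi-bin discretization would be needed to make the two rankings differ meaningfully.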