W. Jaisingh, Subash Chandra Bose Jaganathan, Akanksha Verma
2022 IEEE 2nd International Symposium on Sustainable Energy, Signal Processing and Cyber Security (iSSSC), published 2022-12-15. DOI: 10.1109/iSSSC56467.2022.10051247
Citations: 0
Gene Selection by Hybrid Feature Selection Approaches and Classification Techniques in Microarray Dataset for Cancer Prediction
Feature (gene) selection and the classification of microarray data are two of the most intriguing problems in machine learning. In this paper, three existing feature selection and extraction methods, Information Gain (IG), Gain Ratio (GR), and Relief, are combined, as Information Gain with Relief and Gain Ratio with Relief, to develop a new selection and extraction method. The primary objective of this study is to use hybrid feature extraction to select the independent components of DNA microarray data. The aim is to improve the performance of support vector machine (SVM) and neural network (NN) classifiers while reducing the computational resources required to complete the analysis. To provide evidence that the methodology is reliable, it is applied to reduce the number of genes in four DNA microarray datasets: Breast GSE22820, Breast GSE38959, Breast GSE42568, and Breast. SVM and NN classifiers are used to classify these datasets. The experiments on these four microarray gene expression datasets show that the genes selected by the proposed methodology improve the classification accuracy of the SVM and NN classifiers. Compared with standard existing extraction algorithms, the new method achieves higher classification accuracy with SVM and NN classifiers while using a reduced number of genes. The receiver operating characteristic (ROC) curve shows the best subset of genes for the classifier and the suggested method on each dataset.
Keywords: Information Gain, Gain Ratio, Relief, support vector machine, Neural Networks, Feature selection, Microarray data.
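To make the idea of a hybrid filter such as Information Gain with Relief more concrete, here is a minimal, standard-library Python sketch. It is not the authors' implementation: the abstract does not say how the two scores are combined, so the two-stage scheme (rank by IG, then re-rank the survivors by Relief), the median-split discretization, the function names, the parameters `k_first` and `k_final`, and the synthetic data are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """IG of one feature after a median split (an assumed discretization)."""
    threshold = sorted(column)[len(column) // 2]
    groups = {True: [], False: []}
    for value, label in zip(column, labels):
        groups[value >= threshold].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values() if g)
    return entropy(labels) - remainder


def relief_weights(X, y, n_iter=50, seed=0):
    """Basic two-class Relief: a feature gains weight when it differs on the
    nearest miss and loses weight when it differs on the nearest hit."""
    rng = random.Random(seed)
    n_features = len(X[0])
    # Range of each feature, used to normalize differences to [0, 1].
    spans = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0
             for j in range(n_features)]

    def diff(a, b, j):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(n_features))

    weights = [0.0] * n_features
    for _ in range(n_iter):
        i = rng.randrange(len(X))
        hits = [X[k] for k in range(len(X)) if k != i and y[k] == y[i]]
        misses = [X[k] for k in range(len(X)) if y[k] != y[i]]
        hit = min(hits, key=lambda r: dist(X[i], r))
        miss = min(misses, key=lambda r: dist(X[i], r))
        for j in range(n_features):
            weights[j] += (diff(X[i], miss, j) - diff(X[i], hit, j)) / n_iter
    return weights


def hybrid_select(X, y, k_first=4, k_final=2):
    """Assumed two-stage hybrid: keep the k_first genes with the highest IG,
    then keep the k_final of those with the highest Relief weight."""
    n_features = len(X[0])
    ig = [information_gain([r[j] for r in X], y) for j in range(n_features)]
    stage1 = sorted(range(n_features), key=lambda j: ig[j], reverse=True)[:k_first]
    X_reduced = [[r[j] for j in stage1] for r in X]
    w = relief_weights(X_reduced, y)
    order = sorted(range(len(stage1)), key=lambda p: w[p], reverse=True)[:k_final]
    return [stage1[p] for p in order]


# Synthetic stand-in for expression data: 40 samples, 10 "genes";
# only genes 0 and 1 are shifted between the two classes.
data_rng = random.Random(42)
y = [i % 2 for i in range(40)]
X = [[(8.0 * label if j < 2 else 0.0) + data_rng.gauss(0, 1) for j in range(10)]
     for label in y]

selected = hybrid_select(X, y)
print(sorted(selected))
```

In a full pipeline, the columns in `selected` would then be passed to a classifier such as an SVM or a neural network, and accuracy with the reduced gene set compared against accuracy with all genes. Running the cheap IG ranking first keeps the pairwise-distance work in Relief small, which is one plausible reason to chain the two filters.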