Optimizing Feature Selection Parameters using Statistically Equivalent Signature (SES) Algorithm

2019 4th International Conference on Information Systems and Computer Networks (ISCON). Pub Date: 2019-11-01. DOI: 10.1109/ISCON47742.2019.9036211

U. Khaire, R. Dhanalakshmi


Citations: 1

Abstract

Selecting important features from a high-dimensional dataset is a critical task. Irrelevant and insignificant features can obscure the important information in the dataset, so accurate classification depends on selecting the most predictive variables. For healthcare datasets, feature selection matters for the diagnosis, prognosis, and treatment of disease. Traditional feature selection algorithms output a single subset of predictive variables, but a single subset cannot fully characterize the nature of the dataset. The study shows that variables not selected by some feature selection algorithms can still play a major role in determining the target value of a data sample. The Statistically Equivalent Signature (SES) algorithm accounts for such features by considering the Markov blanket of the target variable. SES is rooted in causal theory and Bayesian networks, and it selects subsets of important features based on the concept of conditional independence. Finally, a subset of equally predictive features/variables is selected based on the Markov blanket of the target variable T. This leads to improved accuracy when compared with other feature selection algorithms. This paper presents a detailed study of selecting multiple predictive feature subsets that are statistically equivalent and compares them with existing feature selection algorithms. It gives an overview of how to process high-dimensional datasets, especially microarray (gene expression) data, which play a vital role in the estimation of deadly diseases such as cancer.
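The idea of statistically equivalent features can be illustrated with a small sketch. The code below is not the authors' implementation or the reference SES algorithm. It is a simplified illustration that assumes continuous data, uses a Fisher z-test on partial correlations as the conditional independence test, and replaces the SES search with plain forward selection followed by a pairwise swap check. The function names (`ci_pvalue`, `forward_select`, `equivalent_features`), the significance level, and the toy data are all hypothetical. The toy data contain an exact rescaled copy of one feature so that the outcome does not depend on sampling noise.

```python
import math
import numpy as np

def ci_pvalue(x, y, Z):
    """p-value for H0: x is independent of y given the columns of Z (Fisher z-test)."""
    n, k = len(x), Z.shape[1]
    if k > 0:
        # Remove the linear effect of Z (plus an intercept) from both variables.
        A = np.column_stack([np.ones(n), Z])
        x = x - A @ np.linalg.lstsq(A, x, rcond=None)[0]
        y = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    # Assumption: features are roughly unit scale. A numerically constant
    # residual means Z already determines the variable, so nothing is left to test.
    if np.std(x) < 1e-10 or np.std(y) < 1e-10:
        return 1.0
    r = float(np.clip(np.corrcoef(x, y)[0, 1], -0.999999, 0.999999))
    z = math.atanh(r) * math.sqrt(n - k - 3)
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value

def forward_select(X, t, alpha):
    """Greedily add the feature most strongly associated with t given those already chosen."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        pvals = {j: ci_pvalue(X[:, j], t, X[:, selected]) for j in remaining}
        best = min(pvals, key=pvals.get)
        if pvals[best] > alpha:
            break
        selected.append(best)
        remaining.remove(best)
    return selected

def equivalent_features(X, t, selected, alpha):
    """Treat j as equivalent to a selected s if each makes the other redundant for t."""
    groups = {s: [s] for s in selected}
    for s in selected:
        rest = [v for v in selected if v != s]
        for j in range(X.shape[1]):
            if j in selected:
                continue
            p_s = ci_pvalue(X[:, s], t, X[:, rest + [j]])  # s given j
            p_j = ci_pvalue(X[:, j], t, X[:, rest + [s]])  # j given s
            if p_s > alpha and p_j > alpha:
                groups[s].append(j)
    return groups

# Toy data: x1 carries exactly the same information as x0, and x2 is pure noise.
rng = np.random.default_rng(0)
n = 300
x0 = rng.normal(size=n)
x1 = 2.0 * x0 + 1.0
x2 = rng.normal(size=n)
t = x0 + 0.5 * rng.normal(size=n)
X = np.column_stack([x0, x1, x2])

selected = forward_select(X, t, alpha=0.01)
groups = equivalent_features(X, t, selected, alpha=0.01)
print(groups)
```

Each entry of `groups` lists a selected feature together with the features that could replace it. Picking one feature from each group yields one candidate signature, which is the sense in which several feature subsets can be equally predictive rather than a single one.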



