Optimizing Feature Selection Parameters using Statistically Equivalent Signature (SES) Algorithm
U. Khaire, R. Dhanalakshmi
2019 4th International Conference on Information Systems and Computer Networks (ISCON), published 2019-11-01
DOI: 10.1109/ISCON47742.2019.9036211
Citations: 1
Abstract
Selecting important features from high-dimensional datasets is a crucial task. Irrelevant and insignificant features can obscure the important information in a dataset, so accurate classification depends on selecting the most predictive variables. For healthcare datasets, feature selection is important for the diagnosis, prognosis, and treatment of disease. Traditional feature selection algorithms output a single subset of predictive variables, but a single subset cannot give a complete picture of the nature of the dataset. This study shows that variables not selected by some feature selection algorithms can still play a major role in determining the target value of a data sample. The Statistically Equivalent Signature (SES) algorithm accounts for such features by considering the Markov blanket of the target variable. SES is rooted in causal theory and Bayesian networks, and it selects subsets of important features based on the concept of conditional independence. Finally, a subset of equally predictive features (variables) is selected based on the Markov blanket of the target variable T. This leads to improved accuracy compared with other feature selection algorithms. This paper presents a detailed study of selecting multiple statistically equivalent predictive feature subsets and compares them with existing feature selection algorithms. It gives an overview of how to process high-dimensional datasets, especially microarray (gene expression) data, which play a vital role in the prediction of deadly diseases such as cancer.
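The abstract describes SES only at a high level. The following is a minimal Python sketch of an SES-style search, assuming continuous, roughly Gaussian data so that conditional independence can be tested with a Fisher z-test on partial correlations. The function names (`partial_corr_pvalue`, `ses`, `signatures`), the significance level `alpha`, and the maximum conditioning-set size `max_k` are illustrative assumptions, not the authors' implementation or the API of any SES library. A feature is dropped once some small subset of the currently selected features makes it conditionally independent of the target T; if the dropped feature can in turn stand in for one of those conditioning features, the two are recorded as statistically equivalent, and every combination of equivalent substitutes forms an alternative signature.

```python
import itertools
import math

import numpy as np

# Simplified SES-style search (illustrative sketch; names and defaults are assumptions).


def partial_corr_pvalue(data, x, t, z):
    """Fisher z-test p-value for X independent of T given Z (assumes ~Gaussian data)."""
    idx = [x, t] + list(z)
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.pinv(corr)  # partial correlations come from the precision matrix
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    dof = max(data.shape[0] - len(z) - 3, 1)
    stat = math.sqrt(dof) * abs(math.atanh(r))
    return math.erfc(stat / math.sqrt(2))  # two-sided normal p-value


def ses(X, y, alpha=0.05, max_k=2):
    """Return (selected feature indices, equivalence classes keyed by feature index)."""
    data = np.column_stack([X, y])
    t = X.shape[1]  # the target is stored as the last column

    def pval(x, z):
        return partial_corr_pvalue(data, x, t, z)

    def subsets(pool):
        # All conditioning sets drawn from `pool` with at most max_k members.
        for size in range(min(max_k, len(pool)) + 1):
            yield from itertools.combinations(pool, size)

    remaining = set(range(X.shape[1]))
    selected = []
    equiv = {i: {i} for i in remaining}

    def prune(candidates):
        for x in candidates:
            others = [s for s in selected if s != x]
            for z in subsets(others):
                if pval(x, z) <= alpha:
                    continue
                # X is conditionally independent of T given z: drop it.
                remaining.discard(x)
                if x in selected:
                    selected.remove(x)
                # If some member of z becomes independent of T once X replaces it,
                # X and that member are treated as statistically equivalent.
                for member in z:
                    swapped = [v for v in z if v != member] + [x]
                    if pval(member, swapped) > alpha:
                        equiv[member] |= equiv[x]
                        break
                break

    while remaining:
        prune(list(remaining) + list(selected))
        if not remaining:
            break
        # Max-min heuristic: add the candidate whose weakest association with T
        # (largest p-value over conditioning sets) is strongest.
        best = min(remaining, key=lambda x: max(pval(x, z) for z in subsets(selected)))
        remaining.discard(best)
        selected.append(best)

    prune(list(selected))  # backward phase over the final selection
    return selected, equiv


def signatures(selected, equiv):
    """Each selected feature may be swapped for any feature in its equivalence class."""
    return list(itertools.product(*(sorted(equiv[s]) for s in selected)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 500
    a = rng.normal(size=n)
    # Hypothetical data: feature 1 is a noisy copy of feature 0; features 2-4 are noise.
    X = np.column_stack([a, a + 0.05 * rng.normal(size=n), rng.normal(size=(n, 3))])
    y = a + 0.1 * rng.normal(size=n)
    sel, eq = ses(X, y)
    print("selected:", sel)
    print("signatures:", signatures(sel, eq))
```

In the hypothetical demo, feature 1 carries nearly the same information about the target as feature 0, so the search keeps one of them and may record the other as its equivalent; the exact output depends on the sampled data. Capping conditioning sets at `max_k` members keeps the number of independence tests manageable, which matters for wide datasets such as microarrays.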