Ridge Regression based Missing Data Estimation with Dimensionality Reduction: Microarray Gene Expression Data
Ashfaq Ahmed K., Dr. Shaheda Akthar
Webology, published 2022-01-20
DOI: https://doi.org/10.14704/web/v19i1/web19271
Citations: 0
Abstract
Data is a central element of data science and machine learning. The performance of machine learning and data mining algorithms is strongly influenced by the characteristics of the data, and in particular by missing values: these algorithms perform markedly better and give more accurate results when the data is complete. Missing values in a dataset are therefore filled in before the algorithms are applied, and numerous imputation methods have been proposed for this purpose. In this paper we use a microarray gene expression dataset and, after introducing various percentages of missing values, propose a new methodology to impute them. Because microarray gene expression data is very high-dimensional, we first use recursive feature elimination to select the features that contribute most to the model, and then apply Ridge Regression for imputation. The resulting imputations are compared with those of other methods. We evaluate all models using MSE, MAE, and R-squared, and rank them with the Normalized Criteria Distance (NCD) computed over these metrics. The model with the lowest NCD rank is selected as the best; in this paper the proposed model obtained the lowest value and is therefore considered the best of the models compared.
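The pipeline described in the abstract (mask some values, select features by recursive elimination, impute with ridge regression, score with MSE, MAE, and R-squared) can be illustrated with a minimal sketch. This is not the authors' implementation: the data is synthetic, only one column has missing values, and the helper names `ridge_fit`, `rfe`, and `impute_column`, the penalty `alpha=1.0`, and the choice of 5 retained features are all assumptions made for illustration. The elimination loop is a simplified stand-in that drops the feature with the smallest absolute ridge coefficient at each step, and the NCD ranking is not shown.

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression on centered data; returns (coef, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return coef, y_mean - x_mean @ coef

def rfe(X, y, n_keep, alpha=1.0):
    """Simplified recursive feature elimination.

    Refits ridge on the surviving features and drops the one with the
    smallest |coefficient| until n_keep remain. Assumes the features are
    on comparable scales (standardize real data first).
    """
    keep = np.arange(X.shape[1])
    while keep.size > n_keep:
        coef, _ = ridge_fit(X[:, keep], y, alpha)
        keep = np.delete(keep, np.argmin(np.abs(coef)))
    return keep

def impute_column(data, target, n_keep=5, alpha=1.0):
    """Fill NaNs in column `target` from the other (complete) columns."""
    missing = np.isnan(data[:, target])
    others = np.delete(np.arange(data.shape[1]), target)
    X, y = data[:, others], data[:, target]
    # Select features and fit using only rows where the target is observed.
    keep = rfe(X[~missing], y[~missing], n_keep, alpha)
    coef, intercept = ridge_fit(X[~missing][:, keep], y[~missing], alpha)
    filled = data.copy()
    filled[missing, target] = X[missing][:, keep] @ coef + intercept
    return filled, others[keep]

# Synthetic stand-in for an expression matrix: 200 samples, 30 features,
# plus one target column that depends on the first three features.
rng = np.random.default_rng(0)
n, p = 200, 30
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 3.0 * X[:, 1] + 1.5 * X[:, 2] + 0.1 * rng.normal(size=n)
data = np.column_stack([X, y])
target = p

# Introduce 10% missing values in the target column and remember the truth.
idx = rng.choice(n, size=20, replace=False)
truth = data[idx, target].copy()
data[idx, target] = np.nan

filled, selected = impute_column(data, target, n_keep=5)
pred = filled[idx, target]

# Evaluation metrics named in the abstract.
mse = np.mean((truth - pred) ** 2)
mae = np.mean(np.abs(truth - pred))
r2 = 1.0 - np.sum((truth - pred) ** 2) / np.sum((truth - truth.mean()) ** 2)
print("selected columns:", sorted(selected.tolist()))
print(f"MSE={mse:.4f}  MAE={mae:.4f}  R2={r2:.4f}")
```

In practice, scikit-learn's `RFE` and `Ridge` classes would be the more usual way to build this; the hand-written versions above only keep the sketch dependent on NumPy alone. Repeating the masking step at several missing-value percentages would mirror the experimental setup the abstract describes.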
Webology (Social Sciences: Library and Information Sciences)
Self-citation rate: 0.00%
Articles published: 374
Review time: 10 weeks
About the journal:
Webology is an international peer-reviewed journal in English devoted to the field of the World Wide Web. It serves as a forum for discussion, experimentation, and new research in information dissemination and communication processes in general, and in the context of the World Wide Web in particular. Concerns include the production, gathering, recording, processing, storing, representing, sharing, transmitting, retrieving, distribution, and dissemination of information, as well as its social and cultural impacts. There is a strong emphasis on the Web and new information technologies. Special topic issues are also published frequently.