Ridge Regression based Missing Data Estimation with Dimensionality Reduction: Microarray Gene Expression Data

Q2 Social Sciences
Ashfaq Ahmed K., Dr. Shaheda Akthar
Journal: Webology. Publication date: 2022-01-20. DOI: 10.14704/web/v19i1/web19271 (https://doi.org/10.14704/web/v19i1/web19271)

Abstract

Data is a central element of Data Science and Machine Learning. The performance of Machine Learning and Data Mining algorithms is strongly influenced by the characteristics of the data, and in particular by missing values. These algorithms perform better and give more accurate results when the data is complete, so missing values in a dataset should be filled before the algorithms are applied. Numerous methods have been proposed to impute missing values. In this paper we use a microarray gene expression dataset, introduce various percentages of missing values into it, and propose a new methodology to impute them. Because microarray gene expression data is very high-dimensional, we first use recursive feature elimination to select the features that contribute most to the model, and then apply Ridge Regression for imputation. The results are compared with imputations from other methods. We evaluate all models using MSE, MAE, and R-squared. To select the best model, we use Normalized Criteria Distance (NCD) to rank the models under these metrics. The model with the lowest NCD rank is selected as the best; the proposed model obtained the lowest value and is therefore considered the best among the models compared.
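The pipeline described in the abstract (recursive feature elimination, then Ridge Regression imputation, then MSE, MAE, and R-squared) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`ridge_fit`, `rfe_ridge`, `impute_column`, `scores`), the synthetic data, the 20% missing rate, and the choice to impute a single column from complete predictor columns are all assumptions made for the example. The NCD ranking step is left out because the abstract does not define it.

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return coef, y_mean - x_mean @ coef

def rfe_ridge(X, y, n_keep, alpha=1.0):
    """Recursive feature elimination: refit, then drop the feature with the
    smallest absolute coefficient, until n_keep features remain.
    Assumes the features are on comparable scales."""
    keep = np.arange(X.shape[1])
    while keep.size > n_keep:
        coef, _ = ridge_fit(X[:, keep], y, alpha)
        keep = np.delete(keep, np.argmin(np.abs(coef)))
    return keep

def impute_column(data, target, n_keep=5, alpha=1.0):
    """Fill NaNs in column `target` with ridge predictions from RFE-selected
    columns. Assumes every other column is fully observed."""
    missing = np.isnan(data[:, target])
    predictors = np.delete(data, target, axis=1)
    X_obs, y_obs = predictors[~missing], data[~missing, target]
    keep = rfe_ridge(X_obs, y_obs, n_keep, alpha)
    coef, intercept = ridge_fit(X_obs[:, keep], y_obs, alpha)
    filled = data.copy()
    filled[missing, target] = predictors[missing][:, keep] @ coef + intercept
    return filled

def scores(y_true, y_pred):
    """MSE, MAE, and R-squared of imputed values against the held-out truth."""
    err = y_true - y_pred
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "MSE": float(np.mean(err ** 2)),
        "MAE": float(np.mean(np.abs(err))),
        "R2": float(1.0 - np.sum(err ** 2) / ss_tot),
    }

# Synthetic stand-in for an expression matrix (samples x genes); not real data.
rng = np.random.default_rng(0)
n_samples, n_genes, target = 60, 30, 0
full = rng.normal(size=(n_samples, n_genes))
# Make the target gene depend on three other genes plus a little noise.
full[:, target] = (2.0 * full[:, 1] - 1.5 * full[:, 2] + 0.5 * full[:, 3]
                   + 0.1 * rng.normal(size=n_samples))

# Hide 20% of the target column to simulate missing values.
masked = full.copy()
missing_rows = rng.choice(n_samples, size=int(0.2 * n_samples), replace=False)
masked[missing_rows, target] = np.nan

imputed = impute_column(masked, target, n_keep=5)
result = scores(full[missing_rows, target], imputed[missing_rows, target])
print(result)
```

Repeating the masking step at several missing rates, and swapping `impute_column` for other imputers, would give the kind of per-model metric table that a ranking criterion could then be applied to.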