A Genetic Asexual Reproduction Optimization Algorithm for Imputing Missing Values

2019 9th International Conference on Computer and Knowledge Engineering (ICCKE), Pub Date: 2019-10-01, DOI: 10.1109/ICCKE48569.2019.8964808

M. Noei, M. S. Abadeh

Pages: 214-218

Citations: 11

Abstract

In this paper, we propose a new technique that significantly reduces the computational time of the genetic algorithm for imputing missing values. Real-world data contain noise and missing values, which make them unreliable for scientific purposes, so the data must be preprocessed before use. Researchers either discard or impute missing data. The choice of an appropriate imputation method depends on several factors, such as the data types and the amount of missing data. When the missing-value rate is high, missing value imputation (MVI) can be a suitable way to fill the gaps in an incomplete dataset. One MVI method is the genetic algorithm; although it can produce good results, its computational time is very high. The proposed algorithm combines the genetic algorithm with the Asexual Reproduction Optimization (ARO) algorithm. We present an experimental evaluation on the Pima and Mammographic Mass datasets collected from the UCI repository. When the percentage of missing values is small, the incomplete instances can be imputed by the ARO algorithm alone, but for larger amounts our approach gives much better results, and its advantage grows as the missing-value rate increases. The accuracy and computational time of our algorithm are compared with other techniques such as Mean imputation, K-Nearest Neighbor (KNN), and Support Vector Machine (SVM). On average, our approach improved accuracy by 8% and ROC by 4%, and it requires less computational time than a basic genetic algorithm.
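The abstract does not specify the chromosome encoding, the fitness function, or how the genetic algorithm and ARO are coupled, so the sketch below is one plausible reading under stated assumptions, not the authors' method. Assumptions: each incomplete record is imputed separately, a chromosome holds only that record's missing values, fitness is the negative squared distance from the filled record to its nearest complete record, and a simplified ARO-style budding step (mutate a random substring to form a larva, merge it with the parent, keep the bud only if it is fitter) refines the best GA individual every generation. All names and parameters (`impute_row`, `aro_bud`, `aro_steps`, `feature_bounds`, the 0.5 merge and 0.2 mutation rates) are hypothetical.

```python
import random


def feature_bounds(rows):
    """Per-feature (min, max) over the complete rows; genes are sampled in this range."""
    return [(min(col), max(col)) for col in zip(*rows)]


def fill(row, genes, missing_idx):
    """Copy `row` and write the candidate genes into its missing positions."""
    filled = list(row)
    for i, g in zip(missing_idx, genes):
        filled[i] = g
    return filled


def fitness(genes, row, missing_idx, complete_rows):
    """Assumed objective: negative squared distance to the nearest complete record."""
    filled = fill(row, genes, missing_idx)
    return -min(sum((a - b) ** 2 for a, b in zip(filled, r)) for r in complete_rows)


def aro_bud(parent, bounds, rng):
    """Simplified ARO reproduction: mutate a random contiguous substring to get a
    larva, then take each gene from the larva or the parent with probability 0.5."""
    n = len(parent)
    start = rng.randrange(n)
    end = rng.randrange(start, n) + 1
    larva = list(parent)
    for i in range(start, end):
        lo, hi = bounds[i]
        larva[i] = rng.uniform(lo, hi)
    return [l if rng.random() < 0.5 else p for l, p in zip(larva, parent)]


def impute_row(row, complete_rows, bounds_all, pop_size=20, generations=30,
               aro_steps=5, seed=0):
    """Hybrid GA + ARO imputation of the `None` entries of one record (sketch)."""
    rng = random.Random(seed)
    missing_idx = [i for i, v in enumerate(row) if v is None]
    if not missing_idx:
        return list(row)
    bounds = [bounds_all[i] for i in missing_idx]

    def score(genes):
        return fitness(genes, row, missing_idx, complete_rows)

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        # ARO refinement of the current best: a bud replaces its parent only if fitter.
        best = pop[0]
        for _ in range(aro_steps):
            bud = aro_bud(best, bounds, rng)
            if score(bud) > score(best):
                best = bud
        pop[0] = best
        # GA step: elitism, uniform crossover among the top half, random-reset mutation.
        parents = pop[: pop_size // 2]
        children = [best]
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            if rng.random() < 0.2:
                j = rng.randrange(len(child))
                lo, hi = bounds[j]
                child[j] = rng.uniform(lo, hi)
            children.append(child)
        pop = children
    return fill(row, max(pop, key=score), missing_idx)


if __name__ == "__main__":
    complete = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    print(impute_row([4.0, None, 6.0], complete, feature_bounds(complete)))
```

In this toy run the missing middle value should land near 5.0, the value that makes the record match its closest complete neighbour. The ARO step works from a single parent and needs no population, which is the kind of cheapness the abstract attributes to ARO at low missing rates; the GA population supplies diversity when many values are missing.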

