Comparing Several Missing Data Estimation Methods in Linear Regression; Real Data Example and A Simulation Study

CAUCHY: Jurnal Matematika Murni dan Aplikasi Pub Date : 2023-05-24 DOI:10.18860/ca.v7i4.20548

A. Fitrianto, Jap Ee Jia, B. Susetyo, L. Rahman


Abstract

Analysis of incomplete data can lead to biased estimates when standard statistical procedures are used, since those procedures ignore the missing observations. A further disadvantage of ignoring missing data is that the researcher might not have enough data left to conduct an analysis. The main objective of this study is to compare the performance of listwise deletion (LD), mean substitution (MS), and multiple imputation (MI) in estimating regression parameters. Performance is measured through the bias, standard error, and 95% confidence interval of the estimates of interest when 10% of the observations are missing. A complete empirical data set was used and treated as the population. Ten percent of the observations in the population were set as missing arbitrarily by generating random numbers from a uniform distribution. The bias and confidence intervals of the parameter estimates were then calculated to compare the three methods. A Monte Carlo simulation using simulated random numbers was also carried out to investigate the properties of missing data. The simulation drew 1000 samples at each of three sample sizes (20, 50, and 100 observations), and each sample was set to have 10% missing observations. Standard statistical analyses were run on each data set with missing values, and the parameter estimates were averaged to calculate the bias and standard error of the estimates for every missing data method. The analysis was conducted using SAS version 9.2. The MI method provided the smallest bias and standard error of the parameter estimates and a narrower confidence interval than the LD and MS methods. Meanwhile, the LD method gave a smaller bias and standard error of the parameter estimates when the sample size was small. The MS method is strongly discouraged for handling missing data because it results in large bias and standard error of the parameter estimates.
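The simulation design described in the abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' SAS 9.2 code: the data-generating model (a simple linear regression with standard normal predictor and errors), the true coefficients, the choice to place the missing values in the response, and every function name below are hypothetical. The MI step is a simplified stochastic regression imputation pooled by averaging the slope over `m` imputed data sets; it does not draw the imputation-model parameters as a full MI procedure would.

```python
import numpy as np


def ols_slope(x, y):
    """Slope from an OLS fit of y on x with an intercept."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


def listwise_deletion(x, y):
    """LD: drop every row whose response is missing, then fit."""
    keep = ~np.isnan(y)
    return ols_slope(x[keep], y[keep])


def mean_substitution(x, y):
    """MS: replace missing responses with the observed mean, then fit."""
    filled = np.where(np.isnan(y), np.nanmean(y), y)
    return ols_slope(x, filled)


def multiple_imputation(x, y, rng, m=5):
    """Simplified MI (assumption): impute from a regression of y on x
    fitted to complete cases, add residual noise, average m slopes."""
    obs = ~np.isnan(y)
    X_obs = np.column_stack([np.ones(obs.sum()), x[obs]])
    coef, *_ = np.linalg.lstsq(X_obs, y[obs], rcond=None)
    resid = y[obs] - X_obs @ coef
    sigma = np.sqrt(resid @ resid / (obs.sum() - 2))
    slopes = []
    for _ in range(m):
        y_imp = y.copy()
        noise = rng.normal(0.0, sigma, (~obs).sum())
        y_imp[~obs] = coef[0] + coef[1] * x[~obs] + noise
        slopes.append(ols_slope(x, y_imp))
    return float(np.mean(slopes))


def simulate(n=50, reps=1000, miss_rate=0.10, beta0=1.0, beta1=2.0, seed=0):
    """Return {method: (bias, empirical SE)} for the slope estimate."""
    rng = np.random.default_rng(seed)
    k = int(round(miss_rate * n))  # number of values set to missing
    est = {"LD": [], "MS": [], "MI": []}
    for _ in range(reps):
        x = rng.normal(0.0, 1.0, n)
        y = beta0 + beta1 * x + rng.normal(0.0, 1.0, n)
        # Uniform random numbers decide which rows go missing:
        # the k rows with the smallest draws lose their response.
        u = rng.uniform(size=n)
        y_mis = y.copy()
        y_mis[np.argsort(u)[:k]] = np.nan
        est["LD"].append(listwise_deletion(x, y_mis))
        est["MS"].append(mean_substitution(x, y_mis))
        est["MI"].append(multiple_imputation(x, y_mis, rng))
    return {
        name: (float(np.mean(v) - beta1), float(np.std(v, ddof=1)))
        for name, v in est.items()
    }


if __name__ == "__main__":
    for n in (20, 50, 100):  # sample sizes named in the abstract
        print(n, simulate(n=n))
```

Under these particular assumptions, mean substitution in the response shrinks the slope toward zero by roughly the missing fraction, so its bias should come out clearly negative while LD and the MI sketch stay near zero. A different missingness mechanism, or missing values in the predictor instead, would change that picture, so the output should not be read as a reproduction of the paper's results.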


