Comparing the Performance of Different Missing Data Imputation Approaches in Discrete Outcome Modeling

Md Istiak Jahan, Tanmoy Bhowmik, Lauren Hoover, Naveen Eluru
Journal: Transportation Research Record: Journal of the Transportation Research Board
Published: 2024-08-03
DOI: https://doi.org/10.1177/03611981241264278

Abstract

Although several approaches exist for data imputation, these approaches are not commonly applied in transportation. The current paper is geared toward assisting transportation researchers and practitioners in developing models using datasets with missing data. The study begins with a data simulation exercise evaluating different solutions implemented for missing data. The dimensions considered in our analysis include: the nature of independent variables, different types of missing variables, different shares of missing values, multiple data sample sizes, and evaluation of the single imputation (SI), multiple imputation (MI), and complete case data (CCD) approaches. The comparison is conducted by adopting the appropriate inference process for the MI approach with multiple realizations. From the simulation exercise, we find that the MI approach consistently performs better than the SI approach. Among various realizations, the MI approach with five realizations is selected based on our results. The MI approach with five realizations is compared with the CCD approach under different conditions using model fit measures and parameter marginal effects. In the presence of a small share of missing data, for larger datasets, the results suggest that it might be beneficial to develop a CCD model by dropping observations with missing values as opposed to developing imputation models. However, when the share of missing data warrants variable exclusion, it is important and even necessary that the MI approach be employed for model development. In the second part of the paper, based on our findings, we implemented the MI approach for real empirical datasets with missing values for four discrete outcome variables.
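
The abstract does not name the "appropriate inference process" used to combine results across MI realizations. A common choice is Rubin's rules, and the minimal sketch below assumes that is what is meant. It pools one coefficient estimated separately on each of five imputed datasets. The function name `pool_rubin` and the numbers in the usage note are illustrative assumptions, not values from the paper.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets using Rubin's rules.

    estimates: the coefficient estimated on each imputed dataset
    variances: the squared standard error of that coefficient on each dataset
    Returns (pooled_estimate, total_variance, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")

    pooled = mean(estimates)       # average of the per-dataset estimates
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # sample variance (divides by m - 1)

    # Total variance adds the between-imputation variance, inflated by
    # (1 + 1/m) to account for using a finite number of imputations.
    total = within + (1 + 1 / m) * between
    return pooled, total, sqrt(total)
```

For example, with hypothetical estimates `[0.50, 0.52, 0.48, 0.51, 0.49]` and a squared standard error of `0.01` in each of the five realizations, the pooled estimate is 0.50 and the total variance is 0.0103, slightly larger than the within-imputation variance alone. That increase is how the pooled standard error reflects the uncertainty caused by the missing values, which a single imputation would ignore.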