Authors: Md Istiak Jahan, Tanmoy Bhowmik, Lauren Hoover, Naveen Eluru
Journal: Transportation Research Record: Journal of the Transportation Research Board
Published: 2024-08-03
DOI: 10.1177/03611981241264278
URL: https://doi.org/10.1177/03611981241264278
Comparing the Performance of Different Missing Data Imputation Approaches in Discrete Outcome Modeling
Although several approaches exist for data imputation, they are not commonly applied in transportation. This paper aims to assist transportation researchers and practitioners in developing models from datasets with missing data. The study begins with a simulation exercise that evaluates different solutions for missing data. The dimensions considered in the analysis include the nature of the independent variables, different types of missing variables, different shares of missing values, multiple sample sizes, and three approaches: single imputation (SI), multiple imputation (MI), and complete case data (CCD). The comparison applies the appropriate inference process for the MI approach with multiple realizations. The simulation shows that the MI approach consistently performs better than the SI approach. Based on these results, MI with five realizations is selected from among the numbers of realizations considered. MI with five realizations is then compared with the CCD approach under different conditions, using model fit measures and parameter marginal effects. For larger datasets with a small share of missing data, the results suggest that it may be beneficial to develop a CCD model by dropping observations with missing values, rather than developing an imputation model. However, when the share of missing data is large enough to warrant excluding a variable, employing the MI approach for model development is important and even necessary.

In the second part of the paper, based on these findings, we implement the MI approach on real empirical datasets with missing values for four discrete outcome variables.
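The abstract refers to "the appropriate inference process" for MI with multiple realizations without naming it. A standard choice is Rubin's rules, which pool the per-realization point estimates and their variances into a single estimate and total variance. The sketch below assumes Rubin's rules (the abstract does not confirm the exact procedure). The names `pool_rubin` and `PooledEstimate` are hypothetical, and the input numbers are made up for illustration. In a discrete outcome model, each estimate/variance pair would be, for example, one coefficient and its squared standard error from a model fitted to one imputed dataset.

```python
"""Pool one parameter across m imputed datasets using Rubin's rules (illustrative sketch)."""
import math
import statistics
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PooledEstimate:
    estimate: float          # mean of the m point estimates (q-bar)
    within_var: float        # mean within-imputation variance (u-bar)
    between_var: float       # between-imputation variance (B)
    total_var: float         # total variance T = u-bar + (1 + 1/m) * B
    df: float                # Rubin's (1987) degrees of freedom
    prop_var_missing: float  # lambda: share of T attributable to missing data


def pool_rubin(estimates: Sequence[float], variances: Sequence[float]) -> PooledEstimate:
    """Combine one parameter's estimates and squared standard errors across m realizations."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two realizations and one variance per estimate")

    q_bar = statistics.fmean(estimates)
    u_bar = statistics.fmean(variances)
    b = statistics.variance(estimates)  # sample variance, divisor m - 1
    b_term = (1 + 1 / m) * b            # between variance, inflated for finite m
    t = u_bar + b_term

    # With no disagreement between realizations, df is infinite (normal reference).
    df = math.inf if b == 0 else (m - 1) * (1 + u_bar / b_term) ** 2
    prop = b_term / t if t > 0 else 0.0
    return PooledEstimate(q_bar, u_bar, b, t, df, prop)


if __name__ == "__main__":
    # Hypothetical coefficient estimates and squared standard errors from
    # m = 5 imputed datasets; these numbers are invented for illustration.
    est = [0.42, 0.45, 0.40, 0.44, 0.43]
    var = [0.010, 0.011, 0.009, 0.010, 0.010]
    pooled = pool_rubin(est, var)
    se = math.sqrt(pooled.total_var)
    print(f"estimate={pooled.estimate:.3f}  se={se:.4f}  df={pooled.df:.1f}")
    # A single imputation reports only its own within variance and
    # omits the between-imputation term that MI captures.
    print(f"SI-style se (first dataset only)={math.sqrt(var[0]):.4f}")
```

The (1 + 1/m)B term is the between-realization variance that a single imputed dataset cannot estimate, which is one commonly cited reason SI tends to understate uncertainty. Because the 1/m factor shrinks as m grows, each additional realization changes T by progressively less.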