Imputation of incomplete ordinal and nominal data by predictive mean matching.

IF 1.9 · CAS Zone 3 (Medicine) · JCR Q3 · HEALTH CARE SCIENCES & SERVICES

Statistical Methods in Medical Research · Pub Date: 2025-08-17 · DOI: 10.1177/09622802251362642

Peter C Austin, Stef van Buuren


Citations: 0

Abstract



Multivariate imputation using chained equations is a popular algorithm for imputing missing data that entails specifying multivariable models through conditional distributions. Two standard imputation methods for imputing missing continuous variables are parametric imputation using a linear model and predictive mean matching. The default methods for imputing missing categorical variables are parametric imputation using multinomial logistic regression and ordinal logistic regression for imputing nominal and ordinal categorical variables, respectively. There is a paucity of research into the relative computational burden and the quality of statistical inferences when using predictive mean matching versus parametric imputation for imputing missing non-binary categorical variables. We used simulations to compare the performance of predictive mean matching with that of multinomial logistic regression and ordinal logistic regression for imputing categorical variables when the analysis model of scientific interest was a logistic or linear regression model. We varied the sample size (N = 500, 1000, 2500, and 5000), the rate of missing data (5%-50% in increments of 5%), and the number of levels of the categorical variable (3, 4, 5, and 6). In general, the performance of predictive mean matching compared very favorably to that of multinomial or ordinal logistic regression for imputing categorical variables when the analysis model was a logistic or linear regression model. This was true across a range of scenarios defined by sample size and the rate of missing data. Furthermore, the use of predictive mean matching was substantially faster, by a factor of 2-6. In conclusion, predictive mean matching can be used to impute categorical variables. The use of predictive mean matching to impute missing non-binary categorical variables substantially reduces computer processing time when conducting multiple imputation.
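The matching step behind predictive mean matching can be illustrated with a minimal Python sketch. This is not the authors' implementation or the procedure used in their simulations. It assumes the categorical variable is coded as integers, fits an ordinary least-squares model to the observed cases, and copies the observed value of a randomly chosen donor whose predicted mean is among the `k` closest. The function name `pmm_impute`, the integer coding, the value `k=5`, and the omission of a random draw of the regression coefficients are all simplifying assumptions made for illustration.

```python
import numpy as np


def pmm_impute(X, y, missing, k=5, rng=None):
    """Simplified predictive mean matching (hypothetical helper, not from the paper).

    X       : (n, p) array of fully observed predictors
    y       : (n,) integer category codes; values where `missing` is True are ignored
    missing : (n,) boolean mask marking the entries of y to impute
    k       : number of candidate donors considered for each missing case
    """
    rng = np.random.default_rng() if rng is None else rng
    obs = ~missing
    design = np.column_stack([np.ones(len(y)), X])  # add an intercept column

    # Least-squares fit on the observed cases only (assumption: category codes
    # are treated as numbers for the purpose of matching).
    beta, *_ = np.linalg.lstsq(design[obs], y[obs].astype(float), rcond=None)
    pred = design @ beta

    donor_pred = pred[obs]
    donor_y = y[obs]
    k = min(k, donor_y.size)

    imputed = y.copy()
    for i in np.flatnonzero(missing):
        # Indices of the k donors whose predicted means are closest to case i.
        dist = np.abs(donor_pred - pred[i])
        nearest = np.argpartition(dist, k - 1)[:k]
        # Copy the observed category of one randomly selected donor.
        imputed[i] = donor_y[rng.choice(nearest)]
    return imputed


# Toy usage: a 4-level ordinal variable with roughly 20% of values missing.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))
latent = X @ np.array([1.0, -0.5]) + rng.normal(size=n)
y = np.digitize(latent, [-1.0, 0.0, 1.0])    # codes 0, 1, 2, 3
missing = rng.random(n) < 0.2
y_incomplete = np.where(missing, -1, y)      # -1 is only a placeholder
completed = pmm_impute(X, y_incomplete, missing, k=5, rng=rng)
```

Because every imputed value is copied from an observed case, the completed variable can only contain categories that actually occur in the data, and no multinomial or ordinal model has to be fitted. A full multiple-imputation procedure would also draw the regression coefficients at random for each imputed data set, which this sketch leaves out.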


Source journal

Statistical Methods in Medical Research (Medicine: Mathematical & Computational Biology)

CiteScore: 4.10

Self-citation rate: 4.30%

Articles published: 127

Review time: >12 weeks

About the journal: Statistical Methods in Medical Research is a peer-reviewed scholarly journal and is the leading vehicle for articles in all the main areas of medical statistics and an essential reference for all medical statisticians. This unique journal is devoted solely to statistics and medicine and aims to keep professionals abreast of the many powerful statistical techniques now available to the medical profession. This journal is a member of the Committee on Publication Ethics (COPE).