{"title":"Calibrated imputation for multivariate categorical data","authors":"Ton de Waal, Jacco Daalmans","doi":"10.1007/s10182-023-00481-z","DOIUrl":null,"url":null,"abstract":"<div><p>Non-response is a major problem for anyone collecting and processing data. A commonly used technique to deal with missing data is imputation, where missing values are estimated and filled in into the dataset. Imputation can become challenging if the variable to be imputed has to comply with a known total. Even more challenging is the case where several variables in the same dataset need to be imputed and, in addition to known totals, logical restrictions between variables have to be satisfied. In our paper, we develop an approach for a broad class of imputation methods for multivariate categorical data such that previously published totals are preserved while logical restrictions on the data are satisfied. The developed approach can be used in combination with any imputation model that estimates imputation probabilities, i.e. the probability that imputation of a certain category for a variable in a certain unit leads to the correct value for this variable and unit.</p></div>","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":null,"pages":null},"PeriodicalIF":1.4000,"publicationDate":"2023-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10182-023-00481-z.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Asta-Advances in Statistical Analysis","FirstCategoryId":"100","ListUrlMain":"https://link.springer.com/article/10.1007/s10182-023-00481-z","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}
引用次数: 0
Abstract
Non-response is a major problem for anyone collecting and processing data. A commonly used technique to deal with missing data is imputation, where missing values are estimated and filled in into the dataset. Imputation can become challenging if the variable to be imputed has to comply with a known total. Even more challenging is the case where several variables in the same dataset need to be imputed and, in addition to known totals, logical restrictions between variables have to be satisfied. In our paper, we develop an approach for a broad class of imputation methods for multivariate categorical data such that previously published totals are preserved while logical restrictions on the data are satisfied. The developed approach can be used in combination with any imputation model that estimates imputation probabilities, i.e. the probability that imputation of a certain category for a variable in a certain unit leads to the correct value for this variable and unit.
期刊介绍:
AStA - Advances in Statistical Analysis, a journal of the German Statistical Society, is published quarterly and presents original contributions on statistical methods and applications and review articles.