Authors: Tobias Schoch, André Müller
Journal: AStA Wirtschafts- und Sozialstatistisches Archiv, volume 14, issue 3-4, pages 267-304
Publication date: 2020-09-01
DOI: 10.1007/s11943-020-00275-8
URL: https://link.springer.com/article/10.1007/s11943-020-00275-8
Treatment of sample under-representation and skewed heavy-tailed distributions in survey-based microsimulation: An analysis of redistribution effects in compulsory health care insurance in Switzerland
The credibility of microsimulation modeling with the research community and policymakers depends on high-quality baseline surveys. Quality problems with the baseline survey tend to impair the quality of microsimulation built on top of the survey data. We address two potential issues that both relate to skewed and heavy-tailed distributions.
First, we find that ultra-high-income households are under-represented in the baseline household survey. Moreover, the sample estimate of average income underestimates the known population average. Although the Deville–Särndal calibration method corrects the under-representation, it cannot achieve alignment of estimated average income in the right tail of the distribution with known population values without distorting the empirical income distribution. To overcome the problem, we introduce a Pareto tail model. With the help of the tail model, we can adjust the sample income distribution in the tail to meet the alignment targets. Our method can be a useful tool for microsimulation modelers working with survey income data.
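To illustrate the idea of a Pareto tail model, the following minimal sketch fits a weighted Pareto shape parameter above a threshold, computes the shape that would reproduce a given target tail mean, and replaces the sampled tail incomes with Pareto quantiles. It is a simplified illustration under stated assumptions, not the authors' procedure: the function names, the fixed threshold, the weighted maximum-likelihood (Hill-type) estimator, and the mid-point plotting positions are all choices made here for the example.

```python
import numpy as np


def weighted_pareto_alpha(income, weight, threshold):
    """Weighted maximum-likelihood (Hill-type) estimate of the Pareto shape
    for observations strictly above `threshold` (hypothetical helper)."""
    income = np.asarray(income, dtype=float)
    weight = np.asarray(weight, dtype=float)
    tail = income > threshold
    log_excess = np.log(income[tail] / threshold)
    return weight[tail].sum() / np.sum(weight[tail] * log_excess)


def pareto_tail_mean(alpha, threshold):
    """Mean of a Pareto distribution with shape `alpha` and scale `threshold`."""
    if alpha <= 1:
        raise ValueError("the tail mean is infinite for alpha <= 1")
    return alpha * threshold / (alpha - 1.0)


def alpha_for_target_mean(target_mean, threshold):
    """Shape parameter whose Pareto tail mean equals `target_mean`."""
    if target_mean <= threshold:
        raise ValueError("the target mean must exceed the threshold")
    return target_mean / (target_mean - threshold)


def replace_tail(income, weight, threshold, alpha):
    """Replace incomes above `threshold` by Pareto quantiles.

    Ranks and sampling weights are kept; only the tail values change.
    """
    income = np.asarray(income, dtype=float).copy()
    weight = np.asarray(weight, dtype=float)
    tail_idx = np.flatnonzero(income > threshold)
    order = tail_idx[np.argsort(income[tail_idx])]
    w = weight[order]
    # Weighted mid-point plotting positions, strictly inside (0, 1).
    p = (np.cumsum(w) - 0.5 * w) / w.sum()
    # Pareto quantile function: x0 * (1 - p) ** (-1 / alpha).
    income[order] = threshold * (1.0 - p) ** (-1.0 / alpha)
    return income
```

Because the tail is replaced at a finite number of plotting positions, the weighted mean of the adjusted tail only approximates the target; a real application would need to check the remaining gap and choose the threshold with care.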
Our second contribution concerns the treatment of an outlier-prone variable that has been added to the survey by record linkage (our empirical example is health care cost). Record linkage does not change the nature of the baseline survey: it still covers only a small part of the population, so the sampling weights are relatively large. An outlying observation combined with a large sampling weight can heavily influence or even ruin an estimate of a population characteristic. We therefore argue that it is beneficial, in terms of mean square error, to use robust estimation and alignment methods, because robust methods are less affected by the presence of outliers.
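The effect of a single outlier carrying a large sampling weight can be seen in a small sketch that compares the ordinary weighted mean with a weighted Huber M-estimator of location. This is one possible robust estimator, chosen here purely for illustration; the source does not state which robust methods are used, and the function names, the tuning constant `k = 1.5`, and the weighted-MAD scale are assumptions of this example.

```python
import numpy as np


def weighted_quantile(x, w, q):
    """Smallest value whose cumulative weight share reaches `q`."""
    order = np.argsort(x)
    x, w = x[order], w[order]
    cum = np.cumsum(w) / w.sum()
    idx = min(int(np.searchsorted(cum, q)), len(x) - 1)
    return x[idx]


def weighted_huber_mean(x, w, k=1.5, tol=1e-8, max_iter=100):
    """Weighted Huber M-estimate of location via iteratively reweighted means.

    The scale is fixed at the normalized weighted median absolute deviation.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    mu = weighted_quantile(x, w, 0.5)
    scale = 1.4826 * weighted_quantile(np.abs(x - mu), w, 0.5)
    if scale == 0:
        return mu
    for _ in range(max_iter):
        r = np.abs(x - mu) / scale
        # Huber weights: 1 inside [-k, k], k / |r| outside.
        u = np.minimum(1.0, k / np.maximum(r, 1e-12))
        new_mu = np.sum(w * u * x) / np.sum(w * u)
        if abs(new_mu - mu) <= tol * scale:
            return new_mu
        mu = new_mu
    return mu


# Four ordinary cost values and one extreme value with a large weight.
cost = np.array([10.0, 11.0, 12.0, 13.0, 1000.0])
weight = np.array([100.0, 100.0, 100.0, 100.0, 150.0])

plain = np.sum(weight * cost) / np.sum(weight)
robust = weighted_huber_mean(cost, weight)
print(f"weighted mean: {plain:.2f}, Huber M-estimate: {robust:.2f}")
```

The robust estimate stays close to the bulk of the data because the outlier's contribution is bounded. This reduces variance at the price of some bias, which is why the comparison is made in terms of mean square error.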