当辅助信息可用时，改进统计匹配

IF 1.6 4区数学 Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS

Journal of Survey Statistics and Methodology Pub Date : 2023-02-13 DOI:10.1093/jssam/smac038

Angelo Moretti, N. Shlomo

{"title":"当辅助信息可用时，改进统计匹配","authors":"Angelo Moretti, N. Shlomo","doi":"10.1093/jssam/smac038","DOIUrl":null,"url":null,"abstract":"\n There is growing interest within National Statistical Institutes in combining available datasets containing information on a large variety of social domains. Statistical matching approaches can be used to integrate data sources through a common set of variables where each dataset contains different units that belong to the same target population. However, a common problem is related to the assumption of conditional independence among variables observed in different data sources. In this context, an auxiliary dataset containing all the variables jointly can be used to improve the statistical matching by providing information on the correlation structure of variables observed across different datasets. We propose modifying the prediction models from the auxiliary dataset through a calibration step and show that we can improve the outcome of statistical matching in a variety of settings. We evaluate the proposed approach via simulation and an application based on the European Union Statistics for Income and Living Conditions and Living Costs and Food Survey for the United Kingdom.","PeriodicalId":17146,"journal":{"name":"Journal of Survey Statistics and Methodology","volume":" ","pages":""},"PeriodicalIF":1.6000,"publicationDate":"2023-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Improving Statistical Matching when Auxiliary Information is Available\",\"authors\":\"Angelo Moretti, N. Shlomo\",\"doi\":\"10.1093/jssam/smac038\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"\\n There is growing interest within National Statistical Institutes in combining available datasets containing information on a large variety of social domains. Statistical matching approaches can be used to integrate data sources through a common set of variables where each dataset contains different units that belong to the same target population. However, a common problem is related to the assumption of conditional independence among variables observed in different data sources. In this context, an auxiliary dataset containing all the variables jointly can be used to improve the statistical matching by providing information on the correlation structure of variables observed across different datasets. We propose modifying the prediction models from the auxiliary dataset through a calibration step and show that we can improve the outcome of statistical matching in a variety of settings. We evaluate the proposed approach via simulation and an application based on the European Union Statistics for Income and Living Conditions and Living Costs and Food Survey for the United Kingdom.\",\"PeriodicalId\":17146,\"journal\":{\"name\":\"Journal of Survey Statistics and Methodology\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":1.6000,\"publicationDate\":\"2023-02-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Survey Statistics and Methodology\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://doi.org/10.1093/jssam/smac038\",\"RegionNum\":4,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"SOCIAL SCIENCES, MATHEMATICAL METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Survey Statistics and Methodology","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1093/jssam/smac038","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"SOCIAL SCIENCES, MATHEMATICAL METHODS","Score":null,"Total":0}

引用次数: 2

摘要

在国家统计研究所内部，人们越来越有兴趣将包含各种社会领域信息的现有数据集结合起来。统计匹配方法可用于通过一组公共变量集成数据源，其中每个数据集包含属于相同目标人群的不同单元。然而，一个常见的问题与在不同数据源中观察到的变量之间的条件独立性假设有关。在这种情况下，可以使用一个包含所有变量的辅助数据集，通过提供在不同数据集上观察到的变量的相关结构信息来改进统计匹配。我们提出通过校准步骤修改辅助数据集的预测模型，并表明我们可以改善各种设置下的统计匹配结果。我们通过模拟和基于欧盟收入和生活条件统计以及英国生活成本和食品调查的应用程序来评估拟议的方法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Improving Statistical Matching when Auxiliary Information is Available

There is growing interest within National Statistical Institutes in combining available datasets containing information on a large variety of social domains. Statistical matching approaches can be used to integrate data sources through a common set of variables where each dataset contains different units that belong to the same target population. However, a common problem is related to the assumption of conditional independence among variables observed in different data sources. In this context, an auxiliary dataset containing all the variables jointly can be used to improve the statistical matching by providing information on the correlation structure of variables observed across different datasets. We propose modifying the prediction models from the auxiliary dataset through a calibration step and show that we can improve the outcome of statistical matching in a variety of settings. We evaluate the proposed approach via simulation and an application based on the European Union Statistics for Income and Living Conditions and Living Costs and Food Survey for the United Kingdom.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Journal of Survey Statistics and Methodology Multiple-

CiteScore

4.30

自引率

9.50%

发文量

期刊介绍： The Journal of Survey Statistics and Methodology, sponsored by AAPOR and the American Statistical Association, began publishing in 2013. Its objective is to publish cutting edge scholarly articles on statistical and methodological issues for sample surveys, censuses, administrative record systems, and other related data. It aims to be the flagship journal for research on survey statistics and methodology. Topics of interest include survey sample design, statistical inference, nonresponse, measurement error, the effects of modes of data collection, paradata and responsive survey design, combining data from multiple sources, record linkage, disclosure limitation, and other issues in survey statistics and methodology. The journal publishes both theoretical and applied papers, provided the theory is motivated by an important applied problem and the applied papers report on research that contributes generalizable knowledge to the field. Review papers are also welcomed. Papers on a broad range of surveys are encouraged, including (but not limited to) surveys concerning business, economics, marketing research, social science, environment, epidemiology, biostatistics and official statistics. The journal has three sections. The Survey Statistics section presents papers on innovative sampling procedures, imputation, weighting, measures of uncertainty, small area inference, new methods of analysis, and other statistical issues related to surveys. The Survey Methodology section presents papers that focus on methodological research, including methodological experiments, methods of data collection and use of paradata. The Applications section contains papers involving innovative applications of methods and providing practical contributions and guidance, and/or significant new findings.