A calibrated data-driven approach for small area estimation using big data

Pub Date: 2024-05-14 | DOI: 10.1111/anzs.12414
Siu-Ming Tam, Shaila Sharmeen

Abstract

Where the response variable in a big dataset is consistent with the variable of interest for small area estimation, the big data by itself can provide estimates for small areas. These estimates are often subject to coverage and measurement-error biases inherited from the big data. However, if a probability survey of the same variable of interest is available, the survey data can be used as a training dataset to develop an algorithm that imputes the data missed by the big data and adjusts for measurement errors. In this paper, we outline a methodology for such imputations based on a k-nearest neighbours (kNN) algorithm calibrated to an asymptotically design-unbiased estimate of the national total. We also illustrate the use of a training dataset to estimate the imputation bias, and of the "fixed-k asymptotic" bootstrap to estimate the variance of the small area hybrid estimator. We demonstrate the methodology on a public-use dataset and use it to compare the accuracy and precision of our hybrid estimator with those of the Fay–Herriot (FH) estimator. Finally, we examine numerically the accuracy and precision of the FH estimator when the auxiliary variables used in its linking models are subject to undercoverage errors.
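The abstract does not give the estimator's formulas, so the following is only a minimal Python sketch of the general idea under loudly stated assumptions, not the authors' method. It assumes (i) a hypothetical frame in which auxiliary covariates `x` are known for every population unit, (ii) big-data values are taken as correct (the measurement-error adjustment is omitted), (iii) the national benchmark is the design-unbiased Horvitz–Thompson total from the survey, standing in for whatever estimator the paper calibrates to, and (iv) "calibration" is a single ratio factor `g` applied to the kNN imputations so that the hybrid national total matches the benchmark. All names (`Unit`, `SurveyUnit`, `knn_predict`, `calibrated_hybrid_estimates`) and the toy data are hypothetical.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Unit:
    """Population unit (assumption: covariates x are known for every unit)."""
    area: str
    x: tuple
    y_big: Optional[float] = None  # big-data value; None if the big data missed the unit


@dataclass
class SurveyUnit:
    """Probability-survey unit used as the kNN training (donor) set."""
    x: tuple
    y: float       # survey-measured value of the variable of interest
    weight: float  # design weight = 1 / inclusion probability


def knn_predict(x, donors, k):
    """Average y of the k donors nearest to x (Euclidean distance)."""
    nearest = sorted(donors, key=lambda d: math.dist(x, d.x))[:k]
    return sum(d.y for d in nearest) / len(nearest)


def calibrated_hybrid_estimates(population, survey, k=3):
    """Area totals = big-data values + kNN imputations scaled by one ratio factor g."""
    # Benchmark: Horvitz-Thompson estimate of the national total from the survey.
    t_ht = sum(s.weight * s.y for s in survey)
    covered_total = sum(u.y_big for u in population if u.y_big is not None)
    missed = [u for u in population if u.y_big is None]
    raw = [knn_predict(u.x, survey, k) for u in missed]

    # Choose g so that covered_total + g * sum(raw) equals t_ht (only if units were missed).
    g = 1.0
    if missed:
        if sum(raw) == 0:
            raise ValueError("raw imputations sum to zero; cannot calibrate")
        g = (t_ht - covered_total) / sum(raw)

    by_area = defaultdict(float)
    for u in population:
        if u.y_big is not None:
            by_area[u.area] += u.y_big
    for u, y_hat in zip(missed, raw):
        by_area[u.area] += g * y_hat
    return dict(by_area), g


# Toy example (made-up numbers, for illustration only).
population = [
    Unit("A", (1.0,), 10.0), Unit("A", (2.0,)),
    Unit("B", (5.0,), 20.0), Unit("B", (6.0,)),
]
survey = [
    SurveyUnit((1.0,), 11.0, 2.0),
    SurveyUnit((2.0,), 12.0, 2.0),
    SurveyUnit((6.0,), 22.0, 2.0),
]
estimates, g = calibrated_hybrid_estimates(population, survey, k=1)
print(estimates, g)
```

By construction the area estimates in this sketch add up to the survey benchmark whenever some units were missed by the big data. The imputation-bias estimate from the training data and the "fixed-k asymptotic" bootstrap variance described in the abstract are not shown.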
