A new technique for handling non-probability samples based on model-assisted kernel weighting

Beatriz Cobo, Jorge Luis Rueda-Sánchez, Ramón Ferri-García, María del Mar Rueda
DOI: 10.1016/j.matcom.2024.08.009
Published: 2024-08-13
Source: https://www.sciencedirect.com/science/article/pii/S0378475424003094

Abstract

Survey research is changing rapidly, and the most important innovation is the use of non-probability samples. Non-probability samples are increasingly used because they are cheap and deliver results quickly, but they are expected to suffer from strong selection bias, caused by several mechanisms, that can lead to unreliable estimates of the population parameters of interest. The classical methods of statistical inference do not apply, because the probabilities of inclusion in the sample are not known for individual members of the population. As a result, new approaches to inference from non-probability sources have appeared in the last few decades.

Statistical theory offers different methods for addressing selection bias based on the availability of auxiliary information about other variables related to the main variable, which must have been measured in the non-probability sample. Two important approaches are inverse probability weighting and mass imputation. Other methods can be regarded as combinations of these two approaches.
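As a rough illustration of the two approaches named above, the sketch below computes an inverse probability weighted mean and a mass imputation mean. It assumes that participation propensities have already been estimated and uses a one-covariate least-squares fit as the prediction model. The function names, the toy data, and the choice of model are illustrative assumptions and are not taken from the paper.

```python
def ipw_mean(y, propensity):
    """Inverse probability weighting: weight each non-probability unit
    by the inverse of its (estimated) participation propensity."""
    weights = [1.0 / p for p in propensity]
    return sum(w * yi for w, yi in zip(weights, y)) / sum(weights)


def fit_simple_linear(x, y):
    """Ordinary least squares for y = intercept + slope * x (one covariate)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mass_imputation_mean(x_nonprob, y_nonprob, x_ref, design_weights):
    """Mass imputation: fit a model on the non-probability sample, predict
    the target variable for every unit of the probability (reference)
    sample, and take the design-weighted mean of the predictions."""
    intercept, slope = fit_simple_linear(x_nonprob, y_nonprob)
    predictions = [intercept + slope * xi for xi in x_ref]
    total = sum(d * p for d, p in zip(design_weights, predictions))
    return total / sum(design_weights)


# Toy data (hypothetical): y is observed only in the non-probability sample.
ipw_estimate = ipw_mean(y=[1.0, 3.0], propensity=[0.5, 0.25])
mi_estimate = mass_imputation_mean(
    x_nonprob=[1.0, 2.0, 3.0, 4.0],
    y_nonprob=[2.0, 4.0, 6.0, 8.0],
    x_ref=[0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    design_weights=[10.0] * 6,
)
```

Both estimators rely on the auxiliary variables being measured in the non-probability sample; mass imputation additionally needs them in the reference sample, where the target variable itself is not required.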

This study proposes a new estimation technique for non-probability samples, which we call model-assisted kernel weighting and combine with several machine learning techniques. The technique is evaluated in a simulation study that draws samples from a population under designs of varying complexity and examines the relative bias and mean squared error of the estimator under certain conditions. The results show that the proposed estimator has the smallest relative bias and mean squared error across the sample sizes considered and that, in general, kernel weighting methods reduce bias more than methods based on inverse probability weighting. We also compared the behavior of the estimators under generalized linear regression and under machine learning algorithms, but could not find a method that is best in all cases. Finally, we studied the influence of the density function used (triangular or standard normal) and concluded that both perform similarly.
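The abstract does not give the formula of the model-assisted estimator, so the following is only a minimal sketch of generic kernel weighting, not the authors' method. It assumes propensity scores are available for both samples and shares each reference unit's design weight among the non-probability units in proportion to a kernel of the propensity distance. The bandwidth, the function names, and the definitions of relative bias and mean squared error (one common convention over simulation replicates) are assumptions.

```python
import math


def triangular_kernel(u):
    """Triangular density on [-1, 1]."""
    return max(0.0, 1.0 - abs(u))


def standard_normal_kernel(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_weights(p_nonprob, p_ref, d_ref, kernel, bandwidth):
    """Distribute each reference unit's design weight d_j over the
    non-probability units, proportionally to kernel((p_i - p_j) / h)."""
    weights = [0.0] * len(p_nonprob)
    for p_j, d_j in zip(p_ref, d_ref):
        k = [kernel((p_i - p_j) / bandwidth) for p_i in p_nonprob]
        total = sum(k)
        if total == 0.0:
            continue  # no non-probability unit inside the kernel's support
        for i, k_i in enumerate(k):
            weights[i] += d_j * k_i / total
    return weights


def relative_bias(estimates, truth):
    """Mean of the estimates minus the true value, relative to the truth."""
    return (sum(estimates) / len(estimates) - truth) / truth


def mean_squared_error(estimates, truth):
    """Average squared deviation of the estimates from the true value."""
    return sum((e - truth) ** 2 for e in estimates) / len(estimates)


# Toy propensities and design weights (hypothetical values).
p_nonprob = [0.2, 0.8]
p_ref = [0.2, 0.8]
d_ref = [30.0, 70.0]
w_tri = kernel_weights(p_nonprob, p_ref, d_ref, triangular_kernel, 0.5)
w_norm = kernel_weights(p_nonprob, p_ref, d_ref, standard_normal_kernel, 0.5)
```

With the normal kernel every reference unit contributes to every non-probability unit, so the pseudo-weights always add up to the total design weight; with the triangular kernel a reference unit with no neighbor within one bandwidth is skipped in this sketch, which is one possible design choice among several.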

A case study using a non-probability sample collected during the COVID-19 lockdown was conducted to verify the performance of the proposed methodology in practice, obtain a better estimate, and control the variance.
