A new technique for handling non-probability samples based on model-assisted kernel weighting
Beatriz Cobo, Jorge Luis Rueda-Sánchez, Ramón Ferri-García, María del Mar Rueda
Mathematics and Computers in Simulation, volume 227, pages 272-281. Published 2024-08-13.
DOI: 10.1016/j.matcom.2024.08.009
URL: https://www.sciencedirect.com/science/article/pii/S0378475424003094
Abstract
Surveys are going through massive changes, and the most important innovation is the use of non-probability samples. Non-probability samples are increasingly used because of their low cost and the speed with which results are obtained, but they are expected to suffer from strong selection bias, caused by several mechanisms, that can lead to unreliable estimates of the population parameters of interest. Classical methods of statistical inference do not apply because the inclusion probabilities of individual members of the population are unknown. Consequently, in the last few decades, new approaches to inference from non-probability sources have appeared.
Statistical theory offers different methods for addressing selection bias, depending on the availability of auxiliary information on variables related to the main variable; these auxiliary variables must have been measured in the non-probability sample. Two important approaches are inverse probability weighting and mass imputation. Other methods can be regarded as combinations of these two approaches.
This study proposes a new estimation technique for non-probability samples. We call this technique model-assisted kernel weighting, and we combine it with some machine learning techniques. The proposed technique is evaluated in a simulation study in which samples are drawn from a population under designs of varying complexity, in order to study the relative bias and mean squared error of the estimator under certain conditions. The results show that the proposed estimator has the smallest relative bias and the smallest mean squared error across the sample sizes considered and that, in general, the kernel weighting methods reduce bias more than those based on inverse weighting. We also studied the behavior of the estimators under different techniques, such as generalized linear regression versus machine learning algorithms, but we were not able to find a method that is best in all cases. Finally, we study the influence of the density function used, triangular or standard normal, and conclude that the two work similarly.
A case study using a non-probability sample collected during the COVID-19 lockdown was conducted to verify the real-world performance of the proposed methodology, obtain a better estimate, and control the variance.
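The abstract gives no formulas, so the following is only an illustrative sketch of plain kernel weighting on propensity scores, not the model-assisted estimator proposed in the paper. It assumes that propensity scores have already been estimated for both the non-probability sample and a reference probability sample, and that the reference sample carries design weights. Each reference unit's design weight is split among non-probability units in proportion to a kernel of the distance between their propensity scores. All function names, the bandwidth, and the toy data are hypothetical. The two kernels correspond to the triangular and standard normal densities mentioned in the abstract, and the last two helpers compute the relative bias and mean squared error used as evaluation criteria.

```python
import numpy as np


def triangular_kernel(u):
    """Triangular density, supported on [-1, 1]."""
    return np.clip(1.0 - np.abs(u), 0.0, None)


def normal_kernel(u):
    """Standard normal density."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def kernel_pseudo_weights(p_nonprob, p_prob, design_weights, kernel, bandwidth):
    """Distribute reference-sample design weights over non-probability units.

    p_nonprob: propensity scores of the non-probability sample (assumed given)
    p_prob: propensity scores of the reference probability sample (assumed given)
    design_weights: design weights of the reference sample
    """
    p_nonprob = np.asarray(p_nonprob, dtype=float)
    p_prob = np.asarray(p_prob, dtype=float)
    d = np.asarray(design_weights, dtype=float)
    # Rows index reference units, columns index non-probability units.
    u = (p_prob[:, None] - p_nonprob[None, :]) / bandwidth
    k = kernel(u)
    row_sums = k.sum(axis=1, keepdims=True)
    # A reference unit with no non-probability unit inside the kernel
    # support contributes nothing (its row of shares stays at zero).
    shares = np.divide(k, row_sums, out=np.zeros_like(k), where=row_sums > 0)
    return d @ shares


def relative_bias(estimates, true_value):
    """Mean of the estimates minus the true value, relative to the true value."""
    estimates = np.asarray(estimates, dtype=float)
    return (estimates.mean() - true_value) / true_value


def mean_squared_error(estimates, true_value):
    """Average squared deviation of the estimates from the true value."""
    estimates = np.asarray(estimates, dtype=float)
    return np.mean((estimates - true_value) ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p_np = rng.uniform(0.1, 0.9, size=50)   # toy propensities, non-probability sample
    p_pr = rng.uniform(0.1, 0.9, size=20)   # toy propensities, reference sample
    d = np.full(20, 100.0)                  # toy design weights
    y = rng.normal(size=50)                 # toy outcome in the non-probability sample
    for name, kern in [("triangular", triangular_kernel), ("normal", normal_kernel)]:
        w = kernel_pseudo_weights(p_np, p_pr, d, kern, bandwidth=0.1)
        print(name, "weighted mean:", np.sum(w * y) / np.sum(w))
```

In a simulation, the weighted mean would be computed for each replicated sample and the resulting vector of estimates passed to `relative_bias` and `mean_squared_error` together with the known population value.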
About the journal:
The aim of the journal is to provide an international forum for the dissemination of up-to-date information in the fields of mathematics and computers, in particular (but not exclusively) as they apply to the dynamics of systems, their simulation, and scientific computation in general. Published material ranges from short, concise research papers to more general tutorial articles.
Mathematics and Computers in Simulation, published monthly, is the official organ of IMACS, the International Association for Mathematics and Computers in Simulation (formerly AICA). This Association, founded in 1955 and legally incorporated in 1956, is a member of FIACC (the Five International Associations Coordinating Committee), together with IFIP, IFAC, IFORS and IMEKO.
Topics covered by the journal include mathematical tools in:
• The foundations of systems modelling
• Numerical analysis and the development of algorithms for simulation
They also include considerations about computer hardware for simulation and about special software and compilers.
The journal also publishes articles concerned with specific applications of modelling and simulation in science and engineering, with relevant applied mathematics, the general philosophy of systems simulation, and their impact on disciplinary and interdisciplinary research.
The journal includes a Book Review section and a "News on IMACS" section that contains a calendar of future conferences and events and other information about the Association.