When non-response makes estimates from a census a small area estimation problem: the case of the survey on graduates’ employment status in Italy

IF 1.3 | Zone 4 (Computer Science) | JCR Q2, STATISTICS & PROBABILITY

Advances in Data Analysis and Classification | Pub Date: 2025-04-10 | DOI: 10.1007/s11634-025-00630-z

Maria Giovanna Ranalli, Fulvia Pennoni, Francesco Bartolucci, Antonietta Mira

Advances in Data Analysis and Classification, 19, 515–543.

Citations: 0

Abstract

Since 1998, AlmaLaurea—a consortium of 80 Italian universities and a member of the Italian National Statistical System—has conducted an annual census on graduates’ employment status. The survey provides estimates of descriptive indicators at both the population level and for specific subpopulations (domains) of interest, such as degree programmes. Some domains have very few observations due to a small population size and non-response. In this paper, we address this estimation problem within a Small Area Estimation framework. Specifically, we propose using generalized linear mixed models that incorporate two variables as proxies for graduates’ response propensity, making the assumption of non-informative non-response more plausible. Degree programme estimates of employment rates are derived as (semi-parametric) empirical best predictions using a finite mixture of logistic regression models, with their mean squared error estimated via a second-order, bias-corrected, analytical estimator. Sensitivity analysis is conducted to assess the explanatory power of variables modelling response propensity and to evaluate potential correlations between area-specific random effects and observed heterogeneity.
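The central computation described in the abstract, an empirical best predictor (EBP) of a degree programme's employment rate when the area random effect follows a discrete (finite-mixture) distribution, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The fitted coefficients `beta`, the mass points `mass_points` and their weights `priors` are taken as already estimated (for example by an EM algorithm). Covariates, including the response-propensity proxies, are assumed to be known for non-respondents as well. All function names and numbers are hypothetical, and the second-order, bias-corrected MSE estimator is not reproduced.

```python
import math
from typing import List, Sequence


def log_sigmoid(z: float) -> float:
    """log(1 / (1 + exp(-z))), computed without overflow."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def sigmoid(z: float) -> float:
    return math.exp(log_sigmoid(z))


def linear_predictor(x: Sequence[float], beta: Sequence[float]) -> float:
    return sum(xi * bi for xi, bi in zip(x, beta))


def posterior_weights(
    x_obs: Sequence[Sequence[float]],
    y_obs: Sequence[int],
    beta: Sequence[float],
    mass_points: Sequence[float],
    priors: Sequence[float],
) -> List[float]:
    """P(u_i = a_k | respondents' outcomes in area i) for each mass point a_k."""
    log_w = []
    for a_k, pi_k in zip(mass_points, priors):
        ll = math.log(pi_k)
        for x, y in zip(x_obs, y_obs):
            eta = linear_predictor(x, beta) + a_k
            # Bernoulli log-likelihood: log p if employed, log(1 - p) otherwise.
            ll += log_sigmoid(eta) if y == 1 else log_sigmoid(-eta)
        log_w.append(ll)
    # Normalise on the log scale (log-sum-exp) to avoid underflow in large areas.
    m = max(log_w)
    w = [math.exp(v - m) for v in log_w]
    total = sum(w)
    return [v / total for v in w]


def ebp_area_rate(x_obs, y_obs, x_nonresp, beta, mass_points, priors) -> float:
    """Employment rate of one area: observed outcomes for respondents plus
    mixture-averaged predicted probabilities for non-respondents."""
    n_total = len(y_obs) + len(x_nonresp)
    if n_total == 0:
        raise ValueError("area has no units")
    tau = posterior_weights(x_obs, y_obs, beta, mass_points, priors)
    predicted = 0.0
    for x in x_nonresp:
        eta = linear_predictor(x, beta)
        # E[y | area data] = sum_k tau_k * logistic(eta + a_k)
        predicted += sum(t * sigmoid(eta + a) for t, a in zip(tau, mass_points))
    return (sum(y_obs) + predicted) / n_total


if __name__ == "__main__":
    # Hypothetical fitted values: intercept and one response-propensity proxy.
    beta = [-0.5, 1.0]
    mass_points = [-1.0, 0.0, 1.0]
    priors = [0.25, 0.5, 0.25]
    x_obs = [[1.0, 0.2], [1.0, 0.8], [1.0, 0.5]]
    y_obs = [1, 0, 1]
    x_nonresp = [[1.0, 0.1], [1.0, 0.3]]
    print(ebp_area_rate(x_obs, y_obs, x_nonresp, beta, mass_points, priors))
```

Replacing a normal random effect with a small set of mass points is what makes such a predictor semi-parametric. In this sketch, the posterior weights `tau` take the place of the numerical integration over a Gaussian density that a standard logistic mixed model would need. Because the survey is a census, only the non-respondents' outcomes have to be predicted.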


Source journal

Advances in Data Analysis and Classification (STATISTICS & PROBABILITY)

CiteScore

3.40

Self-citation rate

6.20%

Review time

>12 weeks

Journal description: The international journal Advances in Data Analysis and Classification (ADAC) is designed as a forum for high-standard publications on research and applications concerning the extraction of knowable aspects from many types of data. It publishes articles on such topics as structural, quantitative, or statistical approaches for the analysis of data; advances in classification, clustering, and pattern recognition methods; strategies for modeling complex data and mining large data sets; methods for the extraction of knowledge from data; and applications of advanced methods in specific domains of practice. Articles illustrate how new domain-specific knowledge can be made available from data by skillful use of data analysis methods. The journal also publishes survey papers that outline and illuminate the basic ideas and techniques of special approaches.