Solving the missing at random problem in semi‐supervised learning: An inverse probability weighting method

IF 0.8 · Zone 4 Mathematics · JCR Q3 Statistics & Probability

Stat · Pub Date: 2024-06-23 · DOI: 10.1002/sta4.707

Jin Su, Shuyi Zhang, Yong Zhou

Citations: 0

Abstract

Solving the missing at random problem in semi‐supervised learning: An inverse probability weighting method

We propose an estimator for the population mean under the semi‐supervised learning setting with the Missing at Random (MAR) assumption. This setting assumes that the probability of observing , denoted by , depends on the total sample size and satisfies . To efficiently estimate , we introduce an adaptive estimator based on inverse probability weighting and cross‐fitting. Theoretical analysis reveals that our proposed estimator is consistent and efficient, with a convergence rate of , slower than the typical rate, due to the diminishing proportion of labelled data as the sample size increases in the semi‐supervised setting. We also prove the consistency of inverse probability weighting (IPW)–Nadaraya–Watson density function estimators. Extensive simulations and an application to the Los Angeles homeless data validate the effectiveness of our approach.
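The combination of inverse probability weighting and cross‐fitting mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' estimator: the Gaussian kernel, the fixed bandwidth, the two folds, the clipping threshold, the self‐normalised (Hájek) form, and all function names are assumptions made for illustration. The simulated data also use a fixed labelling probability, so the sketch does not reproduce the regime in which the labelled proportion shrinks with the sample size.

```python
import numpy as np


def nw_propensity(x_train, r_train, x_eval, bandwidth):
    """Nadaraya-Watson estimate of P(R = 1 | X = x) with a Gaussian kernel."""
    diffs = (x_eval[:, None] - x_train[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs ** 2)
    return weights @ r_train / weights.sum(axis=1)


def cross_fitted_ipw_mean(x, y, r, n_folds=2, bandwidth=0.15, seed=0):
    """Cross-fitted, self-normalised IPW estimate of E[Y].

    x: 1-D covariate, observed for every unit
    y: outcome; entries where r == 0 are ignored (may be NaN)
    r: 1.0 if y is observed (labelled), 0.0 otherwise
    """
    n = len(x)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    pi_hat = np.empty(n)
    for k, held_out in enumerate(folds):
        # Fit the propensity on the other folds, evaluate on the held-out fold.
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        pi_hat[held_out] = nw_propensity(
            x[train], r[train], x[held_out], bandwidth
        )
    # Assumed safeguard: keep estimated propensities away from zero.
    pi_hat = np.clip(pi_hat, 1e-3, 1.0)
    w = r / pi_hat
    y_filled = np.where(r == 1, y, 0.0)  # unlabelled outcomes get weight 0
    return np.sum(w * y_filled) / np.sum(w)


def demo(n=4000, seed=1):
    """Simulate MAR labelling and compare IPW with the naive labelled mean."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)
    pi = 0.3 + 0.2 * x  # labelling probability depends on x only (MAR)
    r = (rng.uniform(size=n) < pi).astype(float)
    y = 2.0 + 2.0 * x + rng.normal(scale=0.5, size=n)  # true E[Y] = 2
    y[r == 0] = np.nan  # outcomes of unlabelled units are not available
    est = cross_fitted_ipw_mean(x, y, r)
    naive = np.nanmean(y)  # mean over labelled units only
    return est, naive


if __name__ == "__main__":
    est, naive = demo()
    print(f"cross-fitted IPW: {est:.3f}   naive labelled mean: {naive:.3f}")
```

In this simulation, units with larger x are more likely to be labelled and also have larger outcomes, so the naive labelled mean overshoots the true mean of 2, while the weighted estimate corrects for the selection. Cross‐fitting means each unit's propensity is estimated from data that excludes that unit's fold.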

Source journal

Stat (Decision Sciences: Statistics, Probability and Uncertainty)

CiteScore

1.10

Self-citation rate

0.00%

Articles published

About the journal: Stat is an innovative electronic journal for the rapid publication of novel and topical research results, publishing compact articles of the highest quality in all areas of statistical endeavour. Its purpose is to provide a means of rapid sharing of important new theoretical, methodological and applied research. Stat is a joint venture between the International Statistical Institute and Wiley-Blackwell. Stat is characterised by:
• Speed: a high-quality review process that aims to reach a decision within 20 days of submission.
• Concision: a maximum article length of 10 pages of text, not including references.
• Supporting materials: inclusion of electronic supporting materials including graphs, video, software, data and images.
• Scope: addresses all areas of statistics and interdisciplinary areas.
Stat is a scientific journal for the international community of statisticians and researchers and practitioners in allied quantitative disciplines.