Solving the missing at random problem in semi‐supervised learning: An inverse probability weighting method
Jin Su, Shuyi Zhang, Yong Zhou
DOI: 10.1002/sta4.707 (https://doi.org/10.1002/sta4.707)
Published: 2024-06-23 (Journal Article)
Abstract
We propose an estimator for the population mean under the semi‐supervised learning setting with the Missing at Random (MAR) assumption. In this setting, the probability of observing the response depends on the total sample size and diminishes as the sample size grows. To estimate the population mean efficiently, we introduce an adaptive estimator based on inverse probability weighting and cross‐fitting. Theoretical analysis shows that the proposed estimator is consistent and efficient, with a convergence rate slower than the typical rate because the proportion of labelled data diminishes as the sample size increases in the semi‐supervised setting. We also prove the consistency of inverse probability weighting (IPW)–Nadaraya–Watson density function estimators. Extensive simulations and an application to the Los Angeles homeless data validate the effectiveness of our approach.
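The general idea of combining inverse probability weighting with cross‐fitting can be illustrated with a minimal Python sketch. This is not the authors' estimator. It assumes a single scalar covariate, a Gaussian-kernel Nadaraya–Watson estimate of the labelling probability, a normalised (Hájek-type) weighted mean, and a fixed labelling probability rather than one that shrinks with the sample size. The function names, the bandwidth, the probability floor and the simulated data are all hypothetical choices made for illustration.

```python
import math
import random


def nw_propensity(x_train, r_train, x_eval, bandwidth):
    """Nadaraya-Watson estimate of P(R = 1 | X = x) with a Gaussian kernel."""
    estimates = []
    for x0 in x_eval:
        weights = [math.exp(-0.5 * ((x0 - xi) / bandwidth) ** 2) for xi in x_train]
        total = sum(weights)
        labelled = sum(w * ri for w, ri in zip(weights, r_train))
        estimates.append(labelled / total if total > 0.0 else 0.0)
    return estimates


def cross_fitted_ipw_mean(x, y, r, n_folds=2, bandwidth=0.2, floor=1e-3, seed=0):
    """Cross-fitted, normalised IPW estimate of E[Y] when Y is missing at random.

    x: covariates (always observed); y: responses (used only where r == 1);
    r: 1 if the response is observed (labelled), 0 otherwise.
    """
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]

    numerator = 0.0
    denominator = 0.0
    for k, held_out in enumerate(folds):
        # Fit the propensity on the other folds, evaluate it on the held-out fold.
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        pi_hat = nw_propensity(
            [x[i] for i in train],
            [r[i] for i in train],
            [x[i] for i in held_out],
            bandwidth,
        )
        for i, p in zip(held_out, pi_hat):
            if r[i] == 1:
                w = 1.0 / max(p, floor)  # floor guards against exploding weights
                numerator += w * y[i]
                denominator += w
    if denominator == 0.0:
        raise ValueError("no labelled observations")
    return numerator / denominator


def simulate(n, seed=0):
    """Hypothetical MAR data: E[Y] = 1, labelling probability increases with x."""
    rng = random.Random(seed)
    x, y, r = [], [], []
    for _ in range(n):
        xi = rng.uniform(-1.0, 1.0)
        ri = 1 if rng.random() < 0.25 + 0.15 * xi else 0
        yi = 1.0 + 2.0 * xi + rng.gauss(0.0, 0.5)
        x.append(xi)
        r.append(ri)
        y.append(yi if ri == 1 else float("nan"))  # unlabelled responses are hidden
    return x, y, r


if __name__ == "__main__":
    x, y, r = simulate(1500, seed=1)
    labelled = [yi for yi, ri in zip(y, r) if ri == 1]
    print("naive labelled mean:", sum(labelled) / len(labelled))
    print("cross-fitted IPW mean:", cross_fitted_ipw_mean(x, y, r))
```

In this simulation, labelling is more likely for large x, so the plain mean of the labelled responses overstates the population mean. Weighting each labelled response by the inverse of its estimated labelling probability corrects for this. Cross‐fitting ensures that each weight comes from a propensity estimate fitted without that observation's own fold.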