Title: On the Asymptotic Sample Complexity of HGR Maximal Correlation Functions in Semi-supervised Learning
Authors: Xiangxiang Xu, Shao-Lun Huang
Venue: 2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
Published: 2019-09-01
DOI: 10.1109/ALLERTON.2019.8919892 (https://doi.org/10.1109/ALLERTON.2019.8919892)
Citations: 2
Abstract
The Hirschfeld-Gebelein-Rényi (HGR) maximal correlation has been shown to be useful in many machine learning applications, where the alternating conditional expectation (ACE) algorithm is widely used to estimate the HGR maximal correlation functions from data samples. In this paper, we consider the asymptotic sample complexity of estimating the HGR maximal correlation functions in semi-supervised learning, where both labeled and unlabeled data samples are used for the estimation. First, we propose a generalized ACE algorithm to handle the unlabeled data samples. Then, we develop a mathematical framework to characterize the learning errors between the maximal correlation functions computed from the true distribution and the functions estimated by the generalized ACE algorithm. We establish analytical expressions for the error exponents of the learning errors, which indicate the number of training samples required for estimating the HGR maximal correlation functions with the generalized ACE algorithm. Moreover, using our theoretical results, we investigate the sampling strategy for different types of samples in semi-supervised learning under a total sampling budget constraint, and we develop an optimal sampling strategy that maximizes the error exponent of the learning error. Finally, numerical simulations are presented to support our theoretical results.
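For orientation, the classical (fully supervised) ACE iteration that the abstract refers to can be sketched as follows for discrete variables. This is a minimal illustration, not the generalized ACE algorithm proposed in the paper, whose details the abstract does not give. The function name `ace_discrete`, its parameters, and the example distribution are assumptions made here; the sketch also assumes a known joint probability table with strictly positive marginals and dependent X and Y (in practice the table would be replaced by the empirical distribution of the labeled samples).

```python
import numpy as np


def ace_discrete(joint, num_iters=200, seed=0):
    """Classical ACE on a joint pmf table (rows index X, columns index Y).

    Hypothetical helper for illustration only. Returns (f, g, rho), where
    f and g are zero-mean, unit-variance functions of X and Y, and
    rho = E[f(X) g(Y)] approximates the HGR maximal correlation.
    """
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()      # make sure the table sums to one
    px = joint.sum(axis=1)           # marginal of X
    py = joint.sum(axis=0)           # marginal of Y
    rng = np.random.default_rng(seed)

    def normalize(h, p):
        # Enforce zero mean and unit variance under the marginal p.
        h = h - np.dot(p, h)
        return h / np.sqrt(np.dot(p, h ** 2))

    f = normalize(rng.standard_normal(px.size), px)
    for _ in range(num_iters):
        # g(y) <- E[f(X) | Y = y], then renormalize.
        g = normalize(joint.T @ f / py, py)
        # f(x) <- E[g(Y) | X = x], then renormalize.
        f = normalize(joint @ g / px, px)
    rho = float(f @ joint @ g)       # E[f(X) g(Y)]
    return f, g, rho


# Example: doubly symmetric binary source with crossover probability 0.1.
dsbs = np.array([[0.45, 0.05],
                 [0.05, 0.45]])
f, g, rho = ace_discrete(dsbs)
print(rho)
```

The alternating updates amount to a power iteration on the matrix with entries P(x, y) / sqrt(P(x) P(y)) after its trivial top singular component is removed, so the returned `rho` can be checked against the second-largest singular value of that matrix.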