BALD-VAE: Generative Active Learning based on the Uncertainties of Both Labeled and Unlabeled Data
Sun-Kyung Lee, Jong-Hwan Kim
2019 7th International Conference on Robot Intelligence Technology and Applications (RiTA), published 2019-11-01
DOI: https://doi.org/10.1109/RITAPP.2019.8932813
Citations: 2
Abstract
Deep learning has shown outstanding performance on real-world problems, but acquiring enough labeled data to train a model remains an ongoing issue: manually labeling data is time-consuming and costly. One approach to this issue is active learning. Among the various approaches to active learning, pool-based methods and generative methods have recently been widely studied. In uncertainty pool-based methods, a small labeled data set and a large unlabeled data set are given. A model is trained on the labeled data set and then evaluated on the unlabeled data set. The trained model ranks the unlabeled data by uncertainty and selects the data with the highest uncertainty. In generative methods, a generative model is used to generate informative samples. Previous studies of uncertainty pool-based active learning, however, did not consider the uncertainty of the labeled data. We therefore propose a new method, Bayesian active learning by disagreement with variational autoencoder (BALD-VAE), which considers the uncertainty of labeled data when generating informative samples. While largely following uncertainty pool-based active learning with BALD, the proposed algorithm also uses the concept of generative active learning to generate informative data with a VAE. The generated data then complement the highly uncertain labeled data. To demonstrate its effectiveness, the proposed method is tested on the MNIST and CIFAR10 data sets and shown to outperform the previous algorithms.
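The uncertainty-based ranking step described in the abstract can be illustrated with a minimal sketch. It assumes the commonly used Monte Carlo estimate of the BALD score (entropy of the averaged prediction minus the average entropy of the individual predictions) computed over several stochastic forward passes. The abstract does not state which estimator the authors use, so the function names `bald_scores` and `select_most_uncertain`, the array layout, and the toy inputs are all hypothetical. The VAE-based generation step is not shown.

```python
import numpy as np


def bald_scores(probs, eps=1e-12):
    """Monte Carlo BALD estimate for each pool point.

    probs: array of shape (T, N, C) holding class probabilities from
           T stochastic forward passes over N pool points and C classes
           (this layout is an assumption, not taken from the paper).
    """
    mean_probs = probs.mean(axis=0)  # (N, C) averaged prediction
    # Entropy of the averaged prediction (total uncertainty).
    predictive_entropy = -np.sum(mean_probs * np.log(mean_probs + eps), axis=1)
    # Average entropy of each individual pass (uncertainty the passes agree on).
    per_pass_entropy = -np.sum(probs * np.log(probs + eps), axis=2)  # (T, N)
    expected_entropy = per_pass_entropy.mean(axis=0)
    # The difference is large when the passes disagree with each other.
    return predictive_entropy - expected_entropy


def select_most_uncertain(probs, k):
    """Return indices of the k pool points with the highest BALD score."""
    scores = bald_scores(probs)
    return np.argsort(-scores)[:k]


# Toy pool: 2 passes, 3 points, 2 classes (hypothetical values).
demo = np.array([
    [[1.0, 0.0], [0.5, 0.5], [1.0, 0.0]],  # pass 1
    [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]],  # pass 2
])
print(select_most_uncertain(demo, k=1))  # point 0: the two passes disagree
```

In this toy pool, point 1 has a maximally uncertain but consistent prediction and point 2 a confident one, so both score close to zero. Only point 0, where the passes contradict each other, receives a high score. That is the "disagreement" that BALD rewards.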