{"title":"Iterative Image Translation for Unsupervised Domain Adaptation","authors":"S. Chhabra, Hemanth Venkateswara, Baoxin Li","doi":"10.1145/3476098.3485050","DOIUrl":"https://doi.org/10.1145/3476098.3485050","url":null,"abstract":"In this paper, we propose an image-translation-based unsupervised domain adaptation approach that iteratively trains an image translation and a classification network using each other. In Phase A, a classification network is used to guide the image translation to preserve the content and generate images. In Phase B, the generated images are used to train the classification network. With each step, the classification network and generator improve each other to learn the target domain representation. Detailed analysis and the experiments are testimony of the strength of our approach.","PeriodicalId":390904,"journal":{"name":"Multimedia Understanding with Less Labeling on Multimedia Understanding with Less Labeling","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122381817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Chhabra, Prabal Bijoy Dutta, Baoxin Li, Hemanth Venkateswara
{"title":"Glocal Alignment for Unsupervised Domain Adaptation","authors":"S. Chhabra, Prabal Bijoy Dutta, Baoxin Li, Hemanth Venkateswara","doi":"10.1145/3476098.3485051","DOIUrl":"https://doi.org/10.1145/3476098.3485051","url":null,"abstract":"Traditional unsupervised domain adaptation methods attempt to align source and target domains globally and are agnostic to the categories of the data points. This results in an inaccurate categorical alignment and diminishes the classification performance on the target domain. In this paper, we alter existing adversarial domain alignment methods to adhere to category alignment by imputing category information. We partition the samples based on category using source labels and target pseudo labels and then apply domain alignment for every category. Our proposed modification provides a boost in performance even with a modest pseudo label estimator. We evaluate our approach on 4 popular domain alignment loss functions using object recognition and digit datasets.","PeriodicalId":390904,"journal":{"name":"Multimedia Understanding with Less Labeling on Multimedia Understanding with Less Labeling","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124542154","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Occlusion Contrasts for Self-Supervised Facial Age Estimation","authors":"Weiwei Cai, Hao Liu","doi":"10.1145/3476098.3485052","DOIUrl":"https://doi.org/10.1145/3476098.3485052","url":null,"abstract":"In this paper, we propose an Occlusion Contrast(OCCO) approach for self-supervised facial partial occluded age estimation. Unlike the conventional facial age estimation approaches which utilize fully-visible faces as input data that does not generalize well for occlusion images, our approach aims to ignore the occlusion and only focus on the non-occluded facial areas so that we can improve the occluded facial age estimation accuracy. To achieve this, we utilize self-supervised contrastive learning to learn non-occluded feature representation, since contrastive learning makes the distances between the anchor and positive samples as close as possible in embedded space, while simultaneously pushing apart the negative samples. Furthermore, our OCCO incorporates with ordinal relationship of different ages, which is modeled by the deep label distribution learning. Considering that face aging datasets usually undergo a label imbalance problem, we employ the cost-sensitive strategy to constrain the learning of classifier. Extensive experimental results on two face aging datasets show that our OCCO not only achieve satisfactory performance over the masked faces but also comparable to the state-of-the-art age estimation methods for raw facial images.","PeriodicalId":390904,"journal":{"name":"Multimedia Understanding with Less Labeling on Multimedia Understanding with Less Labeling","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132299885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dmitry Krylov, S. Poliakov, N. Khanzhina, Alexey Zabashta, A. Filchenkov, Aleksandr Farseev
{"title":"Improving Multimodal Data Labeling with Deep Active Learning for Post Classification in Social Networks","authors":"Dmitry Krylov, S. Poliakov, N. Khanzhina, Alexey Zabashta, A. Filchenkov, Aleksandr Farseev","doi":"10.1145/3476098.3485055","DOIUrl":"https://doi.org/10.1145/3476098.3485055","url":null,"abstract":"Automatic user post classification is an important task in the field of social network analysis. Being effectively solved, post classification could be used for thematic user feed composition or inappropriate content identification. Commonly addressed by applying various Machine Learning approaches, the task often involves manual processes related to ground truth sourcing, which is known to be a hardly-scalable and increasingly expensive procedure. At the same time, Active Learning for automatic user post classification is a promising way to bridge such a gap, as it does not require massive ground truth availability aligning our research with the real world settings. In this work, we put our focus on leveraging textual and visual data modalities for the application of user post classification and investigate how batch size and batch normalization disabling techniques could affect active deep neural network learning process. We solve the problem of automatic user post classification by employing our novel multimodal neural network architecture with multi-head tunable loss function components. We show that the proposed approach, coupled with Active Learning, allows for the achievement of a significant classification performance boost in terms of crowd assessing resources as compared to the passive learning approaches.","PeriodicalId":390904,"journal":{"name":"Multimedia Understanding with Less Labeling on Multimedia Understanding with Less Labeling","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117095196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Branch Convolution Network for Few-Shot Classification","authors":"Jie Hua, Xueliang Liu","doi":"10.1145/3476098.3485053","DOIUrl":"https://doi.org/10.1145/3476098.3485053","url":null,"abstract":"Few-shot learning aims to complete the classification by only a small number of samples. In many few-shot learning frameworks, relation network is an end-to-end method, which can identify new categories through a small number of label samples based on metric learning. However, a simple feature extractor is used in this method, which limits the further improvement of the classification accuracy. To solve this problem, this paper proposes a multi-branch convolution network for feature extraction. This method combines the training strategies of multi-scale feature extraction, relation network, receptive field block and meta-learning. Firstly, the multi-scale feature vectors of the input image are extracted from the multi-branch convolution network. Then the feature vectors from the support set and the prediction set are input into the relation model, while the receptive field block is employed to improve the measurement ability of the network. Finally, the classification of the testing samples are realized according to the similarity score. In this paper, the effectiveness of the proposed model is verified on Omniglot and MiniImageNet datasets. The experimental results show that the classification accuracy of the proposed model is higher than that of other classical few-shot learning models.","PeriodicalId":390904,"journal":{"name":"Multimedia Understanding with Less Labeling on Multimedia Understanding with Less Labeling","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126189280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Incomplete Label Distribution Learning by Exploiting Global Sample Correlation","authors":"Qifa Teng, Xiuyi Jia","doi":"10.1145/3476098.3485054","DOIUrl":"https://doi.org/10.1145/3476098.3485054","url":null,"abstract":"In recent years, label distribution learning (LDL) has become a new learning paradigm in the field of machine learning. LDL is mainly designed to solve the problem of ambiguity among labels. Although LDL has been successful in many applications, most of these efforts are centered around complete supervised information. However, in reality, the supervised information is often incomplete due to the huge cost of label annotation. To address this problem, this paper proposes a novel incomplete LDL approach by utilizing the global sample correlation (IncomLDL-GSC). The label correlation is also considered to improve the performance of the model. Extensive experiments are conducted on 13 data sets to demonstrate the effectiveness of our proposed method.","PeriodicalId":390904,"journal":{"name":"Multimedia Understanding with Less Labeling on Multimedia Understanding with Less Labeling","volume":"127 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121100184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Contextual Image Parsing via Panoptic Segment Sorting","authors":"Jyh-Jing Hwang, Tsung-Wei Ke, Stella X. Yu","doi":"10.1145/3476098.3485056","DOIUrl":"https://doi.org/10.1145/3476098.3485056","url":null,"abstract":"Real-world visual recognition is far more complex than object recognition; there is stuff without distinctive shape or appearance, and the same object appearing in different contexts calls for different actions. While we need context-aware visual recognition, visual context is hard to describe and impossible to label manually. We consider visual context as semantic correlations between objects and their surroundings that include both object instances and stuff categories. We approach contextual object recognition as a pixel-wise feature representation learning problem that accomplishes supervised panoptic segmentation while discovering and encoding visual context automatically. Panoptic segmentation is a dense image parsing task that segments an image into regions with both semantic category and object instance labels. These two aspects could conflict each other, for two adjacent cars would have the same semantic label but different instance labels. Whereas most existing approaches handle the two labeling tasks separately and then fuse the results together, we propose a single pixel-wise feature learning approach that unifies both aspects of semantic segmentation and instance segmentation. Our work takes the metric learning perspective of SegSort but extends it non-trivially to panoptic segmentation, as we must merge segments into proper instances and handle instances of various scales. Our most exciting result is the emergence of visual context in the feature space through contrastive learning between pixels and segments, such that we can retrieve a person crossing a somewhat empty street without any such context labeling. Our experimental results on Cityscapes and PASCAL VOC demonstrate that, in terms of surround semantics distributions, our retrievals are much more consistent with the query than the state-of-the-art segmentation method, validating our pixel-wise representation learning approach for the unsupervised discovery and learning of visual context.","PeriodicalId":390904,"journal":{"name":"Multimedia Understanding with Less Labeling on Multimedia Understanding with Less Labeling","volume":"45 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120984953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}