Multimedia Understanding with Less Labeling on Multimedia Understanding with Less Labeling: Latest Articles

Iterative Image Translation for Unsupervised Domain Adaptation

Multimedia Understanding with Less Labeling on Multimedia Understanding with Less Labeling Pub Date : 2021-10-20 DOI: 10.1145/3476098.3485050

S. Chhabra, Hemanth Venkateswara, Baoxin Li

Citations: 2

Glocal Alignment for Unsupervised Domain Adaptation

Multimedia Understanding with Less Labeling on Multimedia Understanding with Less Labeling Pub Date : 2021-10-20 DOI: 10.1145/3476098.3485051

S. Chhabra, Prabal Bijoy Dutta, Baoxin Li, Hemanth Venkateswara

Citations: 4

Occlusion Contrasts for Self-Supervised Facial Age Estimation

Multimedia Understanding with Less Labeling on Multimedia Understanding with Less Labeling Pub Date : 2021-10-20 DOI: 10.1145/3476098.3485052

Weiwei Cai, Hao Liu

Abstract: In this paper, we propose an Occlusion Contrast (OCCO) approach for self-supervised age estimation on partially occluded faces. Conventional facial age estimation approaches take fully visible faces as input and do not generalize well to occluded images; our approach instead aims to ignore the occlusion and focus only on the non-occluded facial areas, so as to improve age estimation accuracy on occluded faces. To achieve this, we use self-supervised contrastive learning to learn non-occluded feature representations, since contrastive learning pulls the anchor and positive samples as close together as possible in the embedding space while simultaneously pushing the negative samples apart. Furthermore, OCCO incorporates the ordinal relationship between different ages, which is modeled by deep label distribution learning. Considering that face aging datasets usually suffer from label imbalance, we employ a cost-sensitive strategy to constrain the learning of the classifier. Extensive experimental results on two face aging datasets show that OCCO not only achieves satisfactory performance on masked faces but is also comparable to state-of-the-art age estimation methods on raw facial images.

Citations: 3
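The OCCO abstract above describes contrastive learning as pulling the anchor toward positive samples and pushing negative samples apart in the embedding space. A minimal sketch of that idea follows, using a generic InfoNCE-style loss in plain Python. This is not the authors' OCCO loss, which the listing does not specify; the function names, the use of cosine similarity, the temperature value, and the toy embeddings are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE-style loss (assumed form, not the paper's).

    The loss is small when the anchor is similar to the positive
    and dissimilar to every negative.
    """
    pos_logit = cosine(anchor, positive) / temperature
    neg_logits = [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over the positive and negative logits.
    all_logits = [pos_logit] + neg_logits
    m = max(all_logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in all_logits))
    return log_denominator - pos_logit


# Toy 2-D embeddings (hypothetical values, for illustration only).
anchor = [1.0, 0.0]
aligned = info_nce(anchor, positive=[0.9, 0.1],
                   negatives=[[-1.0, 0.0], [0.0, -1.0]])
misaligned = info_nce(anchor, positive=[-1.0, 0.0],
                      negatives=[[0.9, 0.1], [0.0, -1.0]])
```

In this toy setup `aligned` is much smaller than `misaligned`, because in the first case the positive already lies near the anchor and the negatives lie far from it.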

Improving Multimodal Data Labeling with Deep Active Learning for Post Classification in Social Networks

Multimedia Understanding with Less Labeling on Multimedia Understanding with Less Labeling Pub Date : 2021-10-20 DOI: 10.1145/3476098.3485055

Dmitry Krylov, S. Poliakov, N. Khanzhina, Alexey Zabashta, A. Filchenkov, Aleksandr Farseev

Abstract: Automatic user post classification is an important task in social network analysis. Solved effectively, it could be used for thematic user feed composition or for identifying inappropriate content. The task is commonly addressed with various machine learning approaches, but it often involves manual processes for sourcing ground truth, which is known to be a poorly scalable and increasingly expensive procedure. Active Learning is a promising way to bridge this gap, as it does not require large amounts of ground truth, which aligns our research with real-world settings. In this work, we focus on leveraging textual and visual data modalities for user post classification and investigate how batch size and techniques for disabling batch normalization affect the active learning process of a deep neural network. We solve the problem of automatic user post classification with our novel multimodal neural network architecture with multi-head tunable loss function components. We show that the proposed approach, coupled with Active Learning, achieves a significant boost in classification performance in terms of crowd assessment resources, compared to passive learning approaches.

Citations: 1
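The abstract above motivates Active Learning as a way to reduce how much ground truth must be sourced by hand. One common way to do this is uncertainty sampling, sketched below: the model's least confident predictions are sent to annotators first. The abstract does not say which query strategy the authors use, so the entropy criterion, the function names, the post ids, and the probabilities here are all assumptions for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(predictions, batch_size):
    """Pick the unlabeled posts whose predictions are most uncertain.

    predictions: dict mapping post id -> list of class probabilities.
    Returns the `batch_size` ids with the highest predictive entropy.
    """
    ranked = sorted(predictions,
                    key=lambda post_id: entropy(predictions[post_id]),
                    reverse=True)
    return ranked[:batch_size]


# Hypothetical model outputs for four unlabeled posts, three classes each.
predictions = {
    "post_a": [0.98, 0.01, 0.01],  # confident, low labeling value
    "post_b": [0.34, 0.33, 0.33],  # near-uniform, most uncertain
    "post_c": [0.70, 0.20, 0.10],
    "post_d": [0.50, 0.49, 0.01],
}
queried = select_for_labeling(predictions, batch_size=2)
```

Here `queried` contains `post_b` and `post_c`, the two posts with the highest entropy; in an active learning loop these would be labeled, added to the training set, and the model retrained before the next query round.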

Multi-Branch Convolution Network for Few-Shot Classification

Multimedia Understanding with Less Labeling on Multimedia Understanding with Less Labeling Pub Date : 2021-10-20 DOI: 10.1145/3476098.3485053

Jie Hua, Xueliang Liu

Abstract: Few-shot learning aims to perform classification from only a small number of samples. Among few-shot learning frameworks, the relation network is an end-to-end method based on metric learning that can identify new categories from a small number of labeled samples. However, this method uses a simple feature extractor, which limits further improvement of classification accuracy. To solve this problem, this paper proposes a multi-branch convolution network for feature extraction. The method combines the training strategies of multi-scale feature extraction, the relation network, the receptive field block, and meta-learning. First, multi-scale feature vectors of the input image are extracted by the multi-branch convolution network. Then the feature vectors from the support set and the prediction set are fed into the relation model, while the receptive field block is employed to improve the measurement ability of the network. Finally, the testing samples are classified according to the similarity score. The effectiveness of the proposed model is verified on the Omniglot and MiniImageNet datasets. The experimental results show that the classification accuracy of the proposed model is higher than that of other classical few-shot learning models.

Citations: 0
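The abstract above ends its pipeline by classifying test samples according to a similarity score between their features and those of the support set. The sketch below illustrates only that final step. A relation network learns its similarity function; here a fixed cosine similarity is swapped in as a stand-in, and the feature vectors, class names, and function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def classify(query, support):
    """Assign the query to the class with the highest mean similarity.

    support: dict mapping class label -> list of support feature vectors.
    Returns (predicted_label, per_class_scores).
    """
    scores = {
        label: sum(cosine(query, feat) for feat in feats) / len(feats)
        for label, feats in support.items()
    }
    return max(scores, key=scores.get), scores


# A 2-way 2-shot episode with made-up 2-D features.
support = {
    "cat": [[1.0, 0.1], [0.9, 0.2]],
    "dog": [[0.1, 1.0], [0.2, 0.8]],
}
label, scores = classify([0.8, 0.3], support)
```

With these toy features the query is assigned to `cat`, since its mean similarity to the `cat` support vectors exceeds that to the `dog` ones. In the method described by the abstract, both the features and the similarity score would come from trained networks rather than fixed vectors and a fixed metric.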

Incomplete Label Distribution Learning by Exploiting Global Sample Correlation

Multimedia Understanding with Less Labeling on Multimedia Understanding with Less Labeling Pub Date : 2021-10-20 DOI: 10.1145/3476098.3485054

Qifa Teng, Xiuyi Jia

Citations: 2

Contextual Image Parsing via Panoptic Segment Sorting

Multimedia Understanding with Less Labeling on Multimedia Understanding with Less Labeling Pub Date : 2021-05-04 DOI: 10.1145/3476098.3485056

Jyh-Jing Hwang, Tsung-Wei Ke, Stella X. Yu

Abstract: Real-world visual recognition is far more complex than object recognition; there is stuff without distinctive shape or appearance, and the same object appearing in different contexts calls for different actions. While we need context-aware visual recognition, visual context is hard to describe and impossible to label manually. We consider visual context as semantic correlations between objects and their surroundings that include both object instances and stuff categories. We approach contextual object recognition as a pixel-wise feature representation learning problem that accomplishes supervised panoptic segmentation while discovering and encoding visual context automatically. Panoptic segmentation is a dense image parsing task that segments an image into regions with both semantic category and object instance labels. These two aspects can conflict with each other, since two adjacent cars have the same semantic label but different instance labels. Whereas most existing approaches handle the two labeling tasks separately and then fuse the results, we propose a single pixel-wise feature learning approach that unifies semantic segmentation and instance segmentation. Our work takes the metric learning perspective of SegSort but extends it non-trivially to panoptic segmentation, as we must merge segments into proper instances and handle instances of various scales. Our most exciting result is the emergence of visual context in the feature space through contrastive learning between pixels and segments, such that we can retrieve a person crossing a somewhat empty street without any such context labeling. Our experimental results on Cityscapes and PASCAL VOC demonstrate that, in terms of surround semantics distributions, our retrievals are much more consistent with the query than those of the state-of-the-art segmentation method, validating our pixel-wise representation learning approach for the unsupervised discovery and learning of visual context.

Citations: 2