Proceedings of the 2nd ACM International Conference on Multimedia in Asia: Latest Publications

Hierarchical clustering via mutual learning for unsupervised person re-identification
Pub Date: 2021-03-07 | DOI: 10.1145/3444685.3446268
Authors: Xu Xu, Liyan Zhang, Zhaomeng Huang, Guodong Du
Abstract: Person re-identification (re-ID) aims to establish identity correspondence across different cameras. State-of-the-art re-ID approaches are mainly clustering-based Unsupervised Domain Adaptation (UDA) methods, which transfer a model trained on the source domain to the target domain by alternately generating pseudo labels through clustering of target-domain instances and training the network with those pseudo labels for feature learning. However, these approaches suffer from the inevitable label noise introduced by the clustering procedure, which dramatically impacts model training and feature learning on the target domain. To address this issue, we propose an unsupervised Hierarchical Clustering via Mutual Learning (HCML) framework, which jointly optimizes the dual training networks and the clustering procedure to learn more discriminative features from the target domain. Specifically, HCML updates both the hard pseudo labels generated by the clustering process and the soft pseudo labels generated by the training networks in an online manner. We jointly adopt the repelled loss, triplet loss, soft identity loss and soft triplet loss to optimize the model. Experimental results on the Market-to-Duke, Duke-to-Market, Market-to-MSMT and Duke-to-MSMT unsupervised domain adaptation tasks demonstrate the superiority of the proposed HCML framework over other state-of-the-art methods.
Citations: 0
Determining image age with rank-consistent ordinal classification and object-centered ensemble
Pub Date: 2021-03-07 | DOI: 10.1145/3444685.3446326
Authors: Shota Ashida, A. Jatowt, A. Doucet, Masatoshi Yoshikawa
Abstract: A significant number of old photographs, including many posted online, lack information about the date at which they were taken, or carry date information that needs to be verified. Many such pictures are either scanned analog photographs or photographs taken with a digital camera using incorrect settings. Estimating the dates of such pictures is useful for enhancing data quality and consistency, for improving information retrieval, and for other related applications. In this study, we propose a novel approach for automatically estimating the shooting dates of photographs based on a rank-consistent ordinal classification method for neural networks. We also introduce an ensemble approach that involves object segmentation. We conclude that enforcing rank consistency in the ordinal classification, as well as combining models trained on segmented objects, improves the results of the age determination task.
Citations: 0
Multi-level expression guided attention network for referring expression comprehension
Pub Date: 2021-03-07 | DOI: 10.1145/3444685.3446270
Authors: Liang Peng, Yang Yang, Xing Xu, Jingjing Li, Xiaofeng Zhu
Abstract: Referring expression comprehension is the task of identifying the object or region in a given image that a natural language expression refers to. The task requires understanding the expression in multiple aspects and adapting it to region representations to generate discriminative information. Unfortunately, previous approaches usually focus only on the important words or phrases in the expression using self-attention mechanisms, so they may fail to distinguish the target region from others, especially similar regions. To address this problem, we propose a novel model, termed Multi-level Expression Guided Attention network (MEGA-Net). It contains a multi-level visual attention schema guided by expression representations at different levels, i.e., sentence-level, word-level and phrase-level, which allows generating discriminative region features and helps to locate the relevant regions accurately. In addition, to distinguish similar regions, we design a two-stage structure: in the first stage we select the top-K candidate regions according to their matching scores, and in the second we apply an object comparison attention mechanism that learns the differences between the candidates in order to match the target region. We evaluate the proposed approach on three popular benchmark datasets, and the experimental results demonstrate that our model performs favorably against state-of-the-art methods.
Citations: 2
Relationship graph learning network for visual relationship detection
Pub Date: 2021-03-07 | DOI: 10.1145/3444685.3446312
Authors: Yanan Li, Jun Yu, Yibing Zhan, Zhi Chen
Abstract: Visual relationship detection aims to predict the relationships between detected object pairs. It is well believed that correlations between image components (i.e., objects and the relationships between them) are significant considerations when predicting relationships. However, most current visual relationship detection methods exploit only the correlations among objects, leaving the correlations among objects' relationships underexplored. This paper proposes a relationship graph learning network (RGLN) to explore the correlations among objects' relationships for visual relationship detection. Specifically, RGLN obtains image objects using an object detector; every pair of objects then constitutes a relationship proposal, and all proposals construct a relationship graph in which the proposals are treated as nodes. Accordingly, RGLN designs bi-stream graph attention subnetworks to detect relationship proposals: one graph attention subnetwork analyzes correlations among relationships based on visual and spatial information, and the other based on semantic and spatial information. Besides, RGLN exploits a relationship selection subnetwork to ignore the redundant information of object pairs with no relationship. We conduct extensive experiments on two public datasets, VRD and VG. The experimental results demonstrate the competitiveness of RGLN compared with the state of the art.
Citations: 2
Table detection and cell segmentation in online handwritten documents with graph attention networks
Pub Date: 2021-03-07 | DOI: 10.1145/3444685.3446295
Authors: Ying-Jian Liu, Heng Zhang, Xiao-Long Yun, Jun-Yu Ye, Cheng-Lin Liu
Abstract: In this paper, we propose a multi-task learning approach for table detection and cell segmentation in free-form online documents using densely connected graph attention networks. Each online document is regarded as a graph, where nodes represent strokes and edges represent the relationships between strokes. We then propose a graph attention network model to classify nodes and edges simultaneously. Tables are detected in each document according to the node classification results, and cells in each table are segmented by combining the node and edge classification results. To improve information flow in the network and enable efficient reuse of features among layers, dense inter-layer connectivity is used. Our proposed model has been experimentally validated on the online handwritten document dataset IAMOnDo and achieved encouraging results.
Citations: 0
Storyboard relational model for group activity recognition
Pub Date: 2021-03-07 | DOI: 10.1145/3444685.3446255
Authors: Boning Li, Xiangbo Shu, Rui Yan
Abstract: This work concerns how to effectively recognize a group activity performed collectively by multiple persons. Storyboards (i.e., medium shots, close shots) jointly describe the whole storyline of a movie in a compact way. Likewise, the actors in small subgroups (analogous to storyboards) of a group activity scene contribute heavily to the group activity and develop compact relationships within their subgroups. Inspired by this, we propose a Storyboard Relational Model (SRM) that addresses group activity recognition by splitting and reintegrating the group activity based on small yet compact storyboards. SRM mainly consists of a Pose-Guided Pruning (PGP) module and a Dual Graph Convolutional Networks (Dual-GCN) module. Specifically, PGP refines a series of storyboards from the group activity scene by leveraging the attention ranges of individuals, while Dual-GCN models the compact relationships among the actors in a storyboard. Experimental results on two widely used datasets illustrate the effectiveness of the proposed SRM compared with state-of-the-art methods.
Citations: 5
Objective object segmentation visual quality evaluation based on pixel-level and region-level characteristics
Pub Date: 2021-03-07 | DOI: 10.1145/3444685.3446305
Authors: Ran Shi, Jian Xiong, T. Qiao
Abstract: Objective visual quality evaluation of object segmentation is an emerging member of the visual quality assessment family. It aims to develop an objective measure, in place of a subjective survey, that evaluates object segmentation quality in agreement with human visual perception, and it serves as an important benchmark for assessing and comparing the visual quality achieved by object segmentation methods. Despite this essential role, it remains understudied compared with other visual quality evaluation problems. In this paper, we propose a novel full-reference objective measure comprising a pixel-level sub-measure and a region-level sub-measure. The pixel-level sub-measure assigns proper weights not only to false positive and false negative pixels but also to true positive pixels, according to their certainty degrees. The region-level sub-measure considers the location distribution of false negative errors and the correlations among neighboring pixels. By combining these two sub-measures, our measure can evaluate the similarity of area, shape and object completeness between a segmentation result and its ground truth in terms of human visual perception. To evaluate the performance of the proposed measure, we tested it on an object segmentation subjective visual quality assessment database. The experimental results demonstrate that our measure is robust and matches subjective assessments better than other state-of-the-art objective measures.
Citations: 1
Fusing CAMs-weighted features and temporal information for robust loop closure detection
Pub Date: 2021-03-07 | DOI: 10.1145/3444685.3446309
Authors: Yao Li, S. Zhong, Tongwei Ren, Y. Liu
Abstract: As a key component of simultaneous localization and mapping (SLAM) systems, loop closure detection (LCD) eliminates accumulated errors by recognizing previously visited places. In recent years, deep learning methods have proved effective for LCD. However, most existing methods do not make good use of the information provided by monocular images, which tends to limit their performance in challenging dynamic scenarios with partial occlusion by moving objects. To this end, we propose a novel workflow that combines multiple sources of information from images. We first introduce semantic information into LCD by developing a local-aware Class Activation Maps (CAMs) weighting method for feature extraction, which reduces the adverse effects of moving objects. Compared with previous methods based on semantic segmentation, our method has the advantage of requiring no additional models or other complex operations. In addition, we propose two effective temporal constraint strategies that exploit the relationships within image sequences to improve detection performance. Moreover, we use a keypoint matching strategy as the final detector to further reject false positives. Experiments on four publicly available datasets indicate that our approach achieves higher accuracy and better robustness than state-of-the-art methods.
Citations: 0
Distilling knowledge in causal inference for unbiased visual question answering
Pub Date: 2021-03-07 | DOI: 10.1145/3444685.3446256
Authors: Yonghua Pan, Zechao Li, Liyan Zhang, Jinhui Tang
Abstract: Current Visual Question Answering (VQA) models mainly exploit the statistical correlations between answers and questions, failing to capture the relationship between the visual information and the answers; their performance drops dramatically when the distribution of the handled data differs from that of the training data. To this end, this paper proposes a novel unbiased VQA model that explores Causal Inference with Knowledge Distillation (CIKD) to reduce the influence of bias. Specifically, a causal graph is first constructed to explore counterfactual causality and infer the causal target based on the causal effect, which reduces the bias from questions and obtains answers without training. Knowledge distillation is then leveraged to transfer the knowledge of the inferred causal target to a conventional VQA model, enabling the proposed method to handle both biased and standard data. To address the harmful bias inherited through knowledge distillation, ensemble learning is introduced based on the hypothesized source of bias. Experiments demonstrate the performance of the proposed method: significant improvements over state-of-the-art methods on the VQA-CP v2 dataset validate the contributions of this work.
Citations: 9
Global and local feature alignment for video object detection
Pub Date: 2021-03-07 | DOI: 10.1145/3444685.3446263
Authors: Haihui Ye, Qiang Qi, Ying Wang, Yang Lu, Hanzi Wang
Abstract: Extending image-based object detectors to the video domain suffers from severe inadaptability due to frames deteriorated by motion blur, partial occlusion or unusual poses. The features generated for such deteriorated frames are consequently poorly aligned, which degrades the overall performance of video object detectors. Capturing valuable information both locally and globally is important for feature alignment but remains quite challenging. In this paper, we propose a Global and Local Feature Alignment (GLFA) module for video object detection, which distills both global and local information to excavate the deep relationships between features for feature alignment. Specifically, GLFA models the spatial-temporal dependencies across frames by propagating global information, and captures the interactive correspondences within a frame by aggregating valuable local information. We further introduce a Self-Adaptive Calibration (SAC) module to strengthen the semantic representation of features and distill valuable local information in a dual local-alignment manner. Experimental results on the ImageNet VID dataset show that the proposed method achieves high performance as well as a good trade-off between real-time speed and competitive accuracy.
Citations: 0