Latest Publications in ACM Multimedia Asia

CMRD-Net: An Improved Method for Underwater Image Enhancement
ACM Multimedia Asia Pub Date: 2021-12-01 DOI: 10.1145/3469877.3493590
Fengjie Xu, Chang-Hua Zhang, Zhongshu Chen, Zhekai Du, Lei Han, Lin Zuo
Abstract: Underwater image enhancement is a challenging task because image quality degrades under the complicated lighting conditions and scenes found underwater. In recent years, most methods have improved the visual quality of underwater images using deep Convolutional Neural Networks and Generative Adversarial Networks. However, the majority of existing methods do not consider that the R, G, and B channels of an underwater image attenuate to different degrees, which leads to sub-optimal performance. Based on this observation, we propose a Channel-wise Multi-scale Residual Dense Network, called CMRD-Net, which learns the weights of the different color channels instead of treating all channels equally. More specifically, a Channel-wise Multi-scale Fusion Residual Attention Block (CMFRAB) is included in CMRD-Net to obtain a better ability of feature extraction and representation. Notably, we evaluate the effectiveness of our model by comparing it with recent state-of-the-art methods. Extensive experimental results show that our method achieves satisfactory performance on a popular public dataset.
Citations: 0
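The abstract does not detail how the CMFRAB block realizes its channel-wise weighting; as a rough illustration of the underlying idea of learning per-channel weights rather than treating R, G, and B equally, here is a minimal PyTorch sketch (the layer structure and sizes are assumptions, not the authors' design):

```python
import torch
import torch.nn as nn

class ChannelWeighting(nn.Module):
    """Minimal sketch: learn a weight per color channel from global context
    instead of treating R, G, B equally (hypothetical layer, not CMFRAB)."""
    def __init__(self, channels=3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)        # one summary value per channel
        self.fc = nn.Sequential(
            nn.Linear(channels, channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, channels),
            nn.Sigmoid(),                          # weights in (0, 1)
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                               # re-weight each channel

x = torch.rand(1, 3, 64, 64)                       # dummy underwater image batch
print(ChannelWeighting()(x).shape)                 # torch.Size([1, 3, 64, 64])
```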
Towards Transferable 3D Adversarial Attack
ACM Multimedia Asia Pub Date: 2021-12-01 DOI: 10.1145/3469877.3493596
Qiming Lu, Shikui Wei, Haoyu Chu, Yao Zhao
Abstract: Currently, most adversarial attacks focus on adding perturbations to 2D images. Attacks of this kind, however, cannot easily reach a real-world AI system, since such a system does not expose an interface to attackers. It is therefore more practical to add perturbations to the surfaces of real-world 3D objects, i.e., to mount 3D adversarial attacks. The key challenges of 3D adversarial attacks are coping with viewpoint changes and keeping strong transferability across different state-of-the-art networks. In this paper, we focus on improving the robustness and transferability of 3D adversarial examples generated by perturbing the surface textures of 3D objects. To this end, we propose an effective method, named the Momentum Gradient-Filter Sign Method (M-GFSM), for generating 3D adversarial examples. Specifically, momentum is introduced into the generation of 3D adversarial examples, which yields multi-view robustness and high attack efficiency by updating the perturbation and stabilizing the update directions. In addition, a filter operation improves the transferability of 3D adversarial examples by selectively filtering gradient images and completing the gradients of pixels neglected due to downsampling in the rendering stage. Experimental results show the effectiveness and good transferability of the proposed method. We also show that the 3D adversarial examples generated by our method remain robust under different illuminations.
Citations: 1
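The paper's M-GFSM perturbs 3D texture maps through a differentiable renderer, which is not reproduced here; the sketch below only illustrates the generic momentum-plus-filtered-gradient sign update (an MI-FGSM-style step with gradient smoothing), with the filter kernel, step size, and budget chosen as assumptions:

```python
import torch
import torch.nn.functional as F

def momentum_filter_sign_step(delta, grad, momentum, mu=1.0, alpha=2/255, eps=8/255):
    """One update in the spirit of a momentum + filtered-gradient sign method.
    Generic 2D illustration only; the paper applies this idea to 3D surface
    textures via a renderer, and its exact filter is not specified here."""
    # Smooth the gradient with a depthwise average filter (stand-in for the
    # paper's filtering of gradient images; 3x3 kernel is an assumption).
    c = grad.shape[1]
    k = torch.ones(c, 1, 3, 3, device=grad.device) / 9.0
    grad = F.conv2d(grad, k, padding=1, groups=c)
    # Momentum accumulation on the normalized gradient (as in MI-FGSM).
    momentum = mu * momentum + grad / (grad.abs().mean() + 1e-12)
    # Sign step, then project back into the epsilon-ball.
    delta = torch.clamp(delta + alpha * momentum.sign(), -eps, eps)
    return delta, momentum
```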
Hybrid Improvements in Multimodal Analysis for Deep Video Understanding
ACM Multimedia Asia Pub Date: 2021-12-01 DOI: 10.1145/3469877.3493599
Beibei Zhang, Fan Yu, Yaqun Fang, Tongwei Ren, Gangshan Wu
Abstract: The Deep Video Understanding Challenge (DVU) is a task focused on comprehending long videos that involve many entities. Its main goal is to build a knowledge graph of relationships and interactions between entities in order to answer relevant questions. In this paper, we improve the joint learning method we previously proposed in several aspects, including few-shot learning, optical-flow features, entity recognition, and video-description matching. We verify the effectiveness of these measures through experiments.
Citations: 3
Score Transformer: Generating Musical Score from Note-level Representation
ACM Multimedia Asia Pub Date: 2021-12-01 DOI: 10.1145/3469877.3490612
Masahiro Suzuki
Abstract: In this paper, we explore a tokenized representation of musical scores and use the Transformer model to generate musical scores automatically. Thus far, sequence models have yielded fruitful results with note-level (MIDI-equivalent) symbolic representations of music. Although note-level representations carry enough information to reproduce music aurally, they do not carry enough information to represent music visually as notation. Musical scores contain various musical symbols (e.g., clef, key signature, and notes) and attributes (e.g., stem direction, beam, and tie) that enable us to comprehend musical content visually. However, automated estimation of these elements has yet to be comprehensively addressed. In this paper, we first design a score token representation corresponding to these musical elements. We then train the Transformer model to transcribe the note-level representation into appropriate music notation. Evaluations on popular piano scores show that the proposed method significantly outperforms existing methods on all 12 musical aspects investigated. We also explore an effective notation-level token representation to work with the model and find that our proposed representation produces the steadiest results.
Citations: 4
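As a concrete, entirely hypothetical illustration of the note-level versus notation-level gap described above, the snippet below sketches what the two token streams might look like for a single bar; the actual vocabulary is defined in the paper and differs from these invented token names:

```python
# Invented token names for illustration only: note-level tokens carry pitch and
# duration (enough to play the music), while notation-level tokens add the
# symbols needed to engrave it (clef, key/time signatures, stems, beams).
note_level = ["bar", "note_C4", "len_8th", "note_D4", "len_8th",
              "note_E4", "len_quarter"]
notation_level = ["clef_treble", "key_C_major", "time_4/4", "bar",
                  "note_C4", "len_8th", "stem_up", "beam_start",
                  "note_D4", "len_8th", "stem_up", "beam_end",
                  "note_E4", "len_quarter", "stem_up"]
# A sequence-to-sequence Transformer is then trained to map the former to the latter.
```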
Discovering Social Connections using Event Images
ACM Multimedia Asia Pub Date: 2021-12-01 DOI: 10.1145/3469877.3493699
Ming Cheung, Weiwei Sun, Jiantao Zhou
Abstract: Social events are very common activities in which people interact with each other. During an event, the organizer often hires photographers to take images, which provide rich information about the participants' behaviour. In this work, we propose a method to discover social graphs among event participants from event images for social network analytics. By studying over 94 events with 32,330 event images, we show that social graphs can be effectively extracted from event images alone. The discovered social graphs follow properties similar to those of online social graphs; for instance, the degree distribution obeys a power law. The usefulness of the proposed method for social graph discovery from event images is demonstrated through two applications: important-participant detection and community detection. To the best of our knowledge, this is the first work to show the feasibility of discovering social graphs from event images only. As a result, social network analytics such as recommendation become possible even without access to the online social graph.
Citations: 1
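The abstract does not spell out how the social graph is assembled from images; a straightforward reading (co-occurrence of recognized participants within the same image) can be sketched as follows, where the participant lists are hypothetical stand-ins for the output of a face recognition step:

```python
import itertools
from collections import Counter

import networkx as nx

# Hypothetical input: for each event image, the list of recognized participants.
image_participants = [
    ["alice", "bob"],
    ["alice", "bob", "carol"],
    ["carol", "dave"],
]

# Count pairwise co-appearances and build a weighted social graph.
pair_counts = Counter()
for people in image_participants:
    for a, b in itertools.combinations(sorted(set(people)), 2):
        pair_counts[(a, b)] += 1

G = nx.Graph()
for (a, b), w in pair_counts.items():
    G.add_edge(a, b, weight=w)

# Rough "important participant" ranking by degree; community detection and
# other social network analytics could be run on G in the same way.
print(sorted(G.degree(), key=lambda kv: -kv[1]))
```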
Blindly Predict Image and Video Quality in the Wild
ACM Multimedia Asia Pub Date: 2021-12-01 DOI: 10.1145/3469877.3490588
Jiapeng Tang, Yi Fang, Yu Dong, Rong Xie, Xiao Gu, Guangtao Zhai, Li Song
Abstract: Blind quality assessment of images and videos captured in the wild, known as in-the-wild I/VQA, has attracted growing interest. Prior deep-learning-based approaches have made considerable progress in I/VQA but are intrinsically troubled by two issues. First, in the absence of large-scale I/VQA datasets, most existing methods fine-tune models pre-trained for image classification; the task misalignment between I/VQA and image classification then degrades generalization. Second, existing VQA methods directly apply temporal pooling to the predicted frame-wise scores, which models inter-frame relations only ambiguously. In this work, we propose a two-stage architecture that separately predicts image and video quality in the wild. In the first stage, we resort to supervised contrastive learning to derive quality-aware representations that facilitate image quality prediction. Specifically, we propose a novel quality-aware contrastive loss that pulls together samples of similar quality and pushes apart samples of different quality in the embedding space. In the second stage, we develop a Relation-Guided Temporal Attention (RTA) module for video quality prediction, which captures global inter-frame dependencies in the embedding space to learn frame-wise attention weights for aggregating frame quality. Extensive experiments demonstrate that our approach performs favorably against state-of-the-art methods on both authentically distorted image benchmarks and video benchmarks.
Citations: 0
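As a rough illustration of a quality-aware contrastive objective of the kind described, the sketch below treats two samples as positives when their subjective quality scores lie within a margin of each other; the thresholding rule, temperature, and margin are assumptions, not the paper's exact loss:

```python
import torch
import torch.nn.functional as F

def quality_aware_contrastive_loss(feats, mos, margin=0.5, tau=0.1):
    """Supervised-contrastive-style sketch: samples whose mean opinion scores
    (MOS) differ by less than `margin` are pulled together, others pushed apart.
    Hyper-parameters and the positive-pair rule are assumptions."""
    feats = F.normalize(feats, dim=1)
    sim = feats @ feats.t() / tau                               # pairwise similarities
    eye = torch.eye(len(mos), dtype=torch.bool, device=feats.device)
    pos = ((mos.unsqueeze(0) - mos.unsqueeze(1)).abs() < margin) & ~eye
    # Log-probability of each pair, excluding self-similarity from the denominator.
    log_prob = sim - torch.logsumexp(sim.masked_fill(eye, float("-inf")), dim=1, keepdim=True)
    loss = -(log_prob * pos.float()).sum(1) / pos.sum(1).clamp(min=1)
    return loss.mean()

# Toy usage: 8 embeddings with mean opinion scores in [0, 5].
loss = quality_aware_contrastive_loss(torch.randn(8, 128), torch.rand(8) * 5)
```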
Local-enhanced Multi-resolution Representation Learning for Vehicle Re-identification
ACM Multimedia Asia Pub Date: 2021-12-01 DOI: 10.1145/3469877.3497690
Jun Zhang, X. Zhong, Jingling Yuan, Shilei Zhao, Rongbo Zhang, Duxiu Feng, Luo Zhong
Abstract: In real traffic scenarios, the resolution of a captured vehicle can vary considerably with the distance to the vehicle and with the direction and height of the camera. When the probe and gallery vehicles differ in resolution, a resolution mismatch occurs, which seriously degrades vehicle re-identification (Re-ID) performance. This problem is also known as multi-resolution vehicle Re-ID. An effective strategy is to use image super-resolution to bridge the resolution gap. However, existing methods apply super-resolution to the global image rather than to local representations of each image, so much noisy information is generated by the background and illumination variations. In our work, local-enhanced multi-resolution representation learning (LMRL) is therefore proposed to address these problems by jointly training a local-enhanced super-resolution (LSR) module and a local-guided contrastive learning (LCL) module. Specifically, we use a parsing network to parse a vehicle into four parts to extract local-enhanced vehicle representations. The LSR module, which consists of two auto-encoders that share parameters, transforms low-resolution images into high-resolution ones in both the global and local branches. The LCL module learns discriminative vehicle representations by contrasting local representations of the high-resolution reconstructed image and the ground truth. We evaluate our approach on two public datasets that contain vehicle images at a wide range of resolutions, where our approach shows significant superiority over the existing solution.
Citations: 1
A Model-Guided Unfolding Network for Single Image Reflection Removal
ACM Multimedia Asia Pub Date: 2021-12-01 DOI: 10.1145/3469877.3490607
Dongliang Shao, Yunhui Shi, Jin Wang, N. Ling, Baocai Yin
Abstract: Removing undesirable reflections from a single image captured through a glass surface benefits a broad range of image processing and computer vision tasks, but it is an ill-posed and challenging problem. Traditional single image reflection removal (SIRR) methods often remove reflections poorly because of the limited descriptive ability of handcrafted priors. State-of-the-art learning-based methods often suffer from instability because they are designed as unexplainable black boxes. In this paper, we present an explainable approach to SIRR named the model-guided unfolding network (MoG-SIRR), which is unfolded from our proposed reflection removal model with a non-local autoregressive prior and a dereflection prior. In order to complement the transmission layer and the reflection layer in a single image, we construct a two-stream deep learning framework that integrates reflection removal and non-local regularization into trainable modules. Extensive experiments on public benchmark datasets demonstrate that our method achieves superior performance for single image reflection removal.
Citations: 0
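Deep unfolding is the general design pattern behind MoG-SIRR: an iterative optimization for decomposing an input image I into transmission T and reflection R (I = T + R) is truncated to a few stages, and each stage's proximal operators are replaced by small learned networks. The sketch below illustrates that generic pattern only; the paper's non-local autoregressive and dereflection priors are not reproduced, and all layer choices are assumptions:

```python
import torch
import torch.nn as nn

class UnfoldedReflectionRemoval(nn.Module):
    """Toy deep-unfolding sketch for I = T + R: alternate gradient steps on the
    data-fidelity term with small CNNs standing in for learned proximal operators.
    Illustrates model-guided unfolding in general, not MoG-SIRR specifically."""
    def __init__(self, stages=3):
        super().__init__()
        def prox():
            return nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 3, 3, padding=1))
        self.prox_T = nn.ModuleList([prox() for _ in range(stages)])
        self.prox_R = nn.ModuleList([prox() for _ in range(stages)])
        self.step = nn.Parameter(torch.full((stages,), 0.5))   # learnable step sizes

    def forward(self, I):
        T, R = I.clone(), torch.zeros_like(I)
        for k in range(len(self.prox_T)):
            res = I - T - R                                  # data-fidelity residual
            T = self.prox_T[k](T + self.step[k] * res)       # learned proximal step for T
            R = self.prox_R[k](R + self.step[k] * res)       # learned proximal step for R
        return T, R
```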
Conditioned Image Retrieval for Fashion using Contrastive Learning and CLIP-based Features
ACM Multimedia Asia Pub Date: 2021-12-01 DOI: 10.1145/3469877.3493593
Alberto Baldrati, M. Bertini, Tiberio Uricchio, A. del Bimbo
Abstract: Building on recent advances in multimodal zero-shot representation learning, in this paper we explore the use of features obtained from the recent CLIP model to perform conditioned image retrieval. Starting from a reference image and an additional textual description of what the user wants with respect to the reference image, we learn a Combiner network that understands the image content, integrates the textual description, and produces a combined feature used to perform the conditioned image retrieval. Starting from the bare CLIP features and a simple baseline, we show that a carefully crafted Combiner network based on such multimodal features is extremely effective and outperforms more complex state-of-the-art approaches on the popular FashionIQ dataset.
Citations: 13
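As a rough illustration of combining pre-extracted CLIP image and text features into a single retrieval query, the sketch below fuses the two vectors with a small gated MLP and normalizes the result for cosine-similarity search; the layer sizes and fusion rule are assumptions, not the paper's exact Combiner architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Combiner(nn.Module):
    """Sketch of a combiner over pre-extracted CLIP features: fuse the
    reference-image feature and the modifying-text feature into one query
    vector (hypothetical fusion rule, not the published architecture)."""
    def __init__(self, dim=512, hidden=1024):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(2 * dim, hidden), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(hidden, dim))
        self.gate = nn.Sequential(nn.Linear(2 * dim, 1), nn.Sigmoid())

    def forward(self, img_feat, txt_feat):
        cat = torch.cat([img_feat, txt_feat], dim=-1)
        g = self.gate(cat)                                   # how much to trust the fused feature
        combined = g * self.fuse(cat) + (1 - g) * (img_feat + txt_feat)
        return F.normalize(combined, dim=-1)                 # ready for cosine-similarity retrieval

query = Combiner()(torch.randn(4, 512), torch.randn(4, 512))  # dummy CLIP features
```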
Deep Adaptive Attention Triple Hashing
ACM Multimedia Asia Pub Date: 2021-12-01 DOI: 10.1145/3469877.3495646
Yang Shi, Xiushan Nie, Quan Zhou, Li Zou, Yilong Yin
Abstract: Recent studies have verified that learning compact hash codes can facilitate big-data retrieval. In particular, learning a deep hash function can greatly improve retrieval performance. However, existing deep supervised hashing algorithms treat all samples in the same way, so difficult samples are learned insufficiently. As a result, the similarity relation cannot be learned accurately, making it difficult to achieve satisfactory performance. In light of this, this work proposes a deep supervised hashing model, called deep adaptive attention triple hashing (DAATH), which weights the similarity prediction scores of positive and negative samples in the form of triples, thus giving different degrees of attention to different samples. Compared with the traditional triplet loss, it places greater emphasis on difficult triples, dramatically reducing redundant computation. Extensive experiments show that DAATH consistently outperforms the state of the art, confirming its effectiveness.
Citations: 1
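As an illustration of giving difficult triples more attention, the sketch below scales a standard triplet loss on relaxed (real-valued) hash codes so that hard triples (small gap between negative and positive distances) receive larger weights; the specific weighting function and margin are assumptions, not the DAATH formulation:

```python
import torch
import torch.nn.functional as F

def adaptive_triplet_hash_loss(h_a, h_p, h_n, margin=2.0, gamma=1.0):
    """Difficulty-weighted triplet loss on relaxed hash codes: the weight grows
    as the negative-positive distance gap shrinks. Weighting rule is assumed."""
    d_pos = (h_a - h_p).pow(2).sum(1)               # anchor-positive distance
    d_neg = (h_a - h_n).pow(2).sum(1)               # anchor-negative distance
    gap = d_neg - d_pos
    weight = torch.sigmoid(-gamma * gap).detach()   # larger weight for harder triples
    loss = weight * F.relu(d_pos - d_neg + margin)  # standard triplet hinge, re-weighted
    return loss.mean()

# Toy usage with 16-bit relaxed codes (values in [-1, 1] via tanh).
codes = [torch.tanh(torch.randn(8, 16)) for _ in range(3)]
print(adaptive_triplet_hash_loss(*codes))
```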