Latest Publications in ACM Multimedia Asia

CMRD-Net: An Improved Method for Underwater Image Enhancement
ACM Multimedia Asia Pub Date: 2021-12-01 DOI: 10.1145/3469877.3493590
Fengjie Xu, Chang-Hua Zhang, Zhongshu Chen, Zhekai Du, Lei Han, Lin Zuo
Abstract: Underwater image enhancement is a challenging task because image quality degrades under the complicated lighting conditions and scenes found underwater. In recent years, most methods have improved the visual quality of underwater images using deep Convolutional Neural Networks and Generative Adversarial Networks. However, the majority of existing methods do not consider that the R, G, and B channels of an underwater image attenuate to different degrees, which leads to sub-optimal performance. Based on this observation, we propose a Channel-wise Multi-scale Residual Dense Network, called CMRD-Net, which learns the weights of the different color channels instead of treating all channels equally. More specifically, a Channel-wise Multi-scale Fusion Residual Attention Block (CMFRAB) is included in CMRD-Net to obtain a better ability of feature extraction and representation. Notably, we evaluate the effectiveness of our model by comparing it with recent state-of-the-art methods. Extensive experimental results show that our method achieves satisfactory performance on a popular public dataset.
Citations: 0
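The abstract does not detail how the CMFRAB block realizes its channel-wise weighting; as a rough illustration of the underlying idea of learning per-channel weights rather than treating R, G, and B equally, here is a minimal PyTorch sketch (the layer structure and sizes are assumptions, not the authors' design):

```python
import torch
import torch.nn as nn

class ChannelWeighting(nn.Module):
    """Minimal sketch: learn a weight per color channel from global context
    instead of treating R, G, B equally (hypothetical layer, not CMFRAB)."""
    def __init__(self, channels=3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)        # one summary value per channel
        self.fc = nn.Sequential(
            nn.Linear(channels, channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, channels),
            nn.Sigmoid(),                          # weights in (0, 1)
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                               # re-weight each channel

x = torch.rand(1, 3, 64, 64)                       # dummy underwater image batch
print(ChannelWeighting()(x).shape)                 # torch.Size([1, 3, 64, 64])
```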
Towards Transferable 3D Adversarial Attack
ACM Multimedia Asia Pub Date: 2021-12-01 DOI: 10.1145/3469877.3493596
Qiming Lu, Shikui Wei, Haoyu Chu, Yao Zhao
Abstract: Currently, most adversarial attacks focus on adding perturbations to 2D images. Attacks of this kind, however, cannot easily reach a real-world AI system, since such a system does not expose an interface to attackers. It is therefore more practical to add perturbations to the surfaces of real-world 3D objects, i.e., to mount 3D adversarial attacks. The key challenges of 3D adversarial attacks are coping with viewpoint changes and keeping strong transferability across different state-of-the-art networks. In this paper, we focus on improving the robustness and transferability of 3D adversarial examples generated by perturbing the surface textures of 3D objects. To this end, we propose an effective method, named the Momentum Gradient-Filter Sign Method (M-GFSM), for generating 3D adversarial examples. Specifically, momentum is introduced into the generation of 3D adversarial examples, which yields multi-view robustness and high attack efficiency by updating the perturbation and stabilizing the update directions. In addition, a filter operation improves the transferability of 3D adversarial examples by selectively filtering gradient images and completing the gradients of pixels neglected due to downsampling in the rendering stage. Experimental results show the effectiveness and good transferability of the proposed method. We also show that the 3D adversarial examples generated by our method remain robust under different illuminations.
Citations: 1
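The paper's M-GFSM perturbs 3D texture maps through a differentiable renderer, which is not reproduced here; the sketch below only illustrates the generic momentum-plus-filtered-gradient sign update (an MI-FGSM-style step with gradient smoothing), with the filter kernel, step size, and budget chosen as assumptions:

```python
import torch
import torch.nn.functional as F

def momentum_filter_sign_step(delta, grad, momentum, mu=1.0, alpha=2/255, eps=8/255):
    """One update in the spirit of a momentum + filtered-gradient sign method.
    Generic 2D illustration only; the paper applies this idea to 3D surface
    textures via a renderer, and its exact filter is not specified here."""
    # Smooth the gradient with a depthwise average filter (stand-in for the
    # paper's filtering of gradient images; 3x3 kernel is an assumption).
    c = grad.shape[1]
    k = torch.ones(c, 1, 3, 3, device=grad.device) / 9.0
    grad = F.conv2d(grad, k, padding=1, groups=c)
    # Momentum accumulation on the normalized gradient (as in MI-FGSM).
    momentum = mu * momentum + grad / (grad.abs().mean() + 1e-12)
    # Sign step, then project back into the epsilon-ball.
    delta = torch.clamp(delta + alpha * momentum.sign(), -eps, eps)
    return delta, momentum
```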
Hybrid Improvements in Multimodal Analysis for Deep Video Understanding
ACM Multimedia Asia Pub Date: 2021-12-01 DOI: 10.1145/3469877.3493599
Beibei Zhang, Fan Yu, Yaqun Fang, Tongwei Ren, Gangshan Wu
Abstract: The Deep Video Understanding Challenge (DVU) is a task focused on comprehending long videos that involve many entities. Its main goal is to build a knowledge graph of relationships and interactions between entities in order to answer relevant questions. In this paper, we improve the joint learning method we previously proposed in several aspects, including few-shot learning, optical-flow features, entity recognition, and video-description matching. We verify the effectiveness of these measures through experiments.
Citations: 3
Score Transformer: Generating Musical Score from Note-level Representation
ACM Multimedia Asia Pub Date: 2021-12-01 DOI: 10.1145/3469877.3490612
Masahiro Suzuki
Abstract: In this paper, we explore a tokenized representation of musical scores and use the Transformer model to generate musical scores automatically. Thus far, sequence models have yielded fruitful results with note-level (MIDI-equivalent) symbolic representations of music. Although note-level representations carry enough information to reproduce music aurally, they do not carry enough information to represent music visually as notation. Musical scores contain various musical symbols (e.g., clef, key signature, and notes) and attributes (e.g., stem direction, beam, and tie) that enable us to comprehend musical content visually. However, automated estimation of these elements has yet to be comprehensively addressed. In this paper, we first design a score token representation corresponding to these musical elements. We then train the Transformer model to transcribe the note-level representation into appropriate music notation. Evaluations on popular piano scores show that the proposed method significantly outperforms existing methods on all 12 musical aspects investigated. We also explore an effective notation-level token representation to work with the model and find that our proposed representation produces the steadiest results.
Citations: 4
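As a concrete, entirely hypothetical illustration of the note-level versus notation-level gap described above, the snippet below sketches what the two token streams might look like for a single bar; the actual vocabulary is defined in the paper and differs from these invented token names:

```python
# Invented token names for illustration only: note-level tokens carry pitch and
# duration (enough to play the music), while notation-level tokens add the
# symbols needed to engrave it (clef, key/time signatures, stems, beams).
note_level = ["bar", "note_C4", "len_8th", "note_D4", "len_8th",
              "note_E4", "len_quarter"]
notation_level = ["clef_treble", "key_C_major", "time_4/4", "bar",
                  "note_C4", "len_8th", "stem_up", "beam_start",
                  "note_D4", "len_8th", "stem_up", "beam_end",
                  "note_E4", "len_quarter", "stem_up"]
# A sequence-to-sequence Transformer is then trained to map the former to the latter.
```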
Discovering Social Connections using Event Images
ACM Multimedia Asia Pub Date: 2021-12-01 DOI: 10.1145/3469877.3493699
Ming Cheung, Weiwei Sun, Jiantao Zhou
Abstract: Social events are very common activities in which people interact with each other. During an event, the organizer often hires photographers to take images, which provide rich information about the participants' behaviour. In this work, we propose a method to discover social graphs among event participants from event images for social network analytics. By studying over 94 events with 32,330 event images, we show that social graphs can be effectively extracted from event images alone. The discovered social graphs follow properties similar to those of online social graphs; for instance, the degree distribution obeys a power law. The usefulness of the proposed method for social graph discovery from event images is demonstrated through two applications: important-participant detection and community detection. To the best of our knowledge, this is the first work to show the feasibility of discovering social graphs from event images only. As a result, social network analytics such as recommendation become possible even without access to the online social graph.
Citations: 1
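The abstract does not spell out how the social graph is assembled from images; a straightforward reading (co-occurrence of recognized participants within the same image) can be sketched as follows, where the participant lists are hypothetical stand-ins for the output of a face recognition step:

```python
import itertools
from collections import Counter

import networkx as nx

# Hypothetical input: for each event image, the list of recognized participants.
image_participants = [
    ["alice", "bob"],
    ["alice", "bob", "carol"],
    ["carol", "dave"],
]

# Count pairwise co-appearances and build a weighted social graph.
pair_counts = Counter()
for people in image_participants:
    for a, b in itertools.combinations(sorted(set(people)), 2):
        pair_counts[(a, b)] += 1

G = nx.Graph()
for (a, b), w in pair_counts.items():
    G.add_edge(a, b, weight=w)

# Rough "important participant" ranking by degree; community detection and
# other social network analytics could be run on G in the same way.
print(sorted(G.degree(), key=lambda kv: -kv[1]))
```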
Blindly Predict Image and Video Quality in the Wild
ACM Multimedia Asia Pub Date: 2021-12-01 DOI: 10.1145/3469877.3490588
Jiapeng Tang, Yi Fang, Yu Dong, Rong Xie, Xiao Gu, Guangtao Zhai, Li Song
Abstract: Blind quality assessment of images and videos captured in the wild, known as in-the-wild I/VQA, has attracted growing interest. Prior deep-learning-based approaches have made considerable progress in I/VQA but are intrinsically troubled by two issues. First, in the absence of large-scale I/VQA datasets, most existing methods fine-tune models pre-trained for image classification; the task misalignment between I/VQA and image classification then degrades generalization. Second, existing VQA methods directly apply temporal pooling to the predicted frame-wise scores, which models inter-frame relations only ambiguously. In this work, we propose a two-stage architecture that separately predicts image and video quality in the wild. In the first stage, we resort to supervised contrastive learning to derive quality-aware representations that facilitate image quality prediction. Specifically, we propose a novel quality-aware contrastive loss that pulls together samples of similar quality and pushes apart samples of different quality in the embedding space. In the second stage, we develop a Relation-Guided Temporal Attention (RTA) module for video quality prediction, which captures global inter-frame dependencies in the embedding space to learn frame-wise attention weights for aggregating frame quality. Extensive experiments demonstrate that our approach performs favorably against state-of-the-art methods on both authentically distorted image benchmarks and video benchmarks.
Citations: 0
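As a rough illustration of a quality-aware contrastive objective of the kind described, the sketch below treats two samples as positives when their subjective quality scores lie within a margin of each other; the thresholding rule, temperature, and margin are assumptions, not the paper's exact loss:

```python
import torch
import torch.nn.functional as F

def quality_aware_contrastive_loss(feats, mos, margin=0.5, tau=0.1):
    """Supervised-contrastive-style sketch: samples whose mean opinion scores
    (MOS) differ by less than `margin` are pulled together, others pushed apart.
    Hyper-parameters and the positive-pair rule are assumptions."""
    feats = F.normalize(feats, dim=1)
    sim = feats @ feats.t() / tau                               # pairwise similarities
    eye = torch.eye(len(mos), dtype=torch.bool, device=feats.device)
    pos = ((mos.unsqueeze(0) - mos.unsqueeze(1)).abs() < margin) & ~eye
    # Log-probability of each pair, excluding self-similarity from the denominator.
    log_prob = sim - torch.logsumexp(sim.masked_fill(eye, float("-inf")), dim=1, keepdim=True)
    loss = -(log_prob * pos.float()).sum(1) / pos.sum(1).clamp(min=1)
    return loss.mean()

# Toy usage: 8 embeddings with mean opinion scores in [0, 5].
loss = quality_aware_contrastive_loss(torch.randn(8, 128), torch.rand(8) * 5)
```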
Local-enhanced Multi-resolution Representation Learning for Vehicle Re-identification
ACM Multimedia Asia Pub Date: 2021-12-01 DOI: 10.1145/3469877.3497690
Jun Zhang, X. Zhong, Jingling Yuan, Shilei Zhao, Rongbo Zhang, Duxiu Feng, Luo Zhong
Abstract: In real traffic scenarios, the resolution of a captured vehicle can vary considerably with the distance to the vehicle and with the direction and height of the camera. When the probe and gallery vehicles differ in resolution, a resolution mismatch occurs, which seriously degrades vehicle re-identification (Re-ID) performance. This problem is also known as multi-resolution vehicle Re-ID. An effective strategy is to use image super-resolution to bridge the resolution gap. However, existing methods apply super-resolution to the global image rather than to local representations of each image, so much noisy information is generated by the background and illumination variations. In our work, local-enhanced multi-resolution representation learning (LMRL) is therefore proposed to address these problems by jointly training a local-enhanced super-resolution (LSR) module and a local-guided contrastive learning (LCL) module. Specifically, we use a parsing network to parse a vehicle into four parts to extract local-enhanced vehicle representations. The LSR module, which consists of two auto-encoders that share parameters, transforms low-resolution images into high-resolution ones in both the global and local branches. The LCL module learns discriminative vehicle representations by contrasting local representations of the high-resolution reconstructed image and the ground truth. We evaluate our approach on two public datasets that contain vehicle images at a wide range of resolutions, where our approach shows significant superiority over the existing solution.
Citations: 1
A Model-Guided Unfolding Network for Single Image Reflection Removal
ACM Multimedia Asia Pub Date: 2021-12-01 DOI: 10.1145/3469877.3490607
Dongliang Shao, Yunhui Shi, Jin Wang, N. Ling, Baocai Yin
Abstract: Removing undesirable reflections from a single image captured through a glass surface benefits a broad range of image processing and computer vision tasks, but it is an ill-posed and challenging problem. Traditional single image reflection removal (SIRR) methods often remove reflections poorly because of the limited descriptive ability of handcrafted priors. State-of-the-art learning-based methods often suffer from instability because they are designed as unexplainable black boxes. In this paper, we present an explainable approach to SIRR named the model-guided unfolding network (MoG-SIRR), which is unfolded from our proposed reflection removal model with a non-local autoregressive prior and a dereflection prior. In order to complement the transmission layer and the reflection layer in a single image, we construct a two-stream deep learning framework that integrates reflection removal and non-local regularization into trainable modules. Extensive experiments on public benchmark datasets demonstrate that our method achieves superior performance for single image reflection removal.
Citations: 0
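Deep unfolding is the general design pattern behind MoG-SIRR: an iterative optimization for decomposing an input image I into transmission T and reflection R (I = T + R) is truncated to a few stages, and each stage's proximal operators are replaced by small learned networks. The sketch below illustrates that generic pattern only; the paper's non-local autoregressive and dereflection priors are not reproduced, and all layer choices are assumptions:

```python
import torch
import torch.nn as nn

class UnfoldedReflectionRemoval(nn.Module):
    """Toy deep-unfolding sketch for I = T + R: alternate gradient steps on the
    data-fidelity term with small CNNs standing in for learned proximal operators.
    Illustrates model-guided unfolding in general, not MoG-SIRR specifically."""
    def __init__(self, stages=3):
        super().__init__()
        def prox():
            return nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 3, 3, padding=1))
        self.prox_T = nn.ModuleList([prox() for _ in range(stages)])
        self.prox_R = nn.ModuleList([prox() for _ in range(stages)])
        self.step = nn.Parameter(torch.full((stages,), 0.5))   # learnable step sizes

    def forward(self, I):
        T, R = I.clone(), torch.zeros_like(I)
        for k in range(len(self.prox_T)):
            res = I - T - R                                  # data-fidelity residual
            T = self.prox_T[k](T + self.step[k] * res)       # learned proximal step for T
            R = self.prox_R[k](R + self.step[k] * res)       # learned proximal step for R
        return T, R
```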
Conditioned Image Retrieval for Fashion using Contrastive Learning and CLIP-based Features
ACM Multimedia Asia Pub Date: 2021-12-01 DOI: 10.1145/3469877.3493593
Alberto Baldrati, M. Bertini, Tiberio Uricchio, A. del Bimbo
Abstract: Building on recent advances in multimodal zero-shot representation learning, in this paper we explore the use of features obtained from the recent CLIP model to perform conditioned image retrieval. Starting from a reference image and an additional textual description of what the user wants with respect to the reference image, we learn a Combiner network that understands the image content, integrates the textual description, and produces a combined feature used to perform the conditioned image retrieval. Starting from the bare CLIP features and a simple baseline, we show that a carefully crafted Combiner network based on such multimodal features is extremely effective and outperforms more complex state-of-the-art approaches on the popular FashionIQ dataset.
Citations: 13
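As a rough illustration of combining pre-extracted CLIP image and text features into a single retrieval query, the sketch below fuses the two vectors with a small gated MLP and normalizes the result for cosine-similarity search; the layer sizes and fusion rule are assumptions, not the paper's exact Combiner architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Combiner(nn.Module):
    """Sketch of a combiner over pre-extracted CLIP features: fuse the
    reference-image feature and the modifying-text feature into one query
    vector (hypothetical fusion rule, not the published architecture)."""
    def __init__(self, dim=512, hidden=1024):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(2 * dim, hidden), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(hidden, dim))
        self.gate = nn.Sequential(nn.Linear(2 * dim, 1), nn.Sigmoid())

    def forward(self, img_feat, txt_feat):
        cat = torch.cat([img_feat, txt_feat], dim=-1)
        g = self.gate(cat)                                   # how much to trust the fused feature
        combined = g * self.fuse(cat) + (1 - g) * (img_feat + txt_feat)
        return F.normalize(combined, dim=-1)                 # ready for cosine-similarity retrieval

query = Combiner()(torch.randn(4, 512), torch.randn(4, 512))  # dummy CLIP features
```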
Deep Adaptive Attention Triple Hashing
ACM Multimedia Asia Pub Date: 2021-12-01 DOI: 10.1145/3469877.3495646
Yang Shi, Xiushan Nie, Quan Zhou, Li Zou, Yilong Yin
Abstract: Recent studies have verified that learning compact hash codes can facilitate big-data retrieval. In particular, learning a deep hash function can greatly improve retrieval performance. However, existing deep supervised hashing algorithms treat all samples in the same way, so difficult samples are learned insufficiently. As a result, the similarity relation cannot be learned accurately, making it difficult to achieve satisfactory performance. In light of this, this work proposes a deep supervised hashing model, called deep adaptive attention triple hashing (DAATH), which weights the similarity prediction scores of positive and negative samples in the form of triples, thus giving different degrees of attention to different samples. Compared with the traditional triplet loss, it places greater emphasis on difficult triples, dramatically reducing redundant computation. Extensive experiments show that DAATH consistently outperforms the state of the art, confirming its effectiveness.
Citations: 1
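As an illustration of giving difficult triples more attention, the sketch below scales a standard triplet loss on relaxed (real-valued) hash codes so that hard triples (small gap between negative and positive distances) receive larger weights; the specific weighting function and margin are assumptions, not the DAATH formulation:

```python
import torch
import torch.nn.functional as F

def adaptive_triplet_hash_loss(h_a, h_p, h_n, margin=2.0, gamma=1.0):
    """Difficulty-weighted triplet loss on relaxed hash codes: the weight grows
    as the negative-positive distance gap shrinks. Weighting rule is assumed."""
    d_pos = (h_a - h_p).pow(2).sum(1)               # anchor-positive distance
    d_neg = (h_a - h_n).pow(2).sum(1)               # anchor-negative distance
    gap = d_neg - d_pos
    weight = torch.sigmoid(-gamma * gap).detach()   # larger weight for harder triples
    loss = weight * F.relu(d_pos - d_neg + margin)  # standard triplet hinge, re-weighted
    return loss.mean()

# Toy usage with 16-bit relaxed codes (values in [-1, 1] via tanh).
codes = [torch.tanh(torch.randn(8, 16)) for _ in range(3)]
print(adaptive_triplet_hash_loss(*codes))
```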