{"title":"Multi-Scale Invertible Network for Image Super-Resolution","authors":"Zhuangzi Li, Shanshan Li, N. Zhang, Lei Wang, Ziyu Xue","doi":"10.1145/3338533.3366576","DOIUrl":"https://doi.org/10.1145/3338533.3366576","url":null,"abstract":"Deep convolutional neural networks (CNNs) based image super-resolution approaches have reached significant success in recent years. However, due to the information-discarded nature of CNN, they inevitably suffer from information loss during the feature embedding process, in which extracted intermediate features cannot effectively represent or reconstruct the input. As a result, the super-resolved image will have large deviations in image structure with its low-resolution version, leading to inaccurate representations in some local details. In this study, we address this problem by designing an end-to-end invertible architecture that can reversely represent low-resolution images in any feature embedding level. Specifically, we propose a novel image super-resolution method, named multi-scale invertible network (MSIN) to keep information lossless and introduce multi-scale learning in a unified framework. In MSIN, a novel multi-scale invertible stack is proposed, which adopts four parallel branches to respectively capture features with different scales and keeps balanced information-interaction by branch shifting. In addition, we employee global and hierarchical feature fusion to learn elaborate and comprehensive feature representations, in order to further benefit the quality of final image reconstruction. We show the reversibility of the proposed MSIN, and extensive experiments conducted on benchmark datasets demonstrate the state-of-the-art performance of our method.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"122 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127979918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Efficient Parameter Optimization Algorithm and Its Application to Image De-noising","authors":"Yinhao Liu, Xiaofeng Huang, Mengting Fan, Haibing Yin","doi":"10.1145/3338533.3366573","DOIUrl":"https://doi.org/10.1145/3338533.3366573","url":null,"abstract":"Prevailing image enhancement algorithms deliver flexible tradeoff at different level between image quality and implementation complexity, which is usually achieved via adjusting multiple algorithm parameters, i.e. multiple parameter optimization. Traditional exhaustive search over the whole solution space can resolve this optimization problem, however suffering from high search complexity caused by huge amount of multi-parameter combinations. To resolve this problem, an Energy Efficiency Ratio Model (EERM) based algorithm is proposed which is inspired from gradient decent in deep learning. To verify the effectiveness of the proposed algorithm, it is then applied to image de-noising algorithm framework based on non-local means (NLM) plus iteration. The experiment result shows that the optimal parameter combination decided by our proposed algorithm can achieve the comparable quality to that of the exhaustive search based method. Specifically, 86.7% complexity reduction can be achieved with only 0.05dB quality degradation with proposed method.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127612380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Bilinear Pooling for Fine-grained Representation Learning","authors":"Shaobo Min, Hongtao Xie, Youliang Tian, Hantao Yao, Yongdong Zhang","doi":"10.1145/3338533.3366567","DOIUrl":"https://doi.org/10.1145/3338533.3366567","url":null,"abstract":"Fine-grained representation learning targets to generate discriminative description for fine-grained visual objects. Recently, the bilinear feature interaction has been proved effective in generating powerful high-order representation with spatially invariant information. However, the existing methods apply a fixed feature interaction strategy to all samples, which ignore the image and region heterogeneity in a dataset. To this end, we propose a generalized feature interaction method, named Adaptive Bilinear Pooling (ABP), which can adaptively infer a suitable pooling strategy for a given sample based on image content. Specifically, ABP consists of two learning strategies: p-order learning (P-net) and spatial attention learning (S-net). The p-order learning predicts an optimal exponential coefficient rather than a fixed order number to extract moderate visual information from an image. The spatial attention learning aims to infer a weighted score that measures the importance of each local region, which can compact the image representations. To make ABP compatible with kernelized bilinear feature interaction, a crossed two-branch structure is utilized to combine the P-net and S-net. This structure can facilitate complementary information exchange between two different visual branches. The experiments on three widely used benchmarks, including fine-grained object classification and action recognition, demonstrate the effectiveness of the proposed method.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"99 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115665816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stop Hiding Behind Windshield: A Windshield Image Enhancer Based on a Two-way Generative Adversarial Network","authors":"Chi-Rung Chang, K. Lung, Yi-Chung Chen, Zhi-Kai Huang, Hong-Han Shuai, Wen-Huang Cheng","doi":"10.1145/3338533.3366559","DOIUrl":"https://doi.org/10.1145/3338533.3366559","url":null,"abstract":"Windshield images captured by surveillance cameras are usually difficult to be seen through due to severe image degradation such as reflection, motion blur, low light, haze, and noise. Such image degradation hinders the capability of identifying and tracking people. In this paper, we aim to address this challenging windshield images enhancement task by presenting a novel deep learning model based on a two-way generative adversarial network, called Two-way Individual Normalization Perceptual Adversarial Network, TWIN-PAN. TWIN-PAN is an unpaired learning network which does not require pairs of degraded and corresponding ground truth images for training. Also, unlike existing image restoration algorithms which only address one specific type of degradation at once, TWIN-PAN can restore the image from various types of degradation. To restore the content inside the extremely degraded windshield and ensure the semantic consistency of the image, we introduce cyclic perceptual loss to the network and combine it with cycle-consistency loss. Moreover, to generate better restoration images, we introduce individual instance normalization layers for the generators, which can help our generators better adapt to their own input distributions. Furthermore, we collect a large high-quality windshield image dataset (WIE-Dataset) to train our network and to validate the robustness of our method in restoring degraded windshield images. Experimental results on human detection, vehicle ReID and user study manifest that the proposed method is effective for windshield image restoration.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130696345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Domain Specific and Idiom Adaptive Video Summarization","authors":"Yi Dong, Chang Liu, Zhiqi Shen, Zhanning Gao, Pan Wang, Changgong Zhang, Peiran Ren, Xuansong Xie, Han Yu, Qingming Huang","doi":"10.1145/3338533.3366603","DOIUrl":"https://doi.org/10.1145/3338533.3366603","url":null,"abstract":"As short videos become an increasingly popular form of storytelling, there is a growing demand for video summarization to convey information concisely with a subset of video frames. Some criteria such as interestingness and diversity are used by existing efforts to pick appropriate segments of content. However, there lacks a mechanism to infuse insights from cinematography and persuasion into this process. As a result, the results of the video summarization sometimes deviate from the original. In addition, the exploration of the vast design space to create customized video summaries is costly for video producer. To address these challenges, we propose a domain specific and idiom adaptive video summarization approach. Specifically, our approach first segments the input video and extracts high-level information from each segment. Such labels are used to represent a collection of idioms and summarization metrics as submodular components which users can combine to create personalized summary styles in a variety of ways. In order to identify the importance of the idioms and metrics in different domains, we leverage max margin learning. Experimental results have validated the effectiveness of our approach. We also plan to release a dataset containing over 600 videos with expert annotations which can benefit further research in this area.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"27 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113962432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Manifold Alignment with Multi-graph Embedding","authors":"Chang-Bin Huang, Timothy Apasiba Abeo, Xiang-jun Shen","doi":"10.1145/3338533.3366588","DOIUrl":"https://doi.org/10.1145/3338533.3366588","url":null,"abstract":"In this paper, a novel manifold alignment approach via multi-graph embedding (MA-MGE) is proposed. Different from the traditional manifold alignment algorithms that use a single graph to describe the latent manifold structure of each dataset, our approach utilizes multiple graphs for modeling multiple local manifolds in multi-view data alignment. Therefore a composite manifold representation with complete and more useful information is obtained from each dataset through a dynamic reconstruction of multiple graphs. Experimental results on Protein and Face-10 datasets demonstrate that the mapping coordinates of the proposed method provide better alignment performance compared to the state-of-the-art methods, such as semi-supervised manifold alignment (SS-MA), manifold alignment using Procrustes analysis (PAMA) and manifold alignment without correspondence (UNMA).","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123381095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RSC-DGS: Fusion of RGB and NIR Images Using Robust Spectral Consistency and Dynamic Gradient Sparsity","authors":"Shengtao Yu, Cheolkon Jung, Kailong Zhou, Chen Su","doi":"10.1145/3338533.3368261","DOIUrl":"https://doi.org/10.1145/3338533.3368261","url":null,"abstract":"Color (RGB) images captured under low light condition contain much noise with loss of textures. Since near-infrared (NIR) images are robust to noise with clear textures even in low light condition, they can be used to enhance low light RGB images by image fusion. In this paper, we propose fusion of RGB and NIR images using robust spectral consistency (RSC) and dynamic gradient sparsity (DGS), called RSC-DGS. We build the RSC model based on a robust error function to remove noise and preserve color/spectral consistency. We construct the DGS model based on vectorial total variation minimization that uses the NIR image as the reference image. The DGS model transfers clear textures of the NIR image to the fusion result and successfully preserves cross-channel interdependency of the RGB image. We use alternating direction method of multipliers (ADMM) for efficiency to solve the proposed RSC-DGS fusion. Experimental results confirm that the proposed method effectively preserves color/spectral consistency and textures in fusion results while successfully removing noise with high computational efficiency.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129853897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dense Attention Network for Facial Expression Recognition in the Wild","authors":"Cong Wang, K. Lu, Jian Xue, Yanfu Yan","doi":"10.1145/3338533.3366568","DOIUrl":"https://doi.org/10.1145/3338533.3366568","url":null,"abstract":"Recognizing facial expression is significant for human-computer interaction system and other applications. A certain number of facial expression datasets have been published in recent decades and helped with the improvements for emotion classification algorithms. However, recognition of the realistic expressions in the wild is still challenging because of uncontrolled lighting, brightness, pose, occlusion, etc. In this paper, we propose an attention mechanism based module which can help the network focus on the emotion-related locations. Furthermore, we produce two network structures named DenseCANet and DenseSANet by using the attention modules based on the backbone of DenseNet. Then these two networks and original DenseNet are trained on wild dataset AffectNet and lab-controlled dataset CK+. Experimental results show that the DenseSANet has improved the performance on both datasets comparing with the state-of-the-art methods.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129954532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Attention-Aware Feature Pyramid Ordinal Hashing for Image Retrieval","authors":"Xie Sun, Lu Jin, Zechao Li","doi":"10.1145/3338533.3366598","DOIUrl":"https://doi.org/10.1145/3338533.3366598","url":null,"abstract":"Due to the effectiveness of representation learning, deep hashing methods have attracted increasing attention in image retrieval. However, most existing deep hashing methods merely encode the raw information of the last layer for hash learning, which result in the following deficiencies: (1) the useful information from the preceding-layer is not fully exploited; (2) the local salient information of the image is neglected. To this end, we propose a novel deep hashing method, called Attention-Aware Feature Pyramid Ordinal Hashing (AFPH), which explores both the visual structure information and semantic information from different convolutional layers. Specifically, two feature pyramids based on spatial and channel attention are well constructed to capture the local salient structure from multiple scales. Moreover, a multi-scale feature fusion strategy is proposed to aggregate the feature maps from multi-level pyramidal layers to generate the discriminative feature for ranking-based hashing. The experimental results conducted on two widely-used image retrieval datasets demonstrate the superiority of our method.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126535681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NRQQA","authors":"Shengju Yu, Tiansong Li, Xiaoyu Xu, Hao Tao, Li Yu, Yixuan Wang","doi":"10.1145/3338533.3366563","DOIUrl":"https://doi.org/10.1145/3338533.3366563","url":null,"abstract":"Image stitching technology has been widely used in immersive applications, such as 3D modeling, VR and AR. The quality of stitching results is crucial. At present, the objective quality assessment methods of stitched images are mainly based on the availability of ground truth (i.e., Full-Reference). However, in most cases, ground truth is unavailable. In this paper, a no-reference quality assessment metric specifically designed for stitched images is proposed. We first find out the corresponding parts of source images in the stitched image. Then, the isolated points and the outer points generated by spherical projection are eliminated. After that, we take advantage of the bounding rectangle of stitching seams to locate the position of overlapping regions in the stitched image. Finally, the assessment of overlapping regions is taken as the final scoring result. Extensive experiments have shown that our scores are consistent with human vision. Even for the nuances that cannot be distinguished by human eyes, our proposed metric is also effective.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126192781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}