{"title":"Multi-Scale Invertible Network for Image Super-Resolution","authors":"Zhuangzi Li, Shanshan Li, N. Zhang, Lei Wang, Ziyu Xue","doi":"10.1145/3338533.3366576","DOIUrl":"https://doi.org/10.1145/3338533.3366576","url":null,"abstract":"Deep convolutional neural networks (CNNs) based image super-resolution approaches have reached significant success in recent years. However, due to the information-discarded nature of CNN, they inevitably suffer from information loss during the feature embedding process, in which extracted intermediate features cannot effectively represent or reconstruct the input. As a result, the super-resolved image will have large deviations in image structure with its low-resolution version, leading to inaccurate representations in some local details. In this study, we address this problem by designing an end-to-end invertible architecture that can reversely represent low-resolution images in any feature embedding level. Specifically, we propose a novel image super-resolution method, named multi-scale invertible network (MSIN) to keep information lossless and introduce multi-scale learning in a unified framework. In MSIN, a novel multi-scale invertible stack is proposed, which adopts four parallel branches to respectively capture features with different scales and keeps balanced information-interaction by branch shifting. In addition, we employee global and hierarchical feature fusion to learn elaborate and comprehensive feature representations, in order to further benefit the quality of final image reconstruction. We show the reversibility of the proposed MSIN, and extensive experiments conducted on benchmark datasets demonstrate the state-of-the-art performance of our method.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"122 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127979918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Efficient Parameter Optimization Algorithm and Its Application to Image De-noising","authors":"Yinhao Liu, Xiaofeng Huang, Mengting Fan, Haibing Yin","doi":"10.1145/3338533.3366573","DOIUrl":"https://doi.org/10.1145/3338533.3366573","url":null,"abstract":"Prevailing image enhancement algorithms deliver flexible tradeoff at different level between image quality and implementation complexity, which is usually achieved via adjusting multiple algorithm parameters, i.e. multiple parameter optimization. Traditional exhaustive search over the whole solution space can resolve this optimization problem, however suffering from high search complexity caused by huge amount of multi-parameter combinations. To resolve this problem, an Energy Efficiency Ratio Model (EERM) based algorithm is proposed which is inspired from gradient decent in deep learning. To verify the effectiveness of the proposed algorithm, it is then applied to image de-noising algorithm framework based on non-local means (NLM) plus iteration. The experiment result shows that the optimal parameter combination decided by our proposed algorithm can achieve the comparable quality to that of the exhaustive search based method. Specifically, 86.7% complexity reduction can be achieved with only 0.05dB quality degradation with proposed method.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127612380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Bilinear Pooling for Fine-grained Representation Learning","authors":"Shaobo Min, Hongtao Xie, Youliang Tian, Hantao Yao, Yongdong Zhang","doi":"10.1145/3338533.3366567","DOIUrl":"https://doi.org/10.1145/3338533.3366567","url":null,"abstract":"Fine-grained representation learning targets to generate discriminative description for fine-grained visual objects. Recently, the bilinear feature interaction has been proved effective in generating powerful high-order representation with spatially invariant information. However, the existing methods apply a fixed feature interaction strategy to all samples, which ignore the image and region heterogeneity in a dataset. To this end, we propose a generalized feature interaction method, named Adaptive Bilinear Pooling (ABP), which can adaptively infer a suitable pooling strategy for a given sample based on image content. Specifically, ABP consists of two learning strategies: p-order learning (P-net) and spatial attention learning (S-net). The p-order learning predicts an optimal exponential coefficient rather than a fixed order number to extract moderate visual information from an image. The spatial attention learning aims to infer a weighted score that measures the importance of each local region, which can compact the image representations. To make ABP compatible with kernelized bilinear feature interaction, a crossed two-branch structure is utilized to combine the P-net and S-net. This structure can facilitate complementary information exchange between two different visual branches. The experiments on three widely used benchmarks, including fine-grained object classification and action recognition, demonstrate the effectiveness of the proposed method.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"99 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115665816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stop Hiding Behind Windshield: A Windshield Image Enhancer Based on a Two-way Generative Adversarial Network","authors":"Chi-Rung Chang, K. Lung, Yi-Chung Chen, Zhi-Kai Huang, Hong-Han Shuai, Wen-Huang Cheng","doi":"10.1145/3338533.3366559","DOIUrl":"https://doi.org/10.1145/3338533.3366559","url":null,"abstract":"Windshield images captured by surveillance cameras are usually difficult to be seen through due to severe image degradation such as reflection, motion blur, low light, haze, and noise. Such image degradation hinders the capability of identifying and tracking people. In this paper, we aim to address this challenging windshield images enhancement task by presenting a novel deep learning model based on a two-way generative adversarial network, called Two-way Individual Normalization Perceptual Adversarial Network, TWIN-PAN. TWIN-PAN is an unpaired learning network which does not require pairs of degraded and corresponding ground truth images for training. Also, unlike existing image restoration algorithms which only address one specific type of degradation at once, TWIN-PAN can restore the image from various types of degradation. To restore the content inside the extremely degraded windshield and ensure the semantic consistency of the image, we introduce cyclic perceptual loss to the network and combine it with cycle-consistency loss. Moreover, to generate better restoration images, we introduce individual instance normalization layers for the generators, which can help our generators better adapt to their own input distributions. Furthermore, we collect a large high-quality windshield image dataset (WIE-Dataset) to train our network and to validate the robustness of our method in restoring degraded windshield images. Experimental results on human detection, vehicle ReID and user study manifest that the proposed method is effective for windshield image restoration.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130696345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Domain Specific and Idiom Adaptive Video Summarization","authors":"Yi Dong, Chang Liu, Zhiqi Shen, Zhanning Gao, Pan Wang, Changgong Zhang, Peiran Ren, Xuansong Xie, Han Yu, Qingming Huang","doi":"10.1145/3338533.3366603","DOIUrl":"https://doi.org/10.1145/3338533.3366603","url":null,"abstract":"As short videos become an increasingly popular form of storytelling, there is a growing demand for video summarization to convey information concisely with a subset of video frames. Some criteria such as interestingness and diversity are used by existing efforts to pick appropriate segments of content. However, there lacks a mechanism to infuse insights from cinematography and persuasion into this process. As a result, the results of the video summarization sometimes deviate from the original. In addition, the exploration of the vast design space to create customized video summaries is costly for video producer. To address these challenges, we propose a domain specific and idiom adaptive video summarization approach. Specifically, our approach first segments the input video and extracts high-level information from each segment. Such labels are used to represent a collection of idioms and summarization metrics as submodular components which users can combine to create personalized summary styles in a variety of ways. In order to identify the importance of the idioms and metrics in different domains, we leverage max margin learning. Experimental results have validated the effectiveness of our approach. We also plan to release a dataset containing over 600 videos with expert annotations which can benefit further research in this area.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"27 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113962432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Manifold Alignment with Multi-graph Embedding","authors":"Chang-Bin Huang, Timothy Apasiba Abeo, Xiang-jun Shen","doi":"10.1145/3338533.3366588","DOIUrl":"https://doi.org/10.1145/3338533.3366588","url":null,"abstract":"In this paper, a novel manifold alignment approach via multi-graph embedding (MA-MGE) is proposed. Different from the traditional manifold alignment algorithms that use a single graph to describe the latent manifold structure of each dataset, our approach utilizes multiple graphs for modeling multiple local manifolds in multi-view data alignment. Therefore a composite manifold representation with complete and more useful information is obtained from each dataset through a dynamic reconstruction of multiple graphs. Experimental results on Protein and Face-10 datasets demonstrate that the mapping coordinates of the proposed method provide better alignment performance compared to the state-of-the-art methods, such as semi-supervised manifold alignment (SS-MA), manifold alignment using Procrustes analysis (PAMA) and manifold alignment without correspondence (UNMA).","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123381095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RSC-DGS: Fusion of RGB and NIR Images Using Robust Spectral Consistency and Dynamic Gradient Sparsity","authors":"Shengtao Yu, Cheolkon Jung, Kailong Zhou, Chen Su","doi":"10.1145/3338533.3368261","DOIUrl":"https://doi.org/10.1145/3338533.3368261","url":null,"abstract":"Color (RGB) images captured under low light condition contain much noise with loss of textures. Since near-infrared (NIR) images are robust to noise with clear textures even in low light condition, they can be used to enhance low light RGB images by image fusion. In this paper, we propose fusion of RGB and NIR images using robust spectral consistency (RSC) and dynamic gradient sparsity (DGS), called RSC-DGS. We build the RSC model based on a robust error function to remove noise and preserve color/spectral consistency. We construct the DGS model based on vectorial total variation minimization that uses the NIR image as the reference image. The DGS model transfers clear textures of the NIR image to the fusion result and successfully preserves cross-channel interdependency of the RGB image. We use alternating direction method of multipliers (ADMM) for efficiency to solve the proposed RSC-DGS fusion. Experimental results confirm that the proposed method effectively preserves color/spectral consistency and textures in fusion results while successfully removing noise with high computational efficiency.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129853897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dense Attention Network for Facial Expression Recognition in the Wild","authors":"Cong Wang, K. Lu, Jian Xue, Yanfu Yan","doi":"10.1145/3338533.3366568","DOIUrl":"https://doi.org/10.1145/3338533.3366568","url":null,"abstract":"Recognizing facial expression is significant for human-computer interaction system and other applications. A certain number of facial expression datasets have been published in recent decades and helped with the improvements for emotion classification algorithms. However, recognition of the realistic expressions in the wild is still challenging because of uncontrolled lighting, brightness, pose, occlusion, etc. In this paper, we propose an attention mechanism based module which can help the network focus on the emotion-related locations. Furthermore, we produce two network structures named DenseCANet and DenseSANet by using the attention modules based on the backbone of DenseNet. Then these two networks and original DenseNet are trained on wild dataset AffectNet and lab-controlled dataset CK+. Experimental results show that the DenseSANet has improved the performance on both datasets comparing with the state-of-the-art methods.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129954532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Attention-Aware Feature Pyramid Ordinal Hashing for Image Retrieval","authors":"Xie Sun, Lu Jin, Zechao Li","doi":"10.1145/3338533.3366598","DOIUrl":"https://doi.org/10.1145/3338533.3366598","url":null,"abstract":"Due to the effectiveness of representation learning, deep hashing methods have attracted increasing attention in image retrieval. However, most existing deep hashing methods merely encode the raw information of the last layer for hash learning, which result in the following deficiencies: (1) the useful information from the preceding-layer is not fully exploited; (2) the local salient information of the image is neglected. To this end, we propose a novel deep hashing method, called Attention-Aware Feature Pyramid Ordinal Hashing (AFPH), which explores both the visual structure information and semantic information from different convolutional layers. Specifically, two feature pyramids based on spatial and channel attention are well constructed to capture the local salient structure from multiple scales. Moreover, a multi-scale feature fusion strategy is proposed to aggregate the feature maps from multi-level pyramidal layers to generate the discriminative feature for ranking-based hashing. The experimental results conducted on two widely-used image retrieval datasets demonstrate the superiority of our method.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126535681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NRQQA","authors":"Shengju Yu, Tiansong Li, Xiaoyu Xu, Hao Tao, Li Yu, Yixuan Wang","doi":"10.1145/3338533.3366563","DOIUrl":"https://doi.org/10.1145/3338533.3366563","url":null,"abstract":"Image stitching technology has been widely used in immersive applications, such as 3D modeling, VR and AR. The quality of stitching results is crucial. At present, the objective quality assessment methods of stitched images are mainly based on the availability of ground truth (i.e., Full-Reference). However, in most cases, ground truth is unavailable. In this paper, a no-reference quality assessment metric specifically designed for stitched images is proposed. We first find out the corresponding parts of source images in the stitched image. Then, the isolated points and the outer points generated by spherical projection are eliminated. After that, we take advantage of the bounding rectangle of stitching seams to locate the position of overlapping regions in the stitched image. Finally, the assessment of overlapping regions is taken as the final scoring result. Extensive experiments have shown that our scores are consistent with human vision. Even for the nuances that cannot be distinguished by human eyes, our proposed metric is also effective.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126192781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}