Proceedings of the ACM Multimedia Asia: Latest Publications

Multi-Scale Invertible Network for Image Super-Resolution
Proceedings of the ACM Multimedia Asia. Pub Date: 2019-12-15. DOI: 10.1145/3338533.3366576
Zhuangzi Li, Shanshan Li, N. Zhang, Lei Wang, Ziyu Xue
{"title":"Multi-Scale Invertible Network for Image Super-Resolution","authors":"Zhuangzi Li, Shanshan Li, N. Zhang, Lei Wang, Ziyu Xue","doi":"10.1145/3338533.3366576","DOIUrl":"https://doi.org/10.1145/3338533.3366576","url":null,"abstract":"Deep convolutional neural networks (CNNs) based image super-resolution approaches have reached significant success in recent years. However, due to the information-discarded nature of CNN, they inevitably suffer from information loss during the feature embedding process, in which extracted intermediate features cannot effectively represent or reconstruct the input. As a result, the super-resolved image will have large deviations in image structure with its low-resolution version, leading to inaccurate representations in some local details. In this study, we address this problem by designing an end-to-end invertible architecture that can reversely represent low-resolution images in any feature embedding level. Specifically, we propose a novel image super-resolution method, named multi-scale invertible network (MSIN) to keep information lossless and introduce multi-scale learning in a unified framework. In MSIN, a novel multi-scale invertible stack is proposed, which adopts four parallel branches to respectively capture features with different scales and keeps balanced information-interaction by branch shifting. In addition, we employee global and hierarchical feature fusion to learn elaborate and comprehensive feature representations, in order to further benefit the quality of final image reconstruction. We show the reversibility of the proposed MSIN, and extensive experiments conducted on benchmark datasets demonstrate the state-of-the-art performance of our method.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"122 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127979918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
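The abstract's core claim is that an invertible embedding loses no information, so any intermediate feature can exactly reconstruct its input. Below is a minimal PyTorch sketch of the additive-coupling mechanism commonly used to build such invertible blocks; the actual MSIN multi-scale stack and branch shifting are not detailed in the abstract, so this illustrates the invertibility property only, not the authors' architecture.

```python
import torch
import torch.nn as nn

class AdditiveCoupling(nn.Module):
    """Split channels in half; update one half from the other. The mapping
    is exactly invertible, so no information is discarded."""
    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        self.f = nn.Sequential(
            nn.Conv2d(half, half, 3, padding=1), nn.ReLU(),
            nn.Conv2d(half, half, 3, padding=1))

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=1)
        return torch.cat([x1, x2 + self.f(x1)], dim=1)   # only x2 changes

    def inverse(self, y):
        y1, y2 = y.chunk(2, dim=1)
        return torch.cat([y1, y2 - self.f(y1)], dim=1)   # exact recovery

block = AdditiveCoupling(64)
x = torch.randn(1, 64, 32, 32)
assert torch.allclose(block.inverse(block(x)), x, atol=1e-5)
```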
An Efficient Parameter Optimization Algorithm and Its Application to Image De-noising
Proceedings of the ACM Multimedia Asia. Pub Date: 2019-12-15. DOI: 10.1145/3338533.3366573
Yinhao Liu, Xiaofeng Huang, Mengting Fan, Haibing Yin
{"title":"An Efficient Parameter Optimization Algorithm and Its Application to Image De-noising","authors":"Yinhao Liu, Xiaofeng Huang, Mengting Fan, Haibing Yin","doi":"10.1145/3338533.3366573","DOIUrl":"https://doi.org/10.1145/3338533.3366573","url":null,"abstract":"Prevailing image enhancement algorithms deliver flexible tradeoff at different level between image quality and implementation complexity, which is usually achieved via adjusting multiple algorithm parameters, i.e. multiple parameter optimization. Traditional exhaustive search over the whole solution space can resolve this optimization problem, however suffering from high search complexity caused by huge amount of multi-parameter combinations. To resolve this problem, an Energy Efficiency Ratio Model (EERM) based algorithm is proposed which is inspired from gradient decent in deep learning. To verify the effectiveness of the proposed algorithm, it is then applied to image de-noising algorithm framework based on non-local means (NLM) plus iteration. The experiment result shows that the optimal parameter combination decided by our proposed algorithm can achieve the comparable quality to that of the exhaustive search based method. Specifically, 86.7% complexity reduction can be achieved with only 0.05dB quality degradation with proposed method.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127612380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
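To make the search idea concrete, here is a toy sketch of descent-style search over a discrete parameter grid, contrasted with exhaustive search. The quality function below is an invented stand-in (the paper's EERM cost model and NLM quality metric are not reproduced), so treat this as the shape of the algorithm only.

```python
import itertools

def quality(params):
    """Stand-in objective with a single peak at h=7, s=3 (assumed)."""
    h, s = params
    return -((h - 7) ** 2 + (s - 3) ** 2)

grid = {"h": range(1, 21), "s": range(1, 11)}

def exhaustive():
    # 20 * 10 = 200 evaluations of the quality function.
    return max(itertools.product(grid["h"], grid["s"]), key=quality)

def descent(start=(15, 8)):
    """Move to the best neighbour until none improves: far fewer evaluations."""
    cur, evals = start, 0
    while True:
        h, s = cur
        nbrs = [(h + dh, s + ds) for dh in (-1, 0, 1) for ds in (-1, 0, 1)
                if (h + dh) in grid["h"] and (s + ds) in grid["s"]]
        evals += len(nbrs)
        best = max(nbrs, key=quality)
        if best == cur:
            return cur, evals
        cur = best

print(exhaustive())   # (7, 3), after 200 evaluations
print(descent())      # (7, 3), found with far fewer evaluations
```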
Adaptive Bilinear Pooling for Fine-grained Representation Learning
Proceedings of the ACM Multimedia Asia. Pub Date: 2019-12-15. DOI: 10.1145/3338533.3366567
Shaobo Min, Hongtao Xie, Youliang Tian, Hantao Yao, Yongdong Zhang
{"title":"Adaptive Bilinear Pooling for Fine-grained Representation Learning","authors":"Shaobo Min, Hongtao Xie, Youliang Tian, Hantao Yao, Yongdong Zhang","doi":"10.1145/3338533.3366567","DOIUrl":"https://doi.org/10.1145/3338533.3366567","url":null,"abstract":"Fine-grained representation learning targets to generate discriminative description for fine-grained visual objects. Recently, the bilinear feature interaction has been proved effective in generating powerful high-order representation with spatially invariant information. However, the existing methods apply a fixed feature interaction strategy to all samples, which ignore the image and region heterogeneity in a dataset. To this end, we propose a generalized feature interaction method, named Adaptive Bilinear Pooling (ABP), which can adaptively infer a suitable pooling strategy for a given sample based on image content. Specifically, ABP consists of two learning strategies: p-order learning (P-net) and spatial attention learning (S-net). The p-order learning predicts an optimal exponential coefficient rather than a fixed order number to extract moderate visual information from an image. The spatial attention learning aims to infer a weighted score that measures the importance of each local region, which can compact the image representations. To make ABP compatible with kernelized bilinear feature interaction, a crossed two-branch structure is utilized to combine the P-net and S-net. This structure can facilitate complementary information exchange between two different visual branches. The experiments on three widely used benchmarks, including fine-grained object classification and action recognition, demonstrate the effectiveness of the proposed method.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"99 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115665816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
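A minimal PyTorch sketch of the two ingredients named in the abstract: a spatial-attention branch that weights local regions before bilinear pooling, and a content-adaptive exponent applied to the pooled matrix. The crossed two-branch structure and all layer sizes are assumptions here, not the published ABP design.

```python
import torch
import torch.nn as nn

class AdaptiveBilinear(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Conv2d(channels, 1, 1)        # S-net-style region scores
        self.p_net = nn.Sequential(                  # P-net-style order prediction
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, 1), nn.Sigmoid())

    def forward(self, x):                            # x: (B, C, H, W)
        b, c, h, w = x.shape
        w_sp = torch.softmax(self.attn(x).view(b, 1, -1), dim=-1)
        f = x.view(b, c, -1) * w_sp                  # attention-weighted regions
        g = f @ f.transpose(1, 2) / (h * w)          # (B, C, C) bilinear pooling
        p = 0.5 + self.p_net(x).view(b, 1, 1)        # per-image order in (0.5, 1.5)
        return (g.sign() * g.abs().clamp_min(1e-6).pow(p)).flatten(1)

print(AdaptiveBilinear(64)(torch.randn(2, 64, 14, 14)).shape)  # (2, 4096)
```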
Stop Hiding Behind Windshield: A Windshield Image Enhancer Based on a Two-way Generative Adversarial Network
Proceedings of the ACM Multimedia Asia. Pub Date: 2019-12-15. DOI: 10.1145/3338533.3366559
Chi-Rung Chang, K. Lung, Yi-Chung Chen, Zhi-Kai Huang, Hong-Han Shuai, Wen-Huang Cheng
{"title":"Stop Hiding Behind Windshield: A Windshield Image Enhancer Based on a Two-way Generative Adversarial Network","authors":"Chi-Rung Chang, K. Lung, Yi-Chung Chen, Zhi-Kai Huang, Hong-Han Shuai, Wen-Huang Cheng","doi":"10.1145/3338533.3366559","DOIUrl":"https://doi.org/10.1145/3338533.3366559","url":null,"abstract":"Windshield images captured by surveillance cameras are usually difficult to be seen through due to severe image degradation such as reflection, motion blur, low light, haze, and noise. Such image degradation hinders the capability of identifying and tracking people. In this paper, we aim to address this challenging windshield images enhancement task by presenting a novel deep learning model based on a two-way generative adversarial network, called Two-way Individual Normalization Perceptual Adversarial Network, TWIN-PAN. TWIN-PAN is an unpaired learning network which does not require pairs of degraded and corresponding ground truth images for training. Also, unlike existing image restoration algorithms which only address one specific type of degradation at once, TWIN-PAN can restore the image from various types of degradation. To restore the content inside the extremely degraded windshield and ensure the semantic consistency of the image, we introduce cyclic perceptual loss to the network and combine it with cycle-consistency loss. Moreover, to generate better restoration images, we introduce individual instance normalization layers for the generators, which can help our generators better adapt to their own input distributions. Furthermore, we collect a large high-quality windshield image dataset (WIE-Dataset) to train our network and to validate the robustness of our method in restoring degraded windshield images. Experimental results on human detection, vehicle ReID and user study manifest that the proposed method is effective for windshield image restoration.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130696345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
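The loss design (cycle-consistency plus a cyclic perceptual term computed on reconstructed images) can be sketched compactly. Below, the generators are identity placeholders and the VGG-16 feature extractor is an assumption; the abstract does not name the perceptual network, so this only illustrates how the two terms combine.

```python
import torch
import torch.nn as nn
import torchvision.models as models

l1 = nn.L1Loss()
vgg = models.vgg16(weights=None).features[:16].eval()  # fixed feature extractor
for p in vgg.parameters():
    p.requires_grad_(False)

def cyclic_losses(G, F, x, y):
    """G: degraded -> clean, F: clean -> degraded (unpaired training)."""
    x_rec = F(G(x))                                    # x -> G(x) -> back to x
    y_rec = G(F(y))
    cycle = l1(x_rec, x) + l1(y_rec, y)                # pixel cycle-consistency
    perc = l1(vgg(x_rec), vgg(x)) + l1(vgg(y_rec), vgg(y))  # cyclic perceptual
    return cycle, perc

G = F = nn.Identity()                                  # placeholder generators
x, y = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
print(cyclic_losses(G, F, x, y))                       # both zero for identities
```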
Domain Specific and Idiom Adaptive Video Summarization
Proceedings of the ACM Multimedia Asia. Pub Date: 2019-12-15. DOI: 10.1145/3338533.3366603
Yi Dong, Chang Liu, Zhiqi Shen, Zhanning Gao, Pan Wang, Changgong Zhang, Peiran Ren, Xuansong Xie, Han Yu, Qingming Huang
{"title":"Domain Specific and Idiom Adaptive Video Summarization","authors":"Yi Dong, Chang Liu, Zhiqi Shen, Zhanning Gao, Pan Wang, Changgong Zhang, Peiran Ren, Xuansong Xie, Han Yu, Qingming Huang","doi":"10.1145/3338533.3366603","DOIUrl":"https://doi.org/10.1145/3338533.3366603","url":null,"abstract":"As short videos become an increasingly popular form of storytelling, there is a growing demand for video summarization to convey information concisely with a subset of video frames. Some criteria such as interestingness and diversity are used by existing efforts to pick appropriate segments of content. However, there lacks a mechanism to infuse insights from cinematography and persuasion into this process. As a result, the results of the video summarization sometimes deviate from the original. In addition, the exploration of the vast design space to create customized video summaries is costly for video producer. To address these challenges, we propose a domain specific and idiom adaptive video summarization approach. Specifically, our approach first segments the input video and extracts high-level information from each segment. Such labels are used to represent a collection of idioms and summarization metrics as submodular components which users can combine to create personalized summary styles in a variety of ways. In order to identify the importance of the idioms and metrics in different domains, we leverage max margin learning. Experimental results have validated the effectiveness of our approach. We also plan to release a dataset containing over 600 videos with expert annotations which can benefit further research in this area.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"27 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113962432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
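Weighted sums of submodular components are typically maximized greedily with near-optimality guarantees. A toy sketch of that selection loop follows; the facility-location coverage term, the redundancy penalty, and the fixed weights are stand-ins, since the paper's idiom components and max-margin-learned weights are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
feats = rng.normal(size=(30, 8))             # 30 video segments, toy features
sim = feats @ feats.T                        # pairwise similarity

def objective(S, w=(1.0, 0.5)):
    """Weighted sum of two submodular terms: coverage and non-redundancy."""
    cov = np.maximum.reduce([sim[i] for i in S]).sum()    # facility location
    red = sum(sim[i][j] for i in S for j in S if i < j)   # pairwise redundancy
    return w[0] * cov - w[1] * red

def greedy_summary(budget=5):
    S = []
    for _ in range(budget):
        S.append(max((i for i in range(len(feats)) if i not in S),
                     key=lambda i: objective(S + [i])))
    return sorted(S)

print(greedy_summary())
```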
Manifold Alignment with Multi-graph Embedding
Proceedings of the ACM Multimedia Asia. Pub Date: 2019-12-15. DOI: 10.1145/3338533.3366588
Chang-Bin Huang, Timothy Apasiba Abeo, Xiang-jun Shen
{"title":"Manifold Alignment with Multi-graph Embedding","authors":"Chang-Bin Huang, Timothy Apasiba Abeo, Xiang-jun Shen","doi":"10.1145/3338533.3366588","DOIUrl":"https://doi.org/10.1145/3338533.3366588","url":null,"abstract":"In this paper, a novel manifold alignment approach via multi-graph embedding (MA-MGE) is proposed. Different from the traditional manifold alignment algorithms that use a single graph to describe the latent manifold structure of each dataset, our approach utilizes multiple graphs for modeling multiple local manifolds in multi-view data alignment. Therefore a composite manifold representation with complete and more useful information is obtained from each dataset through a dynamic reconstruction of multiple graphs. Experimental results on Protein and Face-10 datasets demonstrate that the mapping coordinates of the proposed method provide better alignment performance compared to the state-of-the-art methods, such as semi-supervised manifold alignment (SS-MA), manifold alignment using Procrustes analysis (PAMA) and manifold alignment without correspondence (UNMA).","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123381095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
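The multi-graph idea can be illustrated with a spectral embedding over a weighted combination of graph Laplacians built at different neighbourhood scales. The alignment terms, graph construction, and combination weights of MA-MGE are not specified in the abstract, so the values below are assumptions.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def knn_graph(X, k):
    D = cdist(X, X)
    W = np.zeros_like(D)
    for i, nb in enumerate(np.argsort(D, axis=1)[:, 1:k + 1]):  # skip self
        W[i, nb] = 1.0
    return np.maximum(W, W.T)                 # symmetrize

def multi_graph_embedding(X, ks=(3, 6, 9), alphas=(0.5, 0.3, 0.2), dim=2):
    L = np.zeros((len(X), len(X)))
    for a, k in zip(alphas, ks):              # combine Laplacians of several graphs
        W = knn_graph(X, k)
        L += a * (np.diag(W.sum(1)) - W)
    _, vecs = eigh(L)
    return vecs[:, 1:dim + 1]                 # drop the trivial constant eigenvector

X = np.random.default_rng(0).normal(size=(40, 5))
print(multi_graph_embedding(X).shape)          # (40, 2)
```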
RSC-DGS: Fusion of RGB and NIR Images Using Robust Spectral Consistency and Dynamic Gradient Sparsity
Proceedings of the ACM Multimedia Asia. Pub Date: 2019-12-15. DOI: 10.1145/3338533.3368261
Shengtao Yu, Cheolkon Jung, Kailong Zhou, Chen Su
{"title":"RSC-DGS: Fusion of RGB and NIR Images Using Robust Spectral Consistency and Dynamic Gradient Sparsity","authors":"Shengtao Yu, Cheolkon Jung, Kailong Zhou, Chen Su","doi":"10.1145/3338533.3368261","DOIUrl":"https://doi.org/10.1145/3338533.3368261","url":null,"abstract":"Color (RGB) images captured under low light condition contain much noise with loss of textures. Since near-infrared (NIR) images are robust to noise with clear textures even in low light condition, they can be used to enhance low light RGB images by image fusion. In this paper, we propose fusion of RGB and NIR images using robust spectral consistency (RSC) and dynamic gradient sparsity (DGS), called RSC-DGS. We build the RSC model based on a robust error function to remove noise and preserve color/spectral consistency. We construct the DGS model based on vectorial total variation minimization that uses the NIR image as the reference image. The DGS model transfers clear textures of the NIR image to the fusion result and successfully preserves cross-channel interdependency of the RGB image. We use alternating direction method of multipliers (ADMM) for efficiency to solve the proposed RSC-DGS fusion. Experimental results confirm that the proposed method effectively preserves color/spectral consistency and textures in fusion results while successfully removing noise with high computational efficiency.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129853897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
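The two-term structure (stay spectrally close to the RGB input; make the fused gradients follow the NIR reference sparsely) can be sketched with plain gradient descent standing in for the paper's ADMM solver, and simple L2/L1 penalties standing in for the robust error function and the DGS term.

```python
import torch

def fuse(rgb, nir, lam=0.3, steps=200, lr=0.05):
    """rgb: (3,H,W) noisy input; nir: (3,H,W) NIR reference (replicated)."""
    x = rgb.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        gx, gy = x[..., :, 1:] - x[..., :, :-1], x[..., 1:, :] - x[..., :-1, :]
        nx, ny = nir[..., :, 1:] - nir[..., :, :-1], nir[..., 1:, :] - nir[..., :-1, :]
        loss = ((x - rgb) ** 2).mean() \
             + lam * ((gx - nx).abs().mean() + (gy - ny).abs().mean())
        opt.zero_grad()
        loss.backward()
        opt.step()
    return x.detach()

rgb = torch.rand(3, 64, 64)
nir = torch.rand(1, 64, 64).expand(3, -1, -1)
print(fuse(rgb, nir).shape)                    # torch.Size([3, 64, 64])
```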
Dense Attention Network for Facial Expression Recognition in the Wild
Proceedings of the ACM Multimedia Asia. Pub Date: 2019-12-15. DOI: 10.1145/3338533.3366568
Cong Wang, K. Lu, Jian Xue, Yanfu Yan
{"title":"Dense Attention Network for Facial Expression Recognition in the Wild","authors":"Cong Wang, K. Lu, Jian Xue, Yanfu Yan","doi":"10.1145/3338533.3366568","DOIUrl":"https://doi.org/10.1145/3338533.3366568","url":null,"abstract":"Recognizing facial expression is significant for human-computer interaction system and other applications. A certain number of facial expression datasets have been published in recent decades and helped with the improvements for emotion classification algorithms. However, recognition of the realistic expressions in the wild is still challenging because of uncontrolled lighting, brightness, pose, occlusion, etc. In this paper, we propose an attention mechanism based module which can help the network focus on the emotion-related locations. Furthermore, we produce two network structures named DenseCANet and DenseSANet by using the attention modules based on the backbone of DenseNet. Then these two networks and original DenseNet are trained on wild dataset AffectNet and lab-controlled dataset CK+. Experimental results show that the DenseSANet has improved the performance on both datasets comparing with the state-of-the-art methods.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129954532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
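For reference, a standard CBAM-style spatial-attention module of the kind that can be inserted into a DenseNet backbone is shown below; the abstract does not specify DenseSANet's exact attention design, so this is a generic, assumed form.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        # Pool across channels, then predict a per-pixel gate in (0, 1).
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        gate = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * gate             # re-weight emotion-related locations

x = torch.randn(2, 128, 28, 28)
print(SpatialAttention()(x).shape)  # torch.Size([2, 128, 28, 28])
```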
Attention-Aware Feature Pyramid Ordinal Hashing for Image Retrieval
Proceedings of the ACM Multimedia Asia. Pub Date: 2019-12-15. DOI: 10.1145/3338533.3366598
Xie Sun, Lu Jin, Zechao Li
{"title":"Attention-Aware Feature Pyramid Ordinal Hashing for Image Retrieval","authors":"Xie Sun, Lu Jin, Zechao Li","doi":"10.1145/3338533.3366598","DOIUrl":"https://doi.org/10.1145/3338533.3366598","url":null,"abstract":"Due to the effectiveness of representation learning, deep hashing methods have attracted increasing attention in image retrieval. However, most existing deep hashing methods merely encode the raw information of the last layer for hash learning, which result in the following deficiencies: (1) the useful information from the preceding-layer is not fully exploited; (2) the local salient information of the image is neglected. To this end, we propose a novel deep hashing method, called Attention-Aware Feature Pyramid Ordinal Hashing (AFPH), which explores both the visual structure information and semantic information from different convolutional layers. Specifically, two feature pyramids based on spatial and channel attention are well constructed to capture the local salient structure from multiple scales. Moreover, a multi-scale feature fusion strategy is proposed to aggregate the feature maps from multi-level pyramidal layers to generate the discriminative feature for ranking-based hashing. The experimental results conducted on two widely-used image retrieval datasets demonstrate the superiority of our method.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126535681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
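A minimal sketch of the multi-level part of the pipeline: project feature maps from several convolutional stages to a common width, fuse them, and relax the hash bits with tanh during training. AFPH's attention pyramids and ordinal (ranking) loss are omitted, and all sizes are assumptions.

```python
import torch
import torch.nn as nn

class MultiScaleHash(nn.Module):
    def __init__(self, chans=(64, 128, 256), bits=48):
        super().__init__()
        self.proj = nn.ModuleList([nn.Conv2d(c, 64, 1) for c in chans])
        self.fc = nn.Linear(64 * len(chans), bits)

    def forward(self, feats):
        # Global-average-pool each level after 1x1 projection, then fuse.
        pooled = [p(f).mean(dim=(2, 3)) for p, f in zip(self.proj, feats)]
        h = torch.tanh(self.fc(torch.cat(pooled, dim=1)))  # relaxed bits in (-1, 1)
        return torch.sign(h.detach()), h                   # binary codes + relaxation

feats = [torch.randn(2, c, s, s) for c, s in [(64, 56), (128, 28), (256, 14)]]
codes, relaxed = MultiScaleHash()(feats)
print(codes.shape)                                          # torch.Size([2, 48])
```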
NRQQA
Proceedings of the ACM Multimedia Asia. Pub Date: 2019-12-15. DOI: 10.1145/3338533.3366563
Shengju Yu, Tiansong Li, Xiaoyu Xu, Hao Tao, Li Yu, Yixuan Wang
{"title":"NRQQA","authors":"Shengju Yu, Tiansong Li, Xiaoyu Xu, Hao Tao, Li Yu, Yixuan Wang","doi":"10.1145/3338533.3366563","DOIUrl":"https://doi.org/10.1145/3338533.3366563","url":null,"abstract":"Image stitching technology has been widely used in immersive applications, such as 3D modeling, VR and AR. The quality of stitching results is crucial. At present, the objective quality assessment methods of stitched images are mainly based on the availability of ground truth (i.e., Full-Reference). However, in most cases, ground truth is unavailable. In this paper, a no-reference quality assessment metric specifically designed for stitched images is proposed. We first find out the corresponding parts of source images in the stitched image. Then, the isolated points and the outer points generated by spherical projection are eliminated. After that, we take advantage of the bounding rectangle of stitching seams to locate the position of overlapping regions in the stitched image. Finally, the assessment of overlapping regions is taken as the final scoring result. Extensive experiments have shown that our scores are consistent with human vision. Even for the nuances that cannot be distinguished by human eyes, our proposed metric is also effective.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126192781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
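The pipeline's final step (score only the located overlap region) is easy to sketch. Here the overlap rectangle is passed in directly and the quality score is an invented gradient-sharpness stand-in; the paper's seam detection, outlier elimination, and actual scoring function are not reproduced.

```python
import numpy as np

def score_overlap(stitched, rect):
    """stitched: HxW grayscale array; rect: (top, left, height, width)."""
    t, l, h, w = rect
    crop = stitched[t:t + h, l:l + w]          # assess only the overlap region
    gy, gx = np.gradient(crop.astype(np.float64))
    return float(np.hypot(gx, gy).mean())      # stand-in sharpness score

img = np.random.rand(480, 640)
print(score_overlap(img, rect=(100, 200, 120, 160)))
```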