Title: Surface Normal Data Guided Depth Recovery with Graph Laplacian Regularization
Authors: Longhua Sun, Jin Wang, Yunhui Shi, Qing Zhu, Baocai Yin
DOI: 10.1145/3338533.3366582
Abstract: High-quality depth information has been used in an increasing number of real-world multimedia applications in recent years. Due to the limitations of depth sensors and sensing technology, however, captured depth maps usually have low resolution and contain holes. In this paper, inspired by the geometric relationship between the surface normals of a 3D scene and their distance from the camera, we find that a surface normal map can provide additional spatial geometric constraints for depth map reconstruction, since a depth map is a special image carrying spatial information, often called a 2.5D image. To exploit this property, we propose a novel surface normal data guided depth recovery method, which uses surface normal data and observed depth values to estimate missing or interpolated depth values. Moreover, to preserve the inherent piecewise-smooth characteristic of depth maps, a graph Laplacian prior is applied to regularize the inverse problem of depth map recovery, and a graph Laplacian regularizer (GLR) is proposed. Finally, the spatial geometric constraint and graph Laplacian regularization are integrated into a unified optimization framework, which can be efficiently solved by conjugate gradient (CG). Extensive quantitative and qualitative evaluations against state-of-the-art schemes show the effectiveness and superiority of our method.
{"title":"Color Recovery from Multi-Spectral NIR Images Using Gray Information","authors":"Qingtao Fu, Cheolkon Jung, Chen Su","doi":"10.1145/3338533.3368259","DOIUrl":"https://doi.org/10.1145/3338533.3368259","url":null,"abstract":"Converting near-infrared (NIR) images into color images is a challenging task due to the different characteristics of visible and NIR images. Most methods of generating color images directly from a single NIR image are limited by the scene and object categories. In this paper, we propose a novel approach to recovering object colors from multi-spectral NIR images using gray information. The multi-spectral NIR images are obtained by a 2-CCD NIR/RGB camera with narrow NIR bandpass filters of different wavelengths. The proposed approach is based on multi-spectral NIR images to estimate a conversion matrix for NIR to RGB conversion. In addition to the multi-spectral NIR images, a corresponding gray image is used as a complementary channel to estimate the conversion matrix for NIR to RGB color conversion. The conversion matrix is obtained from the ColorChecker's 24 color blocks using polynomial regression and applied to real-world scene NIR images for color recovery. The proposed approach has been evaluated by a large number of real-world scene images, and the results show that the proposed approach is simple yet effective for recovering color of objects.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"48 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130669564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generalizing Rate Control Strategies for Realtime Video Streaming via Learning from Deep Learning","authors":"Tianchi Huang, Ruixiao Zhang, Chenglei Wu, Xin Yao, Chao Zhou, Bing Yu, Lifeng Sun","doi":"10.1145/3338533.3366606","DOIUrl":"https://doi.org/10.1145/3338533.3366606","url":null,"abstract":"The leading learning-based rate control method, i.e., QARC, achieves state-of-the-art performances but fails to interpret the fundamental principles, and thus lacks the abilities to further improve itself efficiently. In this paper, we propose EQARC (Explainable QARC) via reconstructing QARC's modules, aiming to demystify how QARC works. In details, we first utilize a novel hybrid attention-based CNN+GRU model to re-characterize the original quality prediction network and reasonably replace the QARC's 1D-CNN layers with 2D-CNN layers. Using trace-driven experiment, we demonstrate the superiority of EQARC over existing state-of-the-art approaches. Next, we collect several useful information from each interpretable modules and learn the insight of EQARC. Following this step, we further propose AQARC (Advanced QARC), which is the light-weighted version of QARC. Experimental results show that AQARC achieves the same performances as the QARC with an overhead reduction of 90%. In short, through learning from deep learning, we generalize a rate control method which can both reach high performance and reduce computation cost.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133029202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Adaptive Dark Region Detail Enhancement Method for Low-light Images","authors":"Wengang Cheng, Caiyun Guo, Haitao Hu","doi":"10.1145/3338533.3366584","DOIUrl":"https://doi.org/10.1145/3338533.3366584","url":null,"abstract":"The images captured in low-light conditions are often of poor visual quality as most of details in dark regions buried. Although some advanced low-light image enhancement methods could lighten an image and its dark regions, they still cannot reveal the details in dark regions very well. This paper presents an adaptive dark region detail enhancement method for low-light images. As our method is based on the Retinex theory, we first formulate the Retinex-based low-light image enhancement problem into a Bayesian optimization framework. Then, a dark region prior is proposed and an adaptive gradient amplification strategy is designed to incorporate this prior into the illumination estimation. The dark region prior, together with the widely used spatial smooth and structure priors, leads to a dark region and structure-aware smoothness regularization term for illumination optimization. We provide a solver to this optimization and get final enhanced results after post processing. Experiments demonstrate that our method can obtain good enhancement results with better dark region details compared to several state-of-the-art methods.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133853269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Learn to Gesture: Let Your Body Speak
Authors: Tian Gan, Zhixin Ma, Yu Lu, Xuemeng Song, Liqiang Nie
DOI: 10.1145/3338533.3366602
Abstract: Presentation is one of the most important and vivid ways to deliver information to an audience. Apart from the content of a presentation, how the speaker behaves while presenting makes a big difference. In other words, gestures, as part of visual perception and synchronized with verbal information, express subtle information that voice or words alone cannot deliver. One of the most effective ways to improve a presentation is to practice with feedback and suggestions from an expert; however, hiring human experts is expensive and thus impractical most of the time. To this end, we propose a speech-to-gesture network (POSE) that generates exemplary body language given a speech recording as input. Specifically, we build an "expert" Speech-Gesture database from featured TED talk videos and design a two-layer attentive recurrent encoder-decoder network to learn the translation from speech to gesture, as well as the hierarchical structure within gestures. Finally, given a speech audio sequence, an appropriate gesture is generated and visualized for more effective communication. Both objective and subjective validation show the effectiveness of the proposed method.
Title: Deep Structural Feature Learning: Re-Identification of Similar Vehicles in a Structure-Aware Map Space
Authors: Wenqian Zhu, R. Hu, Zhongyuan Wang, Dengshi Li, Xiyue Gao
DOI: 10.1145/3338533.3366585
Abstract: Vehicle re-identification (re-ID) has received growing attention in recent years as a significant task that contributes greatly to intelligent video surveillance. The complex intra-class and inter-class variations of vehicle images pose huge challenges for vehicle re-ID, especially for re-identifying similar vehicles. In this paper we focus on an interesting and challenging problem: vehicle re-ID for the same or similar models. Previous works mainly focus on extracting global features using deep models, ignoring the individual local regions in the vehicle front window, such as decorations and stickers attached to the windshield, which can be more discriminative for vehicle re-ID. Instead of directly embedding these regions to learn their features, we propose a Regional Structure-Aware model (RSA) that learns structure-aware cues from the position distribution of individual local regions in the front window area, constructing a front-window (FW) structural map space. In this map space, deep models can learn more robust and discriminative spatial structure-aware features, improving vehicle re-ID performance for the same or similar models. We evaluate our method on the large-scale vehicle re-ID dataset Vehicle-1M. The experimental results show that our method achieves promising performance and outperforms several recent state-of-the-art approaches.
{"title":"Session details: Vision in Multimedia","authors":"H. Hang","doi":"10.1145/3379197","DOIUrl":"https://doi.org/10.1145/3379197","url":null,"abstract":"","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":" August","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113946847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comprehensive Event Storyline Generation from Microblogs","authors":"Wenjin Sun, Yuhang Wang, Yuqi Gao, Zesong Li, J. Sang, Jian Yu","doi":"10.1145/3338533.3366601","DOIUrl":"https://doi.org/10.1145/3338533.3366601","url":null,"abstract":"Microblogging data contains a wealth of information of trending events and has gained increased attention among users, organizations, and research scholars for social media mining in different disciplines. Event storyline generation is one typical task of social media mining, whose goal is to extract the development stages with associated description of events. Existing storyline generation methods either generate storyline with less integrity or fail to guarantee the coherence between the discovered stages. Secondly, there are no scientific method to evaluate the quality of the storyline. In this paper, we propose a comprehensive storyline generation framework to address the above disadvantages. Given Microblogging data related to the specified event, we first propose Hot-Word-Based stage detection algorithm to identify the potential stages of event, which can effectively avoid ignoring important stages and preventing inconsistent sequence between stages. Community detection algorithm is applied then to select representative data for each stage. Finally, we conduct graph optimization algorithm to generate the logically coherent storylines of the event. We also introduce a new evaluation metric, SLEU, to emphasize the importance of the integrity and coherence of the generated storyline. Extensive experiments on real-world Chinese microblogging data demonstrate the effectiveness of the proposed methods in each module and the overall framework.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"201 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116159636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semantic Prior Guided Face Inpainting","authors":"Zeyang Zhang, Xiaobo Zhou, Shengjie Zhao, Xiaoyan Zhang","doi":"10.1145/3338533.3366587","DOIUrl":"https://doi.org/10.1145/3338533.3366587","url":null,"abstract":"Face inpainting is a sub-task of image inpainting designed to repair broken or occluded incomplete portraits. Due to the high complexity of face image details, inpainting on the face is more difficult. At present, face-related tasks often draw on excellent methods from face recognition and face detection, using multitasking to boost its effect. Therefore, this paper proposes to add the face prior knowledge to the existing advanced inpainting model, combined with perceptual loss and SSIM loss to improve the model repair efficiency. A new face inpainting process and algorithm is implemented, and the repair effect is improved.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116252144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}