{"title":"Color Image Noise Covariance Estimation with Cross-Channel Image Noise Modeling","authors":"Li Dong, Jiantao Zhou, Tao Dai","doi":"10.1109/ICME.2018.8486558","DOIUrl":"https://doi.org/10.1109/ICME.2018.8486558","url":null,"abstract":"Noise estimation is crucial in many image processing tasks such as denoising. Most of the existing noise estimation methods are specially developed for grayscale images. For color images, these methods simply handle each color channel independently, without considering the correlation across channels. In this work, we propose a multivariate Gaussian approach to model the noise in color images, in which we explicitly consider the inter-dependence among color channels. We design a practical method for estimating the noise covariance matrix within the proposed model. Specifically, a patch selection scheme is first introduced to select weakly textured patches through thresholding the texture strength indicators. Noticing that the patch selection actually depends on the unknown noise covariance, we present an iterative noise covariance estimation algorithm, where the patch selection and the covariance estimation are conducted alternately. Experimental results show that our method can effectively estimate the noise covariance. The practical usage is demonstrated with color image denoising.","PeriodicalId":426613,"journal":{"name":"2018 IEEE International Conference on Multimedia and Expo (ICME)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125340292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mural2Sketch: A Combined Line Drawing Generation Method for Ancient Mural Painting","authors":"Di Sun, Jiawan Zhang, Gang Pan, Rui Zhan","doi":"10.1109/ICME.2018.8486504","DOIUrl":"https://doi.org/10.1109/ICME.2018.8486504","url":null,"abstract":"Line drawing is a unique drawing technique developed over millennia in China. Since ancient murals contain line drawings of beautiful form and a long history, it is incredibly important to digitally curate these pieces. In this paper, we propose a line drawing generation method named Mural2Sketch (MS) for ancient mural paintings. MS first utilizes heuristic routing to detect the outer edge of a stroke, and then uses high-frequency enhancement filtering to extract the information inside the stroke. A complete stroke is then generated by collaborative representation. MS is capable of outputting the result in vector form and producing different artistic styles. Experimental results show that our method is simple but effective. This research has the potential to support digital mural copying, mural protection, and related cultural research and applications.","PeriodicalId":426613,"journal":{"name":"2018 IEEE International Conference on Multimedia and Expo (ICME)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127914825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Seethevoice: Learning from Music to Visual Storytelling of Shots","authors":"Wen-Li Wei, Jen-Chun Lin, Tyng-Luh Liu, Yi-Hsuan Yang, H. Wang, Hsiao-Rong Tyan, H. Liao","doi":"10.1109/ICME.2018.8486496","DOIUrl":"https://doi.org/10.1109/ICME.2018.8486496","url":null,"abstract":"Types of shots in the language of film are considered the key elements used by a director for visual storytelling. In filming a musical performance, manipulating shots can stimulate desired effects such as manifesting the emotion or deepening the atmosphere. However, while the visual storytelling technique is often employed in creating professional recordings of a live concert, audience recordings of the same event often lack such sophisticated manipulations. Thus it would be useful to have a versatile system that can perform video mashup to create a refined video from such amateur clips. To this end, we propose to translate the music into a near-professional shot (type) sequence by learning the relation between music and the visual storytelling of shots. The resulting shot sequence can then be used to better portray the visual storytelling of a song and guide the concert video mashup process. Our method introduces a novel probabilistic fusion approach, named multi-resolution fused recurrent neural networks (MF-RNNs) with film-language, which integrates multi-resolution fused RNNs and a film-language model for boosting the translation performance. The results from objective and subjective experiments demonstrate that MF-RNNs with film-language can generate an appealing shot sequence with a better viewing experience.","PeriodicalId":426613,"journal":{"name":"2018 IEEE International Conference on Multimedia and Expo (ICME)","volume":"145 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127767572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Study on Multimodal Video Hyperlinking with Visual Aggregation","authors":"Mateusz Budnik, Mikail Demirdelen, G. Gravier","doi":"10.1109/icme.2018.8486549","DOIUrl":"https://doi.org/10.1109/icme.2018.8486549","url":null,"abstract":"Video hyperlinking offers a way to explore a video collection, making use of links that connect segments having related content. Hyperlinking systems thus seek to automatically create links by connecting given anchor segments to relevant targets within the collection. In this paper, we further investigate multimodal representations of video segments in a hyperlinking system based on bidirectional deep neural networks, which achieved state-of-the-art results in the TRECVid 2016 evaluation. A systematic study of different input representations is conducted with a focus on the aggregation of the representations of multiple keyframes. This includes, in particular, the use of memory vectors as a novel aggregation technique, which provides a significant improvement over other aggregation methods on the final hyperlinking task. Additionally, the use of metadata is investigated, leading to increased performance and lower computational requirements for the system.","PeriodicalId":426613,"journal":{"name":"2018 IEEE International Conference on Multimedia and Expo (ICME)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121653181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Edge Detection and Image Segmentation on Encrypted Image with Homomorphic Encryption and Garbled Circuit","authors":"Delin Chen, Wenhao Chen, Jian Chen, Peijia Zheng, Jiwu Huang","doi":"10.1109/ICME.2018.8486551","DOIUrl":"https://doi.org/10.1109/ICME.2018.8486551","url":null,"abstract":"Edge detection is one of the most important topics of image processing. In the scenario of cloud computing, performing edge detection may also require privacy protection. In this paper, we propose an edge detection and image segmentation scheme on an encrypted image with the Sobel edge detector. We implement Gaussian filtering and the Sobel operator on the image in the encrypted domain by exploiting the homomorphic property. By implementing an adaptive threshold decision algorithm in the encrypted domain, we obtain a threshold determined by the image distribution. With the technique of garbled circuits, we perform comparisons in the encrypted domain and obtain the edges of the image without decrypting the image in advance. We then propose an image segmentation scheme on the encrypted image based on the detected edges. Our experiments demonstrate the viability and effectiveness of the proposed encrypted image edge detection and segmentation.","PeriodicalId":426613,"journal":{"name":"2018 IEEE International Conference on Multimedia and Expo (ICME)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126680666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dual Learning for Visual Question Generation","authors":"Xing Xu, Jingkuan Song, Huimin Lu, Li He, Yang Yang, Fumin Shen","doi":"10.1109/ICME.2018.8486475","DOIUrl":"https://doi.org/10.1109/ICME.2018.8486475","url":null,"abstract":"Recently, automatic answering of visually related questions (VQA) has gained a lot of attention in the computer vision community. However, there is little work on automatically generating questions for images (VQG). In fact, VQG closes the loop with question answering and yields diverse questions, which is useful to research on VQA. Motivated by the assumption that learning to answer questions may boost question generation, in this paper we introduce the VQA task as the complement of our primary VQG task, and propose a novel model that uses a dual learning framework to jointly learn the dual tasks. In the framework, we devise agents for VQG and VQA with pre-trained models respectively, and the learning tasks of the two agents form a closed loop, whose objectives are optimized together to guide each other via a reinforcement learning process. Specific rewards for each task are designed to update the models of the agents with the policy gradient method. The relation of these two tasks can be exploited to further improve the performance of the primary VQG task. Extensive experiments conducted on two large-scale datasets show that the proposed method is capable of generating grounded visual questions of sufficient coverage and outperforms previous VQG methods on standard measures.","PeriodicalId":426613,"journal":{"name":"2018 IEEE International Conference on Multimedia and Expo (ICME)","volume":"127 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122288454","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Consistency-Exclusivity Regularized Deep Metric Learning for General Kinship Verification","authors":"Xiuzhuang Zhou, Zheng Zhang, Zeqiang Wei, Kai Jin, Min Xu","doi":"10.1109/ICME.2018.8486590","DOIUrl":"https://doi.org/10.1109/ICME.2018.8486590","url":null,"abstract":"While encouraging results have been achieved so far in advancing kinship verification using facial images, learning a robust genetic similarity measure remains challenging, especially in the setting of general kinship verification, wherein the gender labels of the test samples are unknown in advance. In this paper we present a deep metric learning method with a carefully designed two-stream neural network to jointly learn a pair of deep embeddings for parent-child images. In particular, the deep embeddings are first modeled to explicitly consist of common and individual components, and then two additional constraints are introduced in deep metric learning: 1) value-aware consistency on the common components, and 2) position-aware exclusivity on the individual components. The proposed hierarchical consistency-exclusivity regularization enables our deep metric learning to harness the sharable and complementary patterns inherent in parent-child images. Empirically, we show improved performance over state-of-the-art metric learning solutions to general kinship verification on two benchmarks.","PeriodicalId":426613,"journal":{"name":"2018 IEEE International Conference on Multimedia and Expo (ICME)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125642663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sparse Representation for Color Image Based on Geometric Algebra","authors":"Rui Wang, Yujie Wu, Miaomiao Shen, W. Cao","doi":"10.1109/ICME.2018.8486524","DOIUrl":"https://doi.org/10.1109/ICME.2018.8486524","url":null,"abstract":"Existing sparse representation models represent RGB channels separately without considering the relationships among color channels, which inevitably loses some color structures. In this paper, we introduce a novel sparse representation model for color images based on geometric algebra (GA) theory, and propose its corresponding dictionary learning algorithm, K-GASVD. The model represents a color image as a multivector with spatial and spectral information in GA space, providing a vectorial representation of the inherent color structures rather than the scalar representation of current sparse image models. The proposed sparse model is validated in the applications of color image denoising and reconstruction. The experimental results demonstrate that our sparse image model successfully avoids the hue bias phenomenon and completely retains the color structures. It shows its potential as a general and powerful tool in various applications of color image analysis.","PeriodicalId":426613,"journal":{"name":"2018 IEEE International Conference on Multimedia and Expo (ICME)","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132462164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CUB360: Exploiting Cross-Users Behaviors for Viewport Prediction in 360 Video Adaptive Streaming","authors":"Yixuan Ban, Lan Xie, Zhimin Xu, Xinggong Zhang, Zongming Guo, Yue Wang","doi":"10.1109/ICME.2018.8486606","DOIUrl":"https://doi.org/10.1109/ICME.2018.8486606","url":null,"abstract":"To ensure continuous playback of 360-degree video and reduce bandwidth waste, predicting the user's future fixation is indispensable. However, existing methods concentrate on either the user's motion information or content information. None of them considers the inconsistency of users' watching behaviors, which embodies the user's attention distribution more explicitly. So in this paper, we exploit cross-users behaviors for viewport prediction in 360-degree video adaptive streaming, namely CUB360, concurrently considering the user's personalized information and cross-users behavior information to predict the future viewport. Besides, we use a QoE-driven framework to optimize existing video streaming approaches and propose a general algorithm aiming at solving the NP problem at low complexity. Extensive experimental results over real datasets demonstrate that, compared with traditional adaptive streaming methods, our proposal can significantly boost the prediction accuracy by 20.2% absolutely and 48.1% relatively. Besides, the mean quality gains 30.28% while the quality variance is reduced by 29.89%.","PeriodicalId":426613,"journal":{"name":"2018 IEEE International Conference on Multimedia and Expo (ICME)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132407404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhanced Image Decoding via Edge-Preserving Generative Adversarial Networks","authors":"Qi Mao, Shiqi Wang, Shanshe Wang, Xinfeng Zhang, Siwei Ma","doi":"10.1109/ICME.2018.8486495","DOIUrl":"https://doi.org/10.1109/ICME.2018.8486495","url":null,"abstract":"Lossy image compression usually introduces undesired compression artifacts, such as blocking, ringing and blurry effects, especially in low bit rate coding scenarios. Although many algorithms have been proposed to reduce these compression artifacts, most of them are based on an image local smoothness prior, which usually leads to over-smoothing around areas with distinct structures, e.g., edges and textures. In this paper, we propose a novel framework to enhance the perceptual quality of decoded images by well preserving the edge structures and predicting visually pleasing textures. Firstly, we propose an edge-preserving generative adversarial network (EP-GAN) to achieve edge restoration and texture generation simultaneously. Then, we elaborately design an edge fidelity regularization term to guide our network, which jointly utilizes the signal fidelity, feature fidelity and adversarial constraint to reconstruct high quality decoded images. Experimental results demonstrate that the proposed EP-GAN is able to efficiently enhance decoded images at low bit rates and reconstruct more perceptually pleasing images with abundant textures and sharp edges.","PeriodicalId":426613,"journal":{"name":"2018 IEEE International Conference on Multimedia and Expo (ICME)","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123665193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}