{"title":"A joint deep-network-based image restoration algorithm for multi-degradations","authors":"Xu Sun, Xiaoguang Li, L. Zhuo, K. Lam, Jiafeng Li","doi":"10.1109/ICME.2017.8019361","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019361","url":null,"abstract":"In the procedures of image acquisition, compression, and transmission, captured images usually suffer from various degradations, such as low-resolution and compression distortion. Although there have been a lot of research done on image restoration, they usually aim to deal with a single degraded factor, ignoring the correlation of different degradations. To establish a restoration framework for multiple degradations, a joint deep-network-based image restoration algorithm is proposed in this paper. The proposed convolutional neural network is composed of two stages. Firstly, a de-blocking subnet is constructed, using two cascaded neural network. Then, super-resolution is carried out by a 20-layer very deep network with skipping links. Cascading these two stages forms a novel deep network. Experimental results on the Set5, Setl4 and BSD100 benchmarks demonstrate that the proposed method can achieve better results, in terms of both the subjective and objective performances.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129431014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Network-assisted strategy for dash over CCN","authors":"Rihab Jmal, G. Simon, L. Chaari","doi":"10.1109/ICME.2017.8019482","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019482","url":null,"abstract":"MPEG Dynamic Adaptive Streaming over HTTP (DASH) has become the most used technology of video delivery nowadays. Considering the video segment more important than its location, new internet architecture such as Content Centric Network (CCN) is proposed to enhance DASH streaming. This architecture with its in-network caching salient feature improves Quality of Experience (QoE) from consumer side. It reduces delays and increases throughput by providing the requested video segment from a near point to the end user. However, there are oscillations issues induced by caching with DASH. In this paper, we propose a new Network-Assisted Strategy (NAS) based-on traffic shaping and request prediction with the aim of improving DASH flows investigating new internet architecture CCN.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129898117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HEVC-EPIC: Edge-preserving interpolation of coded HEVC motion with applications to framerate upsampling","authors":"Dominic Rüfenacht, D. Taubman","doi":"10.1109/ICME.2017.8019515","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019515","url":null,"abstract":"We propose a method to obtain a high quality motion field from decoded HEVC motion. We use the block motion vectors to establish a sparse set of correspondences, and then employ an affine, edge-preserving interpolation of correspondences (EPIC) to obtain a dense optical flow. Experimental results on a variety of sequences coded at a range of QP values show that the proposed HEVC-EPIC is over five times as fast as the original EPIC flow, which uses a sophisticated correspondence estimator, while only slightly decreasing the flow accuracy. The proposed work opens the door to leveraging HEVC motion into video enhancement and analysis methods. To provide some evidence of what can be achieved, we show that when used as input to a framerate upsampling scheme, the average Y-PSNR of the interpolated frames obtained using HEVC-EPIC motion is slightly lower (0.2dB) than when original EPIC flow is used, with hardly any visible differences.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128703151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fine-grained image recognition via weakly supervised click data guided bilinear CNN model","authors":"Guangjian Zheng, Min Tan, Jun Yu, Qing Wu, Jianping Fan","doi":"10.1109/ICME.2017.8019407","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019407","url":null,"abstract":"Bilinear convolutional neural networks (BCNN) model, the state-of-the-art in fine-grained image recognition, fails in distinguishing the categories with subtle visual differences. We design a novel BCNN model guided by user click data (C-BCNN) to improve the performance via capturing both the visual and semantical content in images. Specially, to deal with the heavy noise in large-scale click data, we propose a weakly supervised learning approach to learn the C-BCNN, namely W-C-BCNN. It can automatically weight the training images based on their reliability. Extensive experiments are conducted on the public Clickture-Dog dataset. It shows that: (1) integrating CNN with click feature largely improves the performance; (2) both the click data and visual consistency can help to model image reliability. Moreover, the method can be easily customized to medical image recognition. Our model performs much better than conventional BCNN models on both the Clickture-Dog and medical image dataset.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129570529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reduced reference stereoscopic image quality assessment based on entropy of classified primitives","authors":"Zhaolin Wan, Feng Qi, Yutao Liu, Debin Zhao","doi":"10.1109/ICME.2017.8019337","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019337","url":null,"abstract":"Stereoscopic vision is a complex system which receives and integrates perceptual information from both monocular and binocular cues. In this paper, a novel reduced-reference stereoscopic image quality assessment scheme is proposed, based on the visual perceptual information measured by entropy of classified primitives (EoCP) and mutual information of classified primitives (MIoCP), named as DCprimary, sketch and texture primitives respectively, which is in accordance with the hierarchical progressive process of human visual perception. Specifically, EoCP of each-view image are calculated as monocular cue, and MIoCP between two-view images is derived as binocular cue. The Maximum (MAX) mechanism is applied to determine the perceptual information. The perceptual information differences between the original and distorted images are used to predict the stereoscopic image quality by support vector regression (SVR). Experimental results on LIVE phase II asymmetric database validate the proposed metric achieves significantly higher consistency with subjective ratings and outperforms state-of-the-art stereoscopic image quality assessment methods.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130599236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Time-ordered spatial-temporal interest points for human action classification","authors":"Mengyuan Liu, Chen Chen, Hong Liu","doi":"10.1109/ICME.2017.8019477","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019477","url":null,"abstract":"Human action classification, which is vital for content-based video retrieval and human-machine interaction, finds problem in distinguishing similar actions. Previous works typically detect spatial-temporal interest points (STIPs) from action sequences and then adopt bag-of-visual words (BoVW) model to describe actions as numerical statistics of STIPs. Despite the robustness of BoVW, this model ignores the spatial-temporal layout of STIPs, leading to misclassification among different types of actions with similar numerical statistics of STIPs. Motivated by this, a time-ordered feature is designed to describe the temporal distribution of STIPs, which contains complementary structural information to traditional BoVW model. Moreover, a temporal refinement method is used to eliminate intra-variations among time-ordered features caused by performers' habits. Then a time-ordered BoVW model is built to represent actions, which encodes both numerical statistics and temporal distribution of STIPs. Extensive experiments on three challenging datasets, i.e., KTH, Rochster and UT-Interaction, validate the effectiveness of our method in distinguishing similar actions.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131959069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Online low-rank similarity function learning with adaptive relative margin for cross-modal retrieval","authors":"Yiling Wu, Shuhui Wang, W. Zhang, Qingming Huang","doi":"10.1109/ICME.2017.8019528","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019528","url":null,"abstract":"This paper presents a Cross-Modal Online Low-Rank Similarity function learning method (CMOLRS) for cross-modal retrieval, which learns a low-rank bilinear similarity measure on data from different modalities. CMOLRS models the cross-modal relations by relative similarities on a set of training data triplets and formulates the relative relations as convex hinge loss functions. By adapting the margin of hinge loss using information from feature space and label space for each triplet, CMOLRS effectively captures the multi-level semantic correlation among cross-modal data. The similarity function is learned by online learning in the manifold of low-rank matrices, thus good scalability is gained when processing large scale datasets. Extensive experiments are conducted on three public datasets. Comparisons with the state-of-the-art methods show the effectiveness and efficiency of our approach.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131981903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive attention fusion network for visual question answering","authors":"Geonmo Gu, S. T. Kim, Yong Man Ro","doi":"10.1109/ICME.2017.8019540","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019540","url":null,"abstract":"Automatic understanding of the content of a reference image and natural language questions is needed in Visual Question Answering (VQA). Generating a visual attention map that focuses on the regions related to the context of the question can improve performance of VQA. In this paper, we propose adaptive attention-based VQA network. The proposed method utilizes the complementary information from the attention maps depending on three levels of word embedding (word level, phrase level, and question level embedding), and adaptively fuses the information to represent the image-question pair appropriately. Comparative experiments have been conducted on the public COCO-QA database to validate the proposed method. Experimental results have shown that the proposed method outperforms previous methods in terms of accuracy.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132147892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning to generate video object segment proposals","authors":"Jianwu Li, Tianfei Zhou, Yao Lu","doi":"10.1109/ICME.2017.8019535","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019535","url":null,"abstract":"This paper proposes a fully automatic pipeline to generate accurate object segment proposals in realistic videos. Our approach first detects generic object proposals for all video frames and then learns to rank them using a Convolutional Neural Networks (CNN) descriptor built on appearance and motion cues. The ambiguity of the proposal set can be reduced while the quality can be retained as highly as possible Next, high-scoring proposals are greedily tracked over the entire sequence into distinct tracklets. Observing that the proposal tracklet set at this stage is noisy and redundant, we perform a tracklet selection scheme to suppress the highly overlapped tracklets, and detect occlusions based on appearance and location information. Finally, we exploit holistic appearance cues for refinement of video segment proposals to obtain pixel-accurate segmentation. Our method is evaluated on two video segmentation datasets i.e. SegTrack v1 and FBMS-59 and achieves competitive results in comparison with other state-of-the-art methods.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127901977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Non-negative dictionary learning with pairwise partial similarity constraint","authors":"Xu Zhou, Pak Lun Kevin Ding, Baoxin Li","doi":"10.1109/ICME.2017.8019392","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019392","url":null,"abstract":"Discriminative dictionary learning has been widely used in many applications such as face retrieval / recognition and image classification, where the labels of the training data are utilized to improve the discriminative power of the learned dictionary. This paper deals with a new problem of learning a dictionary for associating pairs of images in applications such as face image retrieval. Compared with a typical supervised learning task, in this case the labeling information is very limited (e.g. only some training pairs are known to be associated). Further, associated pairs may be considered similar only after excluding certain regions (e.g. sunglasses in a face image). We formulate a dictionary learning problem under these considerations and design an algorithm to solve the problem. We also provide a proof for the convergence of the algorithm. Experiments and results suggest that the proposed method is advantageous over common baselines.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123174964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}