{"title":"Skeleton-Indexed Deep Multi-Modal Feature Learning for High Performance Human Action Recognition","authors":"Sijie Song, Cuiling Lan, Junliang Xing, Wenjun Zeng, Jiaying Liu","doi":"10.1109/ICME.2018.8486486","DOIUrl":"https://doi.org/10.1109/ICME.2018.8486486","url":null,"abstract":"This paper presents a new framework for action recognition with multi-modal data. A skeleton-indexed feature learning procedure is developed to further exploit the detailed local features from RGB and optical flow videos. In particular, the proposed framework is built on a deep Convolutional Network (ConvNet) and a Recurrent Neural Network (RNN) with Long Short Term Memory (LSTM). A skeleton-indexed transform layer is designed to automatically extract visual features around key joints, and a part-aggregated pooling is developed to uniformly regulate the visual features from different body parts and actors. In addition, several fusion schemes are explored to take advantage of multi-modal data. The proposed deep architecture is end-to-end trainable and can better incorporate different modalities to learn effective feature representations. Quantitative experimental results on two datasets, the NTU RGB+D dataset and the MSR dataset, demonstrate the excellent performance of our scheme over other state-of-the-art methods. To our knowledge, the performance obtained by the proposed framework is currently the best on the challenging NTU RGB+D dataset.","PeriodicalId":426613,"journal":{"name":"2018 IEEE International Conference on Multimedia and Expo (ICME)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116914035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Contrast Enhancement via Graph-Based Cartoon-Texture Decomposition","authors":"Deming Zhai, Xianming Lu, Xiangyang Ji, Yuanchao Bai, Debin Zhao, Wen Gao","doi":"10.1109/ICME.2018.8486436","DOIUrl":"https://doi.org/10.1109/ICME.2018.8486436","url":null,"abstract":"In this paper, we propose a robust contrast enhancement algorithm based on cartoon and texture layer decomposition. Specifically, the cartoon layer is expected to be generally smooth but with sharp edges at the foreground and background boundaries, for which we propose a quadratic form of graph total variation (GTV) as the prior to promote signal smoothness along the graph structure. For the texture layer, a reweighted GTV is tailored to remove noise while preserving true image details. Finally, an optimization objective function is formulated, which casts image decomposition, contrast enhancement and noise reduction into a unified framework. We propose an efficient algorithm to solve it. Experimental results show that our generated images outperform state-of-the-art schemes noticeably in subjective quality evaluation.","PeriodicalId":426613,"journal":{"name":"2018 IEEE International Conference on Multimedia and Expo (ICME)","volume":"175 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120834987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Feature Reinforcement Network for Image Classification","authors":"Bingxu Lu, Q. Hu, Yijing Hui, Quan Wen, Min Li","doi":"10.1109/ICME.2018.8486608","DOIUrl":"https://doi.org/10.1109/ICME.2018.8486608","url":null,"abstract":"Deep learning has attracted much attention in recent years as it produces excellent performance in various applications. Most researchers have mainly focused on improving and optimizing the network structure, e.g., deeper and deeper networks are constructed to extract high-level features from raw data. In this paper, we propose a two-wing deep convolutional network, called Feature Reinforcement Network (FRN). One wing acts as the traditional operation in VGG, ResNet and DenseNet, while the other wing, called the feature reinforcement block (FRB), also conducts layer-wise convolution operations which share the convolution parameters of the former layer. Then, the ReLU function is employed in the FRB to rectify the feature maps except at the output layer. The outputs of these wings are integrated as the input of the next convolution layer. It is confirmed that the proposed FRN is more sensitive to the informative features. Our experiments on a few multimedia datasets show that FRN outperforms the original deep neural networks.","PeriodicalId":426613,"journal":{"name":"2018 IEEE International Conference on Multimedia and Expo (ICME)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125790176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Single Image Layer Separation via Deep Admm Unrolling","authors":"Risheng Liu, Zhiying Jiang, Xin Fan, Haojie Li, Zhongxuan Luo","doi":"10.1109/ICME.2018.8486511","DOIUrl":"https://doi.org/10.1109/ICME.2018.8486511","url":null,"abstract":"Single image layer separation aims to divide the observed image into two independent components according to special task requirements and has been widely used in many vision and multimedia applications. Because this task is fundamentally ill-posed, most existing approaches tend to design complex priors on the separated layers. However, the cost function with complex prior regularization is hard to optimize. The performance is also compromised by fixed iteration schemes and limited data fitting ability. More importantly, it is also challenging to design a unified framework to separate image layers for different applications. To partially mitigate the above limitations, we develop a flexible optimization unrolling technique to incorporate deep architectures into iterations for adaptive image layer separation. Specifically, we first design a general energy model with implicit priors and adopt the widely used alternating direction method of multipliers (ADMM) to establish our basic iteration scheme. By unrolling with residual convolution architectures, we successfully obtain a simple, flexible, and data-dependent image separation method. Extensive experiments on the tasks of rain streak removal and reflection removal validate the effectiveness of our approach.","PeriodicalId":426613,"journal":{"name":"2018 IEEE International Conference on Multimedia and Expo (ICME)","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114182987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Asymmetric Block Based Compressive Sensing for Image Signals","authors":"Siwang Zhou, Shuzhen Xiang, Xingting Liu, Heng Li","doi":"10.1109/ICME.2018.8486517","DOIUrl":"https://doi.org/10.1109/ICME.2018.8486517","url":null,"abstract":"Block based compressed sensing (BCS) is a novel framework for image signal sampling and recovery due to its advantages in terms of both low sampling burden and lightweight recovery complexity. In this paper, we propose a novel asymmetric BCS scheme to further improve the image recovery accuracy. In the sampling process, image blocks are partitioned into smaller sub-blocks, and those small sub-blocks are used to allocate sampling resources. In the recovery process, the small sub-blocks with similar feature information are assembled into virtual blocks with larger size, and the corresponding transform coefficients are then more compressible. The proposed scheme improves the recovered images through fairer resource allocation and much greater compressibility. The experimental results demonstrate that, compared to the existing BCS approaches, our proposed scheme has higher recovery quality, without increasing sampling and recovery complexity.","PeriodicalId":426613,"journal":{"name":"2018 IEEE International Conference on Multimedia and Expo (ICME)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114520932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-Time Multiple People Tracking with Deeply Learned Candidate Selection and Person Re-Identification","authors":"Long Chen, H. Ai, Zijie Zhuang, C. Shang","doi":"10.1109/ICME.2018.8486597","DOIUrl":"https://doi.org/10.1109/ICME.2018.8486597","url":null,"abstract":"Online multi-object tracking is a fundamental problem in time-critical video analysis applications. A major challenge in the popular tracking-by-detection framework is how to associate unreliable detection results with existing tracks. In this paper, we propose to handle unreliable detection by collecting candidates from outputs of both detection and tracking. The intuition behind generating redundant candidates is that detection and tracks can complement each other in different scenarios. Detection results of high confidence prevent tracking drifts in the long term, and predictions of tracks can handle noisy detection caused by occlusion. In order to apply optimal selection from a considerable amount of candidates in real-time, we present a novel scoring function based on a fully convolutional neural network, that shares most computations on the entire image. Moreover, we adopt a deeply learned appearance representation, which is trained on large-scale person re-identification datasets, to improve the identification ability of our tracker. Extensive experiments show that our tracker achieves real-time and state-of-the-art performance on a widely used people tracking benchmark.","PeriodicalId":426613,"journal":{"name":"2018 IEEE International Conference on Multimedia and Expo (ICME)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128240363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Index-Compatible Hashing for Fast Image Retrieval","authors":"Dayan Wu, Jing Liu, Bo Li, Weiping Wang","doi":"10.1109/ICME.2018.8486463","DOIUrl":"https://doi.org/10.1109/ICME.2018.8486463","url":null,"abstract":"Deep hashing methods have achieved promising results for large-scale image retrieval recently. To accelerate the subsequent Hamming ranking process, the multi-index approach has been proposed to reduce the computations for the Hamming distance. However, the binary codes output by the previous deep hashing methods may not be optimally compatible with the multi-index approach. In this paper, we present a novel Deep Index-Compatible Hashing (DICH) method for fast image retrieval, which can learn similarity-preserving binary codes that are more compatible with the multi-index approach. With the learned binary codes, both the size of the intermediate result set produced by the multi-index approach and the number of the candidate images can be reduced, which can accelerate the Hamming ranking process. By taking advantage of the unique feature of DICH, we further propose a block-based ranking strategy to quickly rank the candidate images without calculating the Hamming distance. Extensive evaluations demonstrate that the proposed method can significantly reduce the retrieval time with almost no loss of retrieval accuracy.","PeriodicalId":426613,"journal":{"name":"2018 IEEE International Conference on Multimedia and Expo (ICME)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128602743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Co-Saliency Detection via Hierarchical Consistency Measure","authors":"Yonghua Zhang, Liangkai Li, Runmin Cong, Xiaojie Guo, Hui Xu, Jiawan Zhang","doi":"10.1109/ICME.2018.8486603","DOIUrl":"https://doi.org/10.1109/ICME.2018.8486603","url":null,"abstract":"Co-saliency detection is a newly emerging research topic in multimedia and computer vision, the goal of which is to extract common salient objects from multiple images. Effectively seeking the global consistency among multiple images is critical to the performance. To achieve the goal, this paper designs a novel model with consideration of a hierarchical consistency measure. Different from most existing co-saliency methods that only exploit common features (such as color and texture), this paper further utilizes the shape of the object as another cue to evaluate the consistency among common salient objects. More specifically, for each involved image, an intra-image saliency map is first generated via a single image saliency detection algorithm. With the intra-image map constructed, the consistency metrics at object level and superpixel level are designed to measure the corresponding relationship among multiple images and obtain the inter-image saliency result by considering multiple visual attention features and multiple constraints. Finally, the intra-image and inter-image saliency maps are fused to produce the final map. Experiments on benchmark datasets are conducted to demonstrate the effectiveness of our method, and reveal its advances over other state-of-the-art alternatives.","PeriodicalId":426613,"journal":{"name":"2018 IEEE International Conference on Multimedia and Expo (ICME)","volume":"1964 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129655014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine Learning Based Transportation Modes Recognition Using Mobile Communication Quality","authors":"W. Kawakami, Kenji Kanai, Bo Wei, J. Katto","doi":"10.1109/ICME.2018.8486560","DOIUrl":"https://doi.org/10.1109/ICME.2018.8486560","url":null,"abstract":"In order to recognize transportation modes without any additional sensor devices, we propose a recognition method that uses communication quality factors. In the proposed method, instead of Global Positioning System (GPS) and accelerometer sensors, we collect mobile TCP throughputs, Received Signal Strength Indicators (RSSIs), and cellular base station IDs (Cell IDs) through in-line network measurement while the user uses mobile services, such as a video streaming service. In accuracy evaluations, we conduct two different field experiments to collect the data in five typical transportation modes (static, walking, riding a bicycle, a bus, and a train) and then construct the classifiers by applying Support Vector Machine (SVM), k-Nearest Neighbor (k-NN) and Random Forest (RF). The results show that these transportation modes can be recognized using communication quality factors with accuracy as high as with accelerometer sensors.","PeriodicalId":426613,"journal":{"name":"2018 IEEE International Conference on Multimedia and Expo (ICME)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130325255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stackelberg Game Based Rate Allocation for HEVC Region of Interest Coding","authors":"Zizheng Liu, Xiang Pan, Yiming Li, Zhenzhong Chen","doi":"10.1109/ICME.2018.8486526","DOIUrl":"https://doi.org/10.1109/ICME.2018.8486526","url":null,"abstract":"Region of Interest (ROI) coding has shown advantages in subjectively optimized video coding. In this paper, we propose a new CTU-level rate allocation scheme based on the Stackelberg Game model to enhance the visual quality of the ROI, which formulates the rate allocation process as a noncooperative Stackelberg Game between the ROI and the non-ROI. In this game, the ROI is the leader and takes priority. Based on the formulated game, the rate allocation problem can be expressed as a utility optimization problem. By solving the corresponding utility optimization problem, a novel CTU-level rate allocation strategy is established, in which a trade-off between the ROI's quality and the overall quality can be achieved. The experimental results show that our proposed scheme can improve the quality of the ROI significantly with a negligible overall quality degradation.","PeriodicalId":426613,"journal":{"name":"2018 IEEE International Conference on Multimedia and Expo (ICME)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130530730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}