{"title":"Reconstruction-based supervised hashing","authors":"Xin Yuan, Z. Chen, Jiwen Lu, Jianjiang Feng, Jie Zhou","doi":"10.1109/ICME.2017.8019353","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019353","url":null,"abstract":"In this paper, we propose a reconstruction-based supervised hashing (RSH) method to learn compact binary codes with holistic structure preservation for large scale image search. Unlike most existing hashing methods which consider pair-wise similarity, our method exploits the structural information of samples by employing a reconstruction-based criterion. Moreover, the label information of samples is also utilized to enhance the discriminative power of the teamed hash codes. Specifically, our method minimizes the distance between each point and the selected generated-structure with the same class label and maximizes the distance between each point and the selected generated-structure with different class labels. Experimental results on two widely used image datasets demonstrate the effectiveness of the proposed method.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"239 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132947559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3D action recognition using data visualization and convolutional neural networks","authors":"Mengyuan Liu, Chen Chen, Hong Liu","doi":"10.1109/ICME.2017.8019438","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019438","url":null,"abstract":"It remains a challenge to efficiently represent spatial-temporal data for 3D action recognition. To solve this problem, this paper presents a new skeleton-based action representation using data visualization and convolutional neural networks, which contains four main stages. First, skeletons from an action sequence are mapped as a set of five dimensional points, containing three dimensions of location, one dimension of time label and one dimension of joint label. Second, these points are encoded as a series of color images, by visualizing points as RGB pixels. Third, convolutional neural networks are adopted to extract deep features from color images. Finally, action class score is calculated by fusing selected deep features. Extensive experiments on three benchmark datasets show that our method achieves state-of-the-art results.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131880242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semantic-aware adaptation scheme for soccer video over MPEG-DASH","authors":"Shenghong Hu, Lingfen Sun, Chunxia Xiao, Chao Gui","doi":"10.1109/ICME.2017.8019541","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019541","url":null,"abstract":"In recent years, quality of experience (QoE) has been investigated and proved to have both influential factors on user's visual quality and perceptual quality, while the perceptual quality means user's requirement on personalized content should be acquired in optimized quality. That's to say, those segments holding user interested content such as highlights need to be allocated more network resource in a resource-limited streaming scenario. However, all the existing HTTP-based adaptive methods only focus the content-agnostic bitrate adaptation according to limited network resources or energy resource, since they ignored user perceived semantics on some important segments, which suffered less quality on the important segments than on those ordinary ones, so as to hurt the overall QoE. In this paper, we have proposed a new semantic-aware adaptation scheme for MPEG-DASH services, which decides how to preserve bandwidth and buffering time depending on content descriptors for the perceived important content to users. Further, a semantic-aware probe and adaptation (SMA-PANDA) algorithm has been implemented in a DASH client to compare with conventional bitrate adaptions. Preliminary results show that SMA-PANDA achieves better QoE and flexibility on streaming user's interested content on MPEG-DASH platform, and it also aggressively helps user interested content compete more resource to deliver high quality presentation.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130028711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An ensemble metric learning scheme for face recognition","authors":"Anirud Thyagharajan, A. Routray","doi":"10.1109/ICME.2017.8019473","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019473","url":null,"abstract":"The metric learning problem is concerned with learning a distance function tuned to a particular task, and has been shown to be useful when used in conjunction with nearest-neighbor methods and other techniques that rely on distances or similarities. This paper proposes an ensemble learning technique which combines the efforts of multiple metric learning algorithms like Large Margin Nearest Neighbours (LMNN), Local Fisher Discriminant Analysis (LFDA), Logistic Discriminant Metric Learning (LDML) and a few others to solve the problem of face recognition. In the ensemble learning technique, we propose and study 4 kinds of weighting schemes, namely (1) hard voting, (2) equally weighted soft voting, (3) adaptive soft weighting, and (4) decision tree/neural network based soft voting. In this paper, we present our results compared to Support Vector Machines (SVMs). Experiments show that our proposed method attains state-of-the-art results on the challenging Labeled Faces in the Wild (LFW) dataset [1].","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124402977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Asymmetric sparse hashing","authors":"Xin Gao, Fumin Shen, Yang Yang, Xing Xu, Hanxi Li, Heng Tao Shen","doi":"10.1109/ICME.2017.8019306","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019306","url":null,"abstract":"Learning based hashing has become increasingly popular because of its high efficiency in handling the large scale image retrieval. Preserving the pairwise similarities of data points in the Hamming space is critical in state-of-the-art hashing techniques. However, most previous methods ignore to capture the local geometric structure residing on original data, which is essential for similarity search. In this paper, we propose a novel hashing framework, which simultaneously optimizes similarity preserving hash codes and reconstructs the locally linear structures of data in the Hamming space. In specific, we learn two hash functions such that the resulting two sets of binary codes can well preserve the pairwise similarity and sparse neighborhood in the original feature space. By taking advantage of the flexibility of asymmetric hash functions, we devise an efficient alternating algorithm to optimize the hash coding function and high-quality binary codes jointly. We evaluate the proposed method on several large-scale image datasets, and the results demonstrate it significantly outperforms recent state-of-the-art hashing methods on large-scale image retrieval problems.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134357861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Context-aware video recommendation based on session progress prediction","authors":"Gang Wu, Viswanathan Swaminathan, Saayan Mitra, Ratnesh Kumar","doi":"10.1109/ICME.2017.8019458","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019458","url":null,"abstract":"In the analysis of digital content consumption, session progress provides a good alternative to using manual ratings for measuring user engagement. A good prediction of session progress is useful for optimizing and personalizing the end-user experience. Most prevalent methods of predicting session progress are based on matrix completion and only consider the interaction among users and videos, while the associated contextual information is usually not used. In this paper, we present our approach for video recommendation, based on session progress prediction and incorporating the context. We test our approach on real-world session progress data, and observe considerable improvement in prediction accuracy achieved by incorporating selected context. Our experiments also show that proper context selection and the number of observed sessions for users are two key factors affecting the prediction accuracy.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134458896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Random forest classification based acoustic event detection","authors":"Xianjun Xia, R. Togneri, Ferdous Sohel, David Huang","doi":"10.1109/ICME.2017.8019452","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019452","url":null,"abstract":"This paper deals with the acoustic event detection (AED) to improve the detection accuracy of acoustic events. Acoustic event detection task is performed by a regression via classification (RvC) based approach along with the random forest technique. A discretization process is used to convert the continuous frame positions within acoustic events into event duration class labels. Outputs of the category-specific random forest classifiers are then reversed back to the event boundary information. Evaluations on the UPC-TALP database which consists of highly variable acoustic events demonstrate the efficiency of the proposed approaches with improvements in detection error rate compared to the best baseline system.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131975341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Flipping and blending based highly robust in-plane and out-of-plane color face detection","authors":"Yu-Hsuan Tsai, Yih-Cherng Lee, Jian-Jiun Ding, Ronald Y. Chang","doi":"10.1109/ICME.2017.8019295","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019295","url":null,"abstract":"Face detection is very important for video surveillance, human-computer interaction, and face recognition. In this paper, a very robust face detection algorithm that can well detect rotated, in-plane, and out-of-plane faces without large amount of training data is proposed. First, several techniques, including the entropy rate superpixel (ERS) and the skin filter, are applied to obtain face candidate regions. Then, angle compensation and non-maximum suppression are applied to improve the accuracy of face detection. Moreover, to find out-of-plane faces, one can apply the flipping-and-blending technique, i.e., blending the face candidate with its flipping version to create a face that is similar to the frontal one. With it, even if there are no training data for out-of-plane faces, one can successfully detect the faces in the out-of-plane case. Simulations on the FEI dataset and the BaoFace dataset show that the proposed algorithm is efficient and outperforms state-of-the-art face detection approaches.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133638744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient graph-based matrix completion on incomplete animated models","authors":"Evangelos Vlachos, A. Lalos, K. Moustakas, K. Berberidis","doi":"10.1109/ICME.2017.8019502","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019502","url":null,"abstract":"Recently, there has been increasing interest for easy and reliable generation of 3D animated models facilitating several real-time applications. In most of these applications, the reconstruction of soft body animations is based on time-varying point clouds which are irregularly sampled and highly incomplete. To overcome these imperfections, we introduce a novel reconstruction technique, using graph-based matrix completion approaches. The presented method exploits spatio-temporal coherences by implicitly forcing the proximity of the adjacent 3D points in time and space. The proposed constraints are modeled by using the weighted Laplacian graphs and are constructed from the available points. Extensive evaluation studies, carried out using a collection of different highly-incomplete dynamic models, verify that the proposed technique achieves plausible reconstruction output despite the constraints posed by arbitrarily complex and motion scenarios.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"443 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133629336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Context-patch based face hallucination via thresholding locality-constrained representation and reproducing learning","authors":"Junjun Jiang, Yi Yu, Suhua Tang, Jiayi Ma, Guo-Jun Qi, Akiko Aizawa","doi":"10.1109/ICME.2017.8019459","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019459","url":null,"abstract":"Face hallucination, which refers to predicting a HighResolution (HR) face image from an observed Low-Resolution (LR) one, is a challenging problem. Most state-of-the-arts employ local face structure prior to estimate the optimal representations for each patch by the training patches of the same position, and achieve good reconstruction performance. However, they do not take into account the contextual information of image patch, which is very useful for the expression of human face. Different from position-patch based methods, in this paper we leverage the contextual information and develop a robust and efficient context-patch face hallucination algorithm, called Thresholding Locality-constrained Representation with Reproducing learning (TLcR-RL). In TLcR-RL, we use a thresholding strategy to enhance the stability of patch representation and the reconstruction accuracy. Additionally, we develop a reproducing learning to iteratively enhance the estimated result by adding the estimated HR face to the training set. Experiments demonstrate that the performance of our proposed framework has a substantial increase when compared to state-of-the-arts, including recently proposed deep learning based method.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133577302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}