{"title":"Steganographer detection via deep residual network","authors":"Mingjie Zheng, S. Zhong, Songtao Wu, Jianmin Jiang","doi":"10.1109/ICME.2017.8019320","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019320","url":null,"abstract":"The steganographer detection problem is to identify the culprit actors, who try to hide confidential information with steganography, among many innocent actors. This task poses significant challenges, including varied embedding steganographic algorithms and payloads, which are usually avoided in steganalysis under laboratory conditions. In this paper, we propose a novel steganographer detection model based on a deep residual network. The proposed method strengthens the signal coming from secret messages, which is beneficial for discriminating between guilty and innocent actors. Comprehensive experiments demonstrate that the proposed model achieves very low detection error rates in the steganographer detection task. It also outperforms the classical rich-model method and other CNN-based methods. Moreover, the model is robust across steganographic algorithms and payloads.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116568277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep learning for robust outdoor vehicle visual tracking","authors":"J. Xin, Xing Du, Jian Zhang","doi":"10.1109/ICME.2017.8019329","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019329","url":null,"abstract":"Robust visual tracking for outdoor vehicles is still a challenging problem due to large appearance variations caused by illumination changes, occlusion, scale variation, etc. In this paper, a deep-learning-based approach for robust outdoor vehicle tracking is proposed. First, a stacked denoising auto-encoder is pre-trained to learn a feature representation of images. Then, a k-sparse constraint is added to the stacked denoising auto-encoder, and the encoder of the k-sparse stacked denoising auto-encoder (kSSDAE) is connected to a classification layer to construct a classification neural network. After fine-tuning, the classification neural network is applied to online tracking under a particle filter framework. Extensive tracking experiments are conducted on a challenging single-object online tracking evaluation benchmark to verify the effectiveness of our tracker. Experiments show that our tracker outperforms most state-of-the-art trackers.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114277425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An end-to-end recognizer for in-air handwritten Chinese characters based on a new recurrent neural networks","authors":"Haiqing Ren, Weiqiang Wang, K. Lu, Jianshe Zhou, Qiuchen Yuan","doi":"10.1109/ICME.2017.8019443","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019443","url":null,"abstract":"In-air handwriting is becoming a new mode of human-computer interaction. Accurately recognizing in-air handwritten Chinese characters is a challenging task. In this paper, we present an end-to-end recognizer for in-air handwritten Chinese characters using recurrent neural networks (RNNs). Compared with existing methods, the proposed RNN-based method does not need to explicitly extract features and directly takes a sequence of dot locations as input. We make two modifications to the traditional RNN to improve recognition accuracy. Concretely, sum-pooling is performed on the states of each hidden layer, which yields faster convergence in training. Additionally, an auxiliary objective function is introduced into the conventional loss function, which brings a slight increase in performance. To evaluate the proposed method, experiments are carried out on the IAHCC-UCAS2016 dataset to compare it with other state-of-the-art methods. The experimental results show that the proposed RNN model achieves a fairly high recognition accuracy for in-air handwritten Chinese characters.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122402682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VIDEOWHISPER: Towards unsupervised learning of discriminative features of videos with RNN","authors":"Na Zhao, Hanwang Zhang, Mingxing Zhang, Richang Hong, Meng Wang, Tat-Seng Chua","doi":"10.1109/ICME.2017.8019344","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019344","url":null,"abstract":"We present VideoWhisper, a novel approach for unsupervised video representation learning, in which the video sequence is treated as a self-supervision entity, based on the observation that the sequence encodes video temporal dynamics (e.g., object movement and event evolution). Specifically, for each video sequence, we use a pre-learned visual dictionary to generate a sequence of high-level semantics, dubbed “whisper”, which encodes both visual contents at the frame level and visual dynamics at the sequence level. VideoWhisper is driven by a novel “sequence-to-whisper” learning strategy. Naturally, an end-to-end sequence-to-sequence model using an RNN is built and trained to predict the whisper sequence. We propose two ways to generate video representations from the model. Through extensive experiments, we demonstrate that the video representation learned by VideoWhisper effectively boosts fundamental video-related applications such as video retrieval and classification.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122461945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visual speech synthesis from 3D mesh sequences driven by combined speech features","authors":"Felix Kuhnke, J. Ostermann","doi":"10.1109/ICME.2017.8019546","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019546","url":null,"abstract":"Given a pre-registered 3D mesh sequence and accompanying phoneme-labeled audio, our system creates an animatable face model and a mapping procedure to produce realistic speech animations for arbitrary speech input. Mapping of speech features to model parameters is done using random forests for regression. We propose a new speech feature based on phonemic labels and acoustic features. The novel feature produces more expressive facial animation and it robustly handles temporal labeling errors. Furthermore, by employing a sliding window approach to feature extraction, the system is easy to train and allows for low-delay synthesis. We show that our novel combination of speech features improves visual speech synthesis. Our findings are confirmed by a subjective user study.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122520838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spontaneous thermal facial expression analysis based on trajectory-pooled fisher vector descriptor","authors":"Peng Liu, L. Yin","doi":"10.1109/ICME.2017.8019315","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019315","url":null,"abstract":"We present a new descriptor for spontaneous facial expression recognition from videos acquired by a thermal sensor. Previous descriptors mostly compute features from RGB videos, which makes it difficult to process mixed and varied spontaneous expressions with a large ambiguity of facial appearances. In contrast, thermal imaging can measure autonomic activities, i.e., the physiological changes evoked by the autonomic nervous system, regardless of the variety and ambiguity of facial appearances. This paper presents a new thermal video representation, the so-called trajectory-pooled Fisher vector descriptor (TFD). To capture local energy and temperature changes, we propose to use spatio-temporal orientation energy and the acceleration of dense trajectories as low-level features, and we further improve the discriminative capacity by aggregating the local features using an improved Fisher vector. The benefits of TFD over existing approaches are illustrated on two databases with different modalities: the USTC-NVIE database and the MMSE (a.k.a. BP4D+) database.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131112417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Knowledge-guided recurrent neural network learning for task-oriented action prediction","authors":"Liang Lin, Lili Huang, Tianshui Chen, Yukang Gan, Hui Cheng","doi":"10.1109/ICME.2017.8019345","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019345","url":null,"abstract":"This paper aims at task-oriented action prediction, i.e., predicting a sequence of actions towards accomplishing a specific task under a certain scene, which is a new problem in computer vision research. The main challenges lie in how to model task-specific knowledge and integrate it into the learning procedure. In this work, we propose to train a recurrent long short-term memory (LSTM) network for this problem, i.e., taking a scene image (including pre-located objects) and a specified task as input and recurrently predicting action sequences. However, training such a network usually requires large amounts of annotated samples to cover the semantic space (e.g., diverse action decompositions and orderings). To alleviate this issue, we introduce a temporal And-Or graph (AOG) for task description, which hierarchically decomposes a task into atomic actions. With this AOG representation, we can produce many valid samples (i.e., action sequences consistent with common sense) by training an auxiliary LSTM network on a small set of annotated samples. These generated task-oriented action sequences effectively facilitate training the model for task-oriented action prediction. In the experiments, we create a new dataset containing diverse daily tasks and extensively evaluate the effectiveness of our approach.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131620429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Webpage cross-browser test from image level","authors":"P. Lu, Wei-liang Fan, Jun Sun, H. Tanaka, S. Naoi","doi":"10.1109/ICME.2017.8019400","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019400","url":null,"abstract":"Incompatibility of webpages across different browsers and platforms is a common technical obstacle in webpage design. To address this issue, a key challenge is to automatically detect incompatible components and quantitatively assess the extent of distortion in cross-browser tests. This paper presents a new algorithm for comparing image pairs from webpages, called iterative perceptual hash (IPH), as well as a new distortion evaluation index called structure-color-saliency (SCS). IPH, which operates in an iterative manner, detects content changes by considering both global structure and local content differences. SCS assesses the extent of distortion in both image structure and color, and is capable of imitating nonlinear human perception. Experimental results demonstrate the effectiveness of IPH (e.g., an F1-score of 96%) and the high consistency of SCS with subjective results.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127004012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A unified model for improving depth accuracy in kinect sensor","authors":"Li Peng, Yanduo Zhang, Huabing Zhou, Deng Chen, Zhenghong Yu, Junjun Jiang, Jiayi Ma","doi":"10.1109/ICME.2017.8019370","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019370","url":null,"abstract":"The Microsoft Kinect sensor has been widely used in many applications, but it suffers from low depth accuracy. In this paper, we present a unified depth modification model that improves Kinect depth accuracy by registering depth and color images in an iterative manner. Specifically, in each iteration, we first establish a coarse correspondence based on a Canny-edge feature descriptor. Then, we estimate the fine correspondence using a robust estimator, L2E, with a nonparametric model. Finally, we correct the depth data according to the correspondence results. To evaluate the effectiveness of our approach, we have performed extensive experiments and analyzed the results from the following aspects: the accuracy of the depth data, the accuracy of the correspondence between color and depth images, and the measurement error in 3D reconstruction with our method. The experimental results show that our approach greatly improves depth accuracy.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128099111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reconstruction-based supervised hashing","authors":"Xin Yuan, Z. Chen, Jiwen Lu, Jianjiang Feng, Jie Zhou","doi":"10.1109/ICME.2017.8019353","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019353","url":null,"abstract":"In this paper, we propose a reconstruction-based supervised hashing (RSH) method to learn compact binary codes with holistic structure preservation for large-scale image search. Unlike most existing hashing methods, which consider pair-wise similarity, our method exploits the structural information of samples by employing a reconstruction-based criterion. Moreover, the label information of samples is also utilized to enhance the discriminative power of the learned hash codes. Specifically, our method minimizes the distance between each point and the selected generated structure with the same class label, and maximizes the distance between each point and the selected generated structures with different class labels. Experimental results on two widely used image datasets demonstrate the effectiveness of the proposed method.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132947559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}