2017 IEEE International Conference on Multimedia and Expo (ICME): Latest Papers

Deep hybrid residual learning with statistic priors for single image super-resolution
2017 IEEE International Conference on Multimedia and Expo (ICME) Pub Date : 2017-07-01 DOI: 10.1109/ICME.2017.8019468
Risheng Liu, Xiangyu Wang, Xin Fan, Haojie Li, Zhongxuan Luo
{"title":"Deep hybrid residual learning with statistic priors for single image super-resolution","authors":"Risheng Liu, Xiangyu Wang, Xin Fan, Haojie Li, Zhongxuan Luo","doi":"10.1109/ICME.2017.8019468","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019468","url":null,"abstract":"This paper considers single image super-resolution (SISR), which is an important low-level vision task and has various applications in the multimedia community. Recently, deep neural networks have achieved good performance in this field. However, most existing deep models are based on a fully data-dependent network architecture and thus miss most of the domain knowledge of the super-resolution task. To address this limitation, we develop a new hybrid residual learning approach that leverages priors of SISR within the maximum a posteriori framework for network architecture design. We demonstrate that it can incorporate both image priors and data fidelity into the network, leading to a novel cascaded residual learning system for the SISR process. Extensive experimental results on real-world images show that the proposed algorithm performs favorably against state-of-the-art methods.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116728600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Steganographer detection via deep residual network
2017 IEEE International Conference on Multimedia and Expo (ICME) Pub Date : 2017-07-01 DOI: 10.1109/ICME.2017.8019320
Mingjie Zheng, S. Zhong, Songtao Wu, Jianmin Jiang
{"title":"Steganographer detection via deep residual network","authors":"Mingjie Zheng, S. Zhong, Songtao Wu, Jianmin Jiang","doi":"10.1109/ICME.2017.8019320","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019320","url":null,"abstract":"The steganographer detection problem is to identify culprit actors, who try to hide confidential information with steganography, among many innocent actors. This task poses significant challenges, including varied steganographic embedding algorithms and payloads, which are usually avoided in steganalysis under laboratory conditions. In this paper, we propose a novel steganographer detection model based on a deep residual network. The proposed method strengthens the signal coming from secret messages, which helps discriminate between guilty actors and innocent actors. Comprehensive experiments demonstrate that the proposed model achieves very low detection error rates in the steganographer detection task. It also outperforms the classical rich model method and other CNN-based methods. Moreover, the model is robust across steganographic algorithms and payloads.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116568277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
Deep learning for robust outdoor vehicle visual tracking
2017 IEEE International Conference on Multimedia and Expo (ICME) Pub Date : 2017-07-01 DOI: 10.1109/ICME.2017.8019329
J. Xin, Xing Du, Jian Zhang
{"title":"Deep learning for robust outdoor vehicle visual tracking","authors":"J. Xin, Xing Du, Jian Zhang","doi":"10.1109/ICME.2017.8019329","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019329","url":null,"abstract":"Robust visual tracking of outdoor vehicles is still a challenging problem due to large appearance variations caused by illumination variation, occlusion, scale variation, etc. In this paper, a deep-learning-based approach for robust outdoor vehicle tracking is proposed. First, a stacked denoising auto-encoder is pre-trained to learn feature representations of images. Then, a k-sparse constraint is added to the stacked denoising auto-encoder, and the encoder of the k-sparse stacked denoising auto-encoder (kSSDAE) is connected with a classification layer to construct a classification neural network. After fine-tuning, the classification neural network is applied to online tracking under a particle filter framework. Extensive tracking experiments are conducted on a challenging single-object online tracking evaluation benchmark to verify the effectiveness of our tracker. Experiments show that our tracker outperforms most state-of-the-art trackers.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114277425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
An end-to-end recognizer for in-air handwritten Chinese characters based on a new recurrent neural networks
2017 IEEE International Conference on Multimedia and Expo (ICME) Pub Date : 2017-07-01 DOI: 10.1109/ICME.2017.8019443
Haiqing Ren, Weiqiang Wang, K. Lu, Jianshe Zhou, Qiuchen Yuan
{"title":"An end-to-end recognizer for in-air handwritten Chinese characters based on a new recurrent neural networks","authors":"Haiqing Ren, Weiqiang Wang, K. Lu, Jianshe Zhou, Qiuchen Yuan","doi":"10.1109/ICME.2017.8019443","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019443","url":null,"abstract":"In-air handwriting is becoming a new mode of human-computer interaction. Accurately recognizing in-air handwritten Chinese characters is a challenging task. In this paper, we present an end-to-end recognizer for in-air handwritten Chinese characters using recurrent neural networks (RNN). Compared with existing methods, the proposed RNN-based method does not need to explicitly extract features and directly takes a sequence of dot locations as input. We have made two modifications to the traditional RNN to improve recognition accuracy. Concretely, sum-pooling is performed on the states of each hidden layer, which yields faster convergence in training. Additionally, an auxiliary objective function is added to the conventional loss function, which brings a slight increase in performance. To evaluate the performance of the proposed method, experiments are carried out on the IAHCC-UCAS2016 dataset to compare it with other state-of-the-art methods. The experimental results show that the proposed RNN model has a fairly high recognition accuracy for in-air handwritten Chinese characters.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122402682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
VIDEOWHISPER: Towards unsupervised learning of discriminative features of videos with RNN
2017 IEEE International Conference on Multimedia and Expo (ICME) Pub Date : 2017-07-01 DOI: 10.1109/ICME.2017.8019344
Na Zhao, Hanwang Zhang, Mingxing Zhang, Richang Hong, Meng Wang, Tat-Seng Chua
{"title":"VIDEOWHISPER: Towards unsupervised learning of discriminative features of videos with RNN","authors":"Na Zhao, Hanwang Zhang, Mingxing Zhang, Richang Hong, Meng Wang, Tat-Seng Chua","doi":"10.1109/ICME.2017.8019344","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019344","url":null,"abstract":"We present VideoWhisper, a novel approach for unsupervised video representation learning, in which a video sequence is treated as a self-supervision entity based on the observation that the sequence encodes video temporal dynamics (e.g., object movement and event evolution). Specifically, for each video sequence, we use a pre-learned visual dictionary to generate a sequence of high-level semantics, dubbed “whisper”, which encodes both visual contents at the frame level and visual dynamics at the sequence level. VideoWhisper is driven by a novel “sequence-to-whisper” learning strategy. Naturally, an end-to-end sequence-to-sequence learning model using RNN is built and trained to predict the whisper sequence. We propose two ways to generate video representations from the model. Through extensive experiments we demonstrate that the video representation learned by VideoWhisper effectively boosts fundamental video-related applications such as video retrieval and classification.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122461945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Visual speech synthesis from 3D mesh sequences driven by combined speech features
2017 IEEE International Conference on Multimedia and Expo (ICME) Pub Date : 2017-07-01 DOI: 10.1109/ICME.2017.8019546
Felix Kuhnke, J. Ostermann
{"title":"Visual speech synthesis from 3D mesh sequences driven by combined speech features","authors":"Felix Kuhnke, J. Ostermann","doi":"10.1109/ICME.2017.8019546","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019546","url":null,"abstract":"Given a pre-registered 3D mesh sequence and accompanying phoneme-labeled audio, our system creates an animatable face model and a mapping procedure to produce realistic speech animations for arbitrary speech input. Mapping of speech features to model parameters is done using random forests for regression. We propose a new speech feature based on phonemic labels and acoustic features. The novel feature produces more expressive facial animation and it robustly handles temporal labeling errors. Furthermore, by employing a sliding window approach to feature extraction, the system is easy to train and allows for low-delay synthesis. We show that our novel combination of speech features improves visual speech synthesis. Our findings are confirmed by a subjective user study.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122520838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Spontaneous thermal facial expression analysis based on trajectory-pooled fisher vector descriptor
2017 IEEE International Conference on Multimedia and Expo (ICME) Pub Date : 2017-07-01 DOI: 10.1109/ICME.2017.8019315
Peng Liu, L. Yin
{"title":"Spontaneous thermal facial expression analysis based on trajectory-pooled fisher vector descriptor","authors":"Peng Liu, L. Yin","doi":"10.1109/ICME.2017.8019315","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019315","url":null,"abstract":"We present a new descriptor for spontaneous facial expression recognition from videos acquired by a thermal sensor. Previous descriptors mostly compute features from RGB videos, which makes it difficult to process mixed and varied spontaneous expressions with a large ambiguity of facial appearances. In contrast, thermal imaging can measure autonomic activities, which are the physiological changes evoked by the autonomic nervous system regardless of the variety and ambiguity of facial appearances. This paper presents a new thermal video representation, the so-called trajectory-pooled Fisher vector descriptor (TFD). To capture local energy and temperature changes, we propose to use spatio-temporal orientation energy and the acceleration of dense trajectories as low-level features, and we further improve the discriminative capacity by aggregating the local features using an improved Fisher vector. The benefits of TFD in comparison with existing approaches are illustrated on two databases using different modalities: the USTC-NVIE database and the MMSE (a.k.a. BP4D+) database.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131112417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Knowledge-guided recurrent neural network learning for task-oriented action prediction
2017 IEEE International Conference on Multimedia and Expo (ICME) Pub Date : 2017-07-01 DOI: 10.1109/ICME.2017.8019345
Liang Lin, Lili Huang, Tianshui Chen, Yukang Gan, Hui Cheng
{"title":"Knowledge-guided recurrent neural network learning for task-oriented action prediction","authors":"Liang Lin, Lili Huang, Tianshui Chen, Yukang Gan, Hui Cheng","doi":"10.1109/ICME.2017.8019345","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019345","url":null,"abstract":"This paper addresses task-oriented action prediction, i.e., predicting a sequence of actions towards accomplishing a specific task under a certain scene, which is a new problem in computer vision research. The main challenges lie in how to model task-specific knowledge and integrate it into the learning procedure. In this work, we propose to train a recurrent long short-term memory (LSTM) network to handle this problem, i.e., taking a scene image (including pre-located objects) and the specified task as input and recurrently predicting action sequences. However, training such a network usually requires large amounts of annotated samples to cover the semantic space (e.g., diverse action decomposition and ordering). To alleviate this issue, we introduce a temporal And-Or graph (AOG) for task description, which hierarchically represents a task as atomic actions. With this AOG representation, we can produce many valid samples (i.e., action sequences consistent with common sense) by training another auxiliary LSTM network with a small set of annotated samples. These generated samples (i.e., task-oriented action sequences) effectively facilitate training the model for task-oriented action prediction. In the experiments, we create a new dataset containing diverse daily tasks and extensively evaluate the effectiveness of our approach.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131620429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
Webpage cross-browser test from image level
2017 IEEE International Conference on Multimedia and Expo (ICME) Pub Date : 2017-07-01 DOI: 10.1109/ICME.2017.8019400
P. Lu, Wei-liang Fan, Jun Sun, H. Tanaka, S. Naoi
{"title":"Webpage cross-browser test from image level","authors":"P. Lu, Wei-liang Fan, Jun Sun, H. Tanaka, S. Naoi","doi":"10.1109/ICME.2017.8019400","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019400","url":null,"abstract":"Incompatibility of webpages under different browsers and platforms is a typical technical obstacle in webpage design. To address this issue, a key challenge is to automatically detect the incompatible components and quantitatively assess the extent of distortion in cross-browser tests. This paper presents a new algorithm for comparing image pairs from webpages, called iterative perceptual hash (IPH), as well as a new distortion evaluation index called structure-color-saliency (SCS). The IPH, which operates in an iterative manner, is proposed to detect content changes considering both global structure and local content differences. The SCS assesses the extent of distortion in both image structure and color and is capable of imitating nonlinear human perception. Experimental results demonstrate the effectiveness of IPH (e.g., F1-score 96%) and the high consistency of SCS with subjective results.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127004012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
A unified model for improving depth accuracy in kinect sensor
2017 IEEE International Conference on Multimedia and Expo (ICME) Pub Date : 2017-07-01 DOI: 10.1109/ICME.2017.8019370
Li Peng, Yanduo Zhang, Huabing Zhou, Deng Chen, Zhenghong Yu, Junjun Jiang, Jiayi Ma
{"title":"A unified model for improving depth accuracy in kinect sensor","authors":"Li Peng, Yanduo Zhang, Huabing Zhou, Deng Chen, Zhenghong Yu, Junjun Jiang, Jiayi Ma","doi":"10.1109/ICME.2017.8019370","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019370","url":null,"abstract":"The Microsoft Kinect sensor has been widely used in many applications, but it suffers from low depth accuracy. In this paper, we present a unified depth modification model that improves the Kinect depth accuracy by registering depth and color images in an iterative manner. Specifically, in each iteration, we first establish a coarse correspondence based on feature descriptors of Canny edges. Then, we estimate the fine correspondence using a robust estimator called the L2E with a nonparametric model. Finally, we correct the depth data according to the correspondence results. To evaluate the effectiveness of our approach, we have performed extensive experiments and analyzed the results in the following respects: the accuracy of the depth data, the accuracy of the correspondence between color and depth images, and the measurement error in 3D reconstruction by our method. The experimental results show that our approach greatly improves the depth accuracy.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"128 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128099111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1