Proceedings of the 24th ACM international conference on Multimedia: Latest Publications

Action Recognition Based on Joint Trajectory Maps Using Convolutional Neural Networks
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2967191
Pichao Wang, Z. Li, Yonghong Hou, W. Li
{"title":"Action Recognition Based on Joint Trajectory Maps Using Convolutional Neural Networks","authors":"Pichao Wang, Z. Li, Yonghong Hou, W. Li","doi":"10.1145/2964284.2967191","DOIUrl":"https://doi.org/10.1145/2964284.2967191","url":null,"abstract":"Recently, Convolutional Neural Networks (ConvNets) have shown promising performances in many computer vision tasks, especially image-based recognition. How to effectively use ConvNets for video-based recognition is still an open problem. In this paper, we propose a compact, effective yet simple method to encode spatio-temporal information carried in 3D skeleton sequences into multiple 2D images, referred to as Joint Trajectory Maps (JTM), and ConvNets are adopted to exploit the discriminative features for real-time human action recognition. The proposed method has been evaluated on three public benchmarks, i.e., MSRC-12 Kinect gesture dataset (MSRC-12), G3D dataset and UTD multimodal human action dataset (UTD-MHAD) and achieved the state-of-the-art results.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132936671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 315
Beauty eMakeup: A Deep Makeup Transfer System
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2973812
Xinyu Ou, Si Liu, Xiaochun Cao, H. Ling
{"title":"Beauty eMakeup: A Deep Makeup Transfer System","authors":"Xinyu Ou, Si Liu, Xiaochun Cao, H. Ling","doi":"10.1145/2964284.2973812","DOIUrl":"https://doi.org/10.1145/2964284.2973812","url":null,"abstract":"In this demo, we present a Beauty eMakeup System to automatically recommend the most suitable makeup for a female and synthesis the makeup on her face. Given a before-makeup face, her most suitable makeup is determined automatically. Then, both the before-makeup and the reference faces are fed into the proposed Deep Transfer Network to generate the after-makeup face. Our end-to-end makeup transfer network have several nice properties including: (1) with complete functions: including foundation, lip gloss, and eye shadow transfer; (2) cosmetic specific: different cosmetics are transferred in different manners; (3) localized: different cosmetics are applied on different facial regions; (4) producing naturally looking results without obvious artifacts; (5) controllable makeup lightness: various results from light makeup to heavy makeup can be generated. Extensive experimental evaluations and analysis on testing images well demonstrate the effectiveness of the proposed system.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133761308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
Super Resolution of the Partial Pixelated Images With Deep Convolutional Neural Network
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2967235
Haiyi Mao, Yue Wu, Jun Yu Li, Y. Fu
{"title":"Super Resolution of the Partial Pixelated Images With Deep Convolutional Neural Network","authors":"Haiyi Mao, Yue Wu, Jun Yu Li, Y. Fu","doi":"10.1145/2964284.2967235","DOIUrl":"https://doi.org/10.1145/2964284.2967235","url":null,"abstract":"The problem of super resolution of partial pixelated images is considered in this paper. Partial pixelated images are more and more common in nowadays due to public safety etc. However, in some special cases, for instance criminal investigation, some images are pixelated intentionally by criminals and partial pixelate make it hard to reconstruct images even a higher resolution images. Hence, a method is proposed to handle this problem based on the deep convolutional neural network, termed depixelate super resolution CNN(DSRCNN). Given the mathematical expression pixelates, we propose a model to reconstruct the image from the pixelation and map to a higher resolution by combining the adversarial autoencoder with two depixelate layers. This model is evaluated on standard public datasets in which images are pixelated randomly and compared to the state of arts methods, shows very exciting performance.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131916196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Attention-based LSTM with Semantic Consistency for Videos Captioning
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2967242
Zhao Guo, Lianli Gao, Jingkuan Song, Xing Xu, Jie Shao, Heng Tao Shen
{"title":"Attention-based LSTM with Semantic Consistency for Videos Captioning","authors":"Zhao Guo, Lianli Gao, Jingkuan Song, Xing Xu, Jie Shao, Heng Tao Shen","doi":"10.1145/2964284.2967242","DOIUrl":"https://doi.org/10.1145/2964284.2967242","url":null,"abstract":"Recent progress in using Long Short-Term Memory (LSTM) for image description has motivated the exploration of their applications for automatically describing video content with natural language sentences. By taking a video as a sequence of features, LSTM model is trained on video-sentence pairs to learn association of a video to a sentence. However, most existing methods compress an entire video shot or frame into a static representation, without considering attention which allows for salient features. Furthermore, most existing approaches model the translating error, but ignore the correlations between sentence semantics and visual content. To tackle these issues, we propose a novel end-to-end framework named aLSTMs, an attention-based LSTM model with semantic consistency, to transfer videos to natural sentences. This framework integrates attention mechanism with LSTM to capture salient structures of video, and explores the correlation between multi-modal representations for generating sentences with rich semantic content. More specifically, we first propose an attention mechanism which uses the dynamic weighted sum of local 2D Convolutional Neural Network (CNN) and 3D CNN representations. Then, a LSTM decoder takes these visual features at time $t$ and the word-embedding feature at time $t$-$1$ to generate important words. Finally, we uses multi-modal embedding to map the visual and sentence features into a joint space to guarantee the semantic consistence of the sentence description and the video visual content. Experiments on the benchmark datasets demonstrate the superiority of our method than the state-of-the-art baselines for video captioning in both BLEU and METEOR.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134289828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 56
From Seed Discovery to Deep Reconstruction: Predicting Saliency in Crowd via Deep Networks
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2967185
Yanhao Zhang, Lei Qin, Qingming Huang, Kuiyuan Yang, Jun Zhang, H. Yao
{"title":"From Seed Discovery to Deep Reconstruction: Predicting Saliency in Crowd via Deep Networks","authors":"Yanhao Zhang, Lei Qin, Qingming Huang, Kuiyuan Yang, Jun Zhang, H. Yao","doi":"10.1145/2964284.2967185","DOIUrl":"https://doi.org/10.1145/2964284.2967185","url":null,"abstract":"Although saliency prediction in crowd has been recently recognized as an essential task for video analysis, it is not comprehensively explored yet. The challenges lie in that eye fixations in crowded scenes are inherently \"distinct\" and \"multi-modal\", which differs from those in regular scenes. To this end, the existing saliency prediction schemes typically rely on hand designed features with shallow learning paradigm, which neglect the underlying characteristics of crowded scenes. In this paper, we propose a saliency prediction model dedicated for crowd videos with two novelties: 1) Distinct units are discovered using deep representation learned by a Stacked Denoising Auto-Encoder (SDAE), considering perceptual properties of crowd saliency; 2) Contrast-based saliency is measured through deep reconstruction errors in the second SDAE trained on all units excluding distinct units. A unified model is integrated for online processing crowd saliency. Extensive evaluations on two crowd video benchmark datasets demonstrate that our approach can effectively explore crowd saliency mechanism in two-stage SDAEs and achieve significantly better results than state-of-the-art methods, with robustness to parameters.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"164 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134258613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Transportation Mode Detection on Mobile Devices Using Recurrent Nets
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2967249
Toan H. Vu, Le Dung, Jia-Ching Wang
{"title":"Transportation Mode Detection on Mobile Devices Using Recurrent Nets","authors":"Toan H. Vu, Le Dung, Jia-Ching Wang","doi":"10.1145/2964284.2967249","DOIUrl":"https://doi.org/10.1145/2964284.2967249","url":null,"abstract":"We present an approach to the use of Recurrent Neural Networks (RNN) for transportation mode detection (TMD) on mobile devices. The proposed model, called Control Gate-based Recurrent Neural Network (CGRNN), is an end-to-end model that works directly with raw signals from an embedded accelerometer. As mobile devices have limited computational resources, we evaluate the model in terms of accuracy, computational cost, and memory usage. Experiments on the HTC transportation mode dataset demonstrate that our proposed model not only exhibits remarkable accuracy, but also is efficient with low resource consumption.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117055456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 35
Jockey Time: Making Video Playback to Enhance Emotional Effect
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2967183
Kyeong-Ah Jeong, Hyeon‐Jeong Suk
{"title":"Jockey Time: Making Video Playback to Enhance Emotional Effect","authors":"Kyeong-Ah Jeong, Hyeon‐Jeong Suk","doi":"10.1145/2964284.2967183","DOIUrl":"https://doi.org/10.1145/2964284.2967183","url":null,"abstract":"In order to effectively and easily deliver the affective quality of a video, this study investigated the emotional manifests induced by the playback design of the video. In designing the playback, we articulated speed, direction, and continuity of the video and surveyed observers' responses. Based on the results, we propose seven categories of playback design, and each appeals cheerful, happy, relaxed, funny, urgent, angry, and sad emotion. For an easy use, we offer an online video editing service, \"Jockey Time.\" A beta version was operated for a month for monitoring purpose, and finally, Jockey Time v.1.0 is now launched for anybody to easily enhance the emotional effect of one's video.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"212 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117299815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
AltMM 2016: 1st International Workshop on Multimedia Alternate Realities
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2980531
T. Chambel, Rene Kaiser, O. Niamut, Wei Tsang Ooi, J. Redi
{"title":"AltMM 2016: 1st International Workshop on Multimedia Alternate Realities","authors":"T. Chambel, Rene Kaiser, O. Niamut, Wei Tsang Ooi, J. Redi","doi":"10.1145/2964284.2980531","DOIUrl":"https://doi.org/10.1145/2964284.2980531","url":null,"abstract":"Multimedia experiences allow us to access other worlds, to live other people's stories, to communicate with or experience alternate realities. Different spaces, times or situations can be entered thanks to multimedia contents and systems, which coexist with our current reality, and are sometimes so vivid and engaging that we feel we are living in them. Advances in multimedia are making it possible to create immersive experiences that may involve the user in a different or augmented world, as an alternate reality. AltMM 2016, the 1st International Workshop on Multimedia Alternate Realities at ACM Multimedia, aims at exploring how the synergy between multimedia technologies and effects can foster the creation of alternate realities and make their access an enriching, valuable and real experience. The workshop program will contain a combination of oral and invited keynote presentations, and poster, demo and discussion sessions, altogether enabling interactive scientific sharing and discussion between practitioners and researchers.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"14 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123081583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Hypervideo Production Using Crowdsourced Youtube Videos
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2973810
Stefan John, Christian Handschigl, Britta Meixner, M. Granitzer
{"title":"Hypervideo Production Using Crowdsourced Youtube Videos","authors":"Stefan John, Christian Handschigl, Britta Meixner, M. Granitzer","doi":"10.1145/2964284.2973810","DOIUrl":"https://doi.org/10.1145/2964284.2973810","url":null,"abstract":"Hypervideos, consisting of media enriched and linked video scenes, have proven useful in many scenarios. Software solutions exist that help authors make hypervideos from media files. However, recording and editing video scenes for hypervideos is a tedious and time consuming job. Huge video databases like YouTube exist that can provide rich sources of video material. Yet it is often illegal to download and re-purpose videos from these sites, requiring a solution that links whole videos or parts of videos and plays them in an embedded player. This work presents the SIVA Web Producer, a Chrome extension for the creation of hypervideos consisting of scenes from YouTube videos. After creating a project, the SIVA Web Producer embeds YouTube videos or parts thereof as video clips. These can then be linked in a scene graph and extended with annotations. The plug-in provides a preview space for testing the hypervideo. Finalized videos can be published on the SIVA Web Portal or embedded in a Web page.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124771768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
A Fast Cattle Recognition System using Smart devices
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2973829
Santosh Kumar, S. Singh, Tanima Dutta, Hari Prabhat Gupta
{"title":"A Fast Cattle Recognition System using Smart devices","authors":"Santosh Kumar, S. Singh, Tanima Dutta, Hari Prabhat Gupta","doi":"10.1145/2964284.2973829","DOIUrl":"https://doi.org/10.1145/2964284.2973829","url":null,"abstract":"A recognition system is very useful to recognize human, object, and animals. An animal recognition system plays an important role in livestock biometrics, that helps in recognition and verification of livestock in case of missed or swapped animals, false insurance claims, and reallocation of animals at slaughter houses. In this research, we propose a fast and cost-effective animal biometrics based cattle recognition system to quickly recognize and verify the false insurance claims of cattle using their primary muzzle point image pattern characteristics. To solve this major problem, users (owner, parentage, or other) have captured the images of cattle using their smart devices. The captured images are transferred to the server of the cattle recognition system using a wireless network or internet technology. The system performs pre-processing on the muzzle point image of cattle to remove and filter the noise, increases the quality, and enhance the contrast. The muzzle point features are extracted and supervised machine learning based multi-classifier pattern recognition techniques are applied for recognizing the cattle. The server has a database of cattle images which are provided by the owners. Finally, One-Shot-Similarity (OSS) matching and distance metric learning based techniques with ensemble of classifiers technique are used for matching the query muzzle image with the stored database.A prototype is also developed for evaluating the efficacy of the proposed system in term of recognition accuracy and end-to-end delay.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"81 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124780306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 21