Proceedings of the 24th ACM international conference on Multimedia: Latest Publications

Action Recognition Based on Joint Trajectory Maps Using Convolutional Neural Networks
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2967191
Pichao Wang, Z. Li, Yonghong Hou, W. Li
{"title":"Action Recognition Based on Joint Trajectory Maps Using Convolutional Neural Networks","authors":"Pichao Wang, Z. Li, Yonghong Hou, W. Li","doi":"10.1145/2964284.2967191","DOIUrl":"https://doi.org/10.1145/2964284.2967191","url":null,"abstract":"Recently, Convolutional Neural Networks (ConvNets) have shown promising performances in many computer vision tasks, especially image-based recognition. How to effectively use ConvNets for video-based recognition is still an open problem. In this paper, we propose a compact, effective yet simple method to encode spatio-temporal information carried in 3D skeleton sequences into multiple 2D images, referred to as Joint Trajectory Maps (JTM), and ConvNets are adopted to exploit the discriminative features for real-time human action recognition. The proposed method has been evaluated on three public benchmarks, i.e., MSRC-12 Kinect gesture dataset (MSRC-12), G3D dataset and UTD multimodal human action dataset (UTD-MHAD) and achieved the state-of-the-art results.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132936671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 315
Beauty eMakeup: A Deep Makeup Transfer System
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2973812
Xinyu Ou, Si Liu, Xiaochun Cao, H. Ling
{"title":"Beauty eMakeup: A Deep Makeup Transfer System","authors":"Xinyu Ou, Si Liu, Xiaochun Cao, H. Ling","doi":"10.1145/2964284.2973812","DOIUrl":"https://doi.org/10.1145/2964284.2973812","url":null,"abstract":"In this demo, we present a Beauty eMakeup System to automatically recommend the most suitable makeup for a female and synthesis the makeup on her face. Given a before-makeup face, her most suitable makeup is determined automatically. Then, both the before-makeup and the reference faces are fed into the proposed Deep Transfer Network to generate the after-makeup face. Our end-to-end makeup transfer network have several nice properties including: (1) with complete functions: including foundation, lip gloss, and eye shadow transfer; (2) cosmetic specific: different cosmetics are transferred in different manners; (3) localized: different cosmetics are applied on different facial regions; (4) producing naturally looking results without obvious artifacts; (5) controllable makeup lightness: various results from light makeup to heavy makeup can be generated. Extensive experimental evaluations and analysis on testing images well demonstrate the effectiveness of the proposed system.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133761308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
Super Resolution of the Partial Pixelated Images With Deep Convolutional Neural Network
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2967235
Haiyi Mao, Yue Wu, Jun Yu Li, Y. Fu
{"title":"Super Resolution of the Partial Pixelated Images With Deep Convolutional Neural Network","authors":"Haiyi Mao, Yue Wu, Jun Yu Li, Y. Fu","doi":"10.1145/2964284.2967235","DOIUrl":"https://doi.org/10.1145/2964284.2967235","url":null,"abstract":"The problem of super resolution of partial pixelated images is considered in this paper. Partial pixelated images are more and more common in nowadays due to public safety etc. However, in some special cases, for instance criminal investigation, some images are pixelated intentionally by criminals and partial pixelate make it hard to reconstruct images even a higher resolution images. Hence, a method is proposed to handle this problem based on the deep convolutional neural network, termed depixelate super resolution CNN(DSRCNN). Given the mathematical expression pixelates, we propose a model to reconstruct the image from the pixelation and map to a higher resolution by combining the adversarial autoencoder with two depixelate layers. This model is evaluated on standard public datasets in which images are pixelated randomly and compared to the state of arts methods, shows very exciting performance.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131916196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Attention-based LSTM with Semantic Consistency for Videos Captioning
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2967242
Zhao Guo, Lianli Gao, Jingkuan Song, Xing Xu, Jie Shao, Heng Tao Shen
{"title":"Attention-based LSTM with Semantic Consistency for Videos Captioning","authors":"Zhao Guo, Lianli Gao, Jingkuan Song, Xing Xu, Jie Shao, Heng Tao Shen","doi":"10.1145/2964284.2967242","DOIUrl":"https://doi.org/10.1145/2964284.2967242","url":null,"abstract":"Recent progress in using Long Short-Term Memory (LSTM) for image description has motivated the exploration of their applications for automatically describing video content with natural language sentences. By taking a video as a sequence of features, LSTM model is trained on video-sentence pairs to learn association of a video to a sentence. However, most existing methods compress an entire video shot or frame into a static representation, without considering attention which allows for salient features. Furthermore, most existing approaches model the translating error, but ignore the correlations between sentence semantics and visual content. To tackle these issues, we propose a novel end-to-end framework named aLSTMs, an attention-based LSTM model with semantic consistency, to transfer videos to natural sentences. This framework integrates attention mechanism with LSTM to capture salient structures of video, and explores the correlation between multi-modal representations for generating sentences with rich semantic content. More specifically, we first propose an attention mechanism which uses the dynamic weighted sum of local 2D Convolutional Neural Network (CNN) and 3D CNN representations. Then, a LSTM decoder takes these visual features at time $t$ and the word-embedding feature at time $t$-$1$ to generate important words. Finally, we uses multi-modal embedding to map the visual and sentence features into a joint space to guarantee the semantic consistence of the sentence description and the video visual content. Experiments on the benchmark datasets demonstrate the superiority of our method than the state-of-the-art baselines for video captioning in both BLEU and METEOR.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134289828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 56
From Seed Discovery to Deep Reconstruction: Predicting Saliency in Crowd via Deep Networks
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2967185
Yanhao Zhang, Lei Qin, Qingming Huang, Kuiyuan Yang, Jun Zhang, H. Yao
{"title":"From Seed Discovery to Deep Reconstruction: Predicting Saliency in Crowd via Deep Networks","authors":"Yanhao Zhang, Lei Qin, Qingming Huang, Kuiyuan Yang, Jun Zhang, H. Yao","doi":"10.1145/2964284.2967185","DOIUrl":"https://doi.org/10.1145/2964284.2967185","url":null,"abstract":"Although saliency prediction in crowd has been recently recognized as an essential task for video analysis, it is not comprehensively explored yet. The challenges lie in that eye fixations in crowded scenes are inherently \"distinct\" and \"multi-modal\", which differs from those in regular scenes. To this end, the existing saliency prediction schemes typically rely on hand designed features with shallow learning paradigm, which neglect the underlying characteristics of crowded scenes. In this paper, we propose a saliency prediction model dedicated for crowd videos with two novelties: 1) Distinct units are discovered using deep representation learned by a Stacked Denoising Auto-Encoder (SDAE), considering perceptual properties of crowd saliency; 2) Contrast-based saliency is measured through deep reconstruction errors in the second SDAE trained on all units excluding distinct units. A unified model is integrated for online processing crowd saliency. Extensive evaluations on two crowd video benchmark datasets demonstrate that our approach can effectively explore crowd saliency mechanism in two-stage SDAEs and achieve significantly better results than state-of-the-art methods, with robustness to parameters.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"164 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134258613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Transportation Mode Detection on Mobile Devices Using Recurrent Nets
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2967249
Toan H. Vu, Le Dung, Jia-Ching Wang
{"title":"Transportation Mode Detection on Mobile Devices Using Recurrent Nets","authors":"Toan H. Vu, Le Dung, Jia-Ching Wang","doi":"10.1145/2964284.2967249","DOIUrl":"https://doi.org/10.1145/2964284.2967249","url":null,"abstract":"We present an approach to the use of Recurrent Neural Networks (RNN) for transportation mode detection (TMD) on mobile devices. The proposed model, called Control Gate-based Recurrent Neural Network (CGRNN), is an end-to-end model that works directly with raw signals from an embedded accelerometer. As mobile devices have limited computational resources, we evaluate the model in terms of accuracy, computational cost, and memory usage. Experiments on the HTC transportation mode dataset demonstrate that our proposed model not only exhibits remarkable accuracy, but also is efficient with low resource consumption.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117055456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 35
Jockey Time: Making Video Playback to Enhance Emotional Effect
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2967183
Kyeong-Ah Jeong, Hyeon‐Jeong Suk
{"title":"Jockey Time: Making Video Playback to Enhance Emotional Effect","authors":"Kyeong-Ah Jeong, Hyeon‐Jeong Suk","doi":"10.1145/2964284.2967183","DOIUrl":"https://doi.org/10.1145/2964284.2967183","url":null,"abstract":"In order to effectively and easily deliver the affective quality of a video, this study investigated the emotional manifests induced by the playback design of the video. In designing the playback, we articulated speed, direction, and continuity of the video and surveyed observers' responses. Based on the results, we propose seven categories of playback design, and each appeals cheerful, happy, relaxed, funny, urgent, angry, and sad emotion. For an easy use, we offer an online video editing service, \"Jockey Time.\" A beta version was operated for a month for monitoring purpose, and finally, Jockey Time v.1.0 is now launched for anybody to easily enhance the emotional effect of one's video.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"212 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117299815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
AltMM 2016: 1st International Workshop on Multimedia Alternate Realities
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2980531
T. Chambel, Rene Kaiser, O. Niamut, Wei Tsang Ooi, J. Redi
{"title":"AltMM 2016: 1st International Workshop on Multimedia Alternate Realities","authors":"T. Chambel, Rene Kaiser, O. Niamut, Wei Tsang Ooi, J. Redi","doi":"10.1145/2964284.2980531","DOIUrl":"https://doi.org/10.1145/2964284.2980531","url":null,"abstract":"Multimedia experiences allow us to access other worlds, to live other people's stories, to communicate with or experience alternate realities. Different spaces, times or situations can be entered thanks to multimedia contents and systems, which coexist with our current reality, and are sometimes so vivid and engaging that we feel we are living in them. Advances in multimedia are making it possible to create immersive experiences that may involve the user in a different or augmented world, as an alternate reality. AltMM 2016, the 1st International Workshop on Multimedia Alternate Realities at ACM Multimedia, aims at exploring how the synergy between multimedia technologies and effects can foster the creation of alternate realities and make their access an enriching, valuable and real experience. The workshop program will contain a combination of oral and invited keynote presentations, and poster, demo and discussion sessions, altogether enabling interactive scientific sharing and discussion between practitioners and researchers.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"14 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123081583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Hypervideo Production Using Crowdsourced Youtube Videos
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2973810
Stefan John, Christian Handschigl, Britta Meixner, M. Granitzer
{"title":"Hypervideo Production Using Crowdsourced Youtube Videos","authors":"Stefan John, Christian Handschigl, Britta Meixner, M. Granitzer","doi":"10.1145/2964284.2973810","DOIUrl":"https://doi.org/10.1145/2964284.2973810","url":null,"abstract":"Hypervideos, consisting of media enriched and linked video scenes, have proven useful in many scenarios. Software solutions exist that help authors make hypervideos from media files. However, recording and editing video scenes for hypervideos is a tedious and time consuming job. Huge video databases like YouTube exist that can provide rich sources of video material. Yet it is often illegal to download and re-purpose videos from these sites, requiring a solution that links whole videos or parts of videos and plays them in an embedded player. This work presents the SIVA Web Producer, a Chrome extension for the creation of hypervideos consisting of scenes from YouTube videos. After creating a project, the SIVA Web Producer embeds YouTube videos or parts thereof as video clips. These can then be linked in a scene graph and extended with annotations. The plug-in provides a preview space for testing the hypervideo. Finalized videos can be published on the SIVA Web Portal or embedded in a Web page.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124771768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
A Fast Cattle Recognition System using Smart devices
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2973829
Santosh Kumar, S. Singh, Tanima Dutta, Hari Prabhat Gupta
{"title":"A Fast Cattle Recognition System using Smart devices","authors":"Santosh Kumar, S. Singh, Tanima Dutta, Hari Prabhat Gupta","doi":"10.1145/2964284.2973829","DOIUrl":"https://doi.org/10.1145/2964284.2973829","url":null,"abstract":"A recognition system is very useful to recognize human, object, and animals. An animal recognition system plays an important role in livestock biometrics, that helps in recognition and verification of livestock in case of missed or swapped animals, false insurance claims, and reallocation of animals at slaughter houses. In this research, we propose a fast and cost-effective animal biometrics based cattle recognition system to quickly recognize and verify the false insurance claims of cattle using their primary muzzle point image pattern characteristics. To solve this major problem, users (owner, parentage, or other) have captured the images of cattle using their smart devices. The captured images are transferred to the server of the cattle recognition system using a wireless network or internet technology. The system performs pre-processing on the muzzle point image of cattle to remove and filter the noise, increases the quality, and enhance the contrast. The muzzle point features are extracted and supervised machine learning based multi-classifier pattern recognition techniques are applied for recognizing the cattle. The server has a database of cattle images which are provided by the owners. Finally, One-Shot-Similarity (OSS) matching and distance metric learning based techniques with ensemble of classifiers technique are used for matching the query muzzle image with the stored database.A prototype is also developed for evaluating the efficacy of the proposed system in term of recognition accuracy and end-to-end delay.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"81 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124780306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 21