Proceedings of the 24th ACM international conference on Multimedia: Latest Publications

Deep Multi-task Learning with Label Correlation Constraint for Video Concept Detection
Proceedings of the 24th ACM international conference on Multimedia Pub Date : 2016-10-01 DOI: 10.1145/2964284.2967271
Fotini Markatopoulou, V. Mezaris, I. Patras
{"title":"Deep Multi-task Learning with Label Correlation Constraint for Video Concept Detection","authors":"Fotini Markatopoulou, V. Mezaris, I. Patras","doi":"10.1145/2964284.2967271","DOIUrl":"https://doi.org/10.1145/2964284.2967271","url":null,"abstract":"In this work we propose a method that integrates multi-task learning (MTL) and deep learning. Our method appends a MTL-like loss to a deep convolutional neural network, in order to learn the relations between tasks together at the same time, and also incorporates the label correlations between pairs of tasks. We apply the proposed method on a transfer learning scenario, where our objective is to fine-tune the parameters of a network that has been originally trained on a large-scale image dataset for concept detection, so that it be applied on a target video dataset and a corresponding new set of target concepts. We evaluate the proposed method for the video concept detection problem on the TRECVID 2013 Semantic Indexing dataset. Our results show that the proposed algorithm leads to better concept-based video annotation than existing state-of-the-art methods.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123188010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 21
A Digital World to Thrive In: How the Internet of Things Can Make the "Invisible Hand" Work
Proceedings of the 24th ACM international conference on Multimedia Pub Date : 2016-10-01 DOI: 10.1145/2964284.2984749
D. Helbing
{"title":"A Digital World to Thrive In: How the Internet of Things Can Make the \"Invisible Hand\" Work","authors":"D. Helbing","doi":"10.1145/2964284.2984749","DOIUrl":"https://doi.org/10.1145/2964284.2984749","url":null,"abstract":"Managing data-rich societies wisely and reaching sustainable development are among the greatest challenges of the 21st century. We are faced with existential threats and huge opportunities. If we don't act now, large parts of our society will not be able to economically benefit from the digital revolution. This could lead to mass unemployment and social unrest. It is time to create the right framework for the digital society to come.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123853433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Morph: A Fast and Scalable Cloud Transcoding System
Proceedings of the 24th ACM international conference on Multimedia Pub Date : 2016-10-01 DOI: 10.1145/2964284.2973792
Guanyu Gao, Yonggang Wen
{"title":"Morph: A Fast and Scalable Cloud Transcoding System","authors":"Guanyu Gao, Yonggang Wen","doi":"10.1145/2964284.2973792","DOIUrl":"https://doi.org/10.1145/2964284.2973792","url":null,"abstract":"Morph is an open source cloud transcoding system. It can leverage the scalability of the cloud infrastructure to encode and transcode video contents in fast speed, and dynamically provision the resources in cloud to accommodate the workload. The system is composed of a master node that performs the video file segmentation, concentration, and task scheduling operations; and multiple worker nodes that perform the transcoding for video blocks. Morph can transcode the video blocks of a video file on multiple workers in parallel to achieve fast speed, and automatically manage the data transfers and communications between the master node and the worker nodes. The worker nodes can join into or leave the transcoding cluster at any time for dynamic resource provisioning. The system is very modular, and all of the algorithms can be easily modified or replaced. We release the source code of Morph under MIT License, hoping that it can be shared among various research communities.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125189249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
Adaptive Visual Feedback Generation for Facial Expression Improvement with Multi-task Deep Neural Networks
Proceedings of the 24th ACM international conference on Multimedia Pub Date : 2016-10-01 DOI: 10.1145/2964284.2967236
Takuhiro Kaneko, Kaoru Hiramatsu, K. Kashino
{"title":"Adaptive Visual Feedback Generation for Facial Expression Improvement with Multi-task Deep Neural Networks","authors":"Takuhiro Kaneko, Kaoru Hiramatsu, K. Kashino","doi":"10.1145/2964284.2967236","DOIUrl":"https://doi.org/10.1145/2964284.2967236","url":null,"abstract":"While many studies in computer vision and pattern recognition have been actively conducted to recognize people's current states, few studies have tackled the problem of generating feedback on how people can improve their states, although there are many real-world applications such as in sports, education, and health care. In particular, it has been challenging to develop such a system that can adaptively generate feedback for real-world situations, namely various input and target states, since it requires formulating various rules of feedback to do so. We propose a learning-based method to solve this problem. If we can obtain a large amount of feedback annotations, it is possible to explicitly learn the rules, but it is difficult to do so due to the subjective nature of the task. To mitigate this problem, our method implicitly learns the rules from training data consisting of input images, key-point annotations, and state annotations that do not require professional knowledge in feedback. Given such training data, we first learn a multi-task deep neural network with state recognition and key-point localization. Then, we apply a novel propagation method for extracting feedback information from the network. We evaluated our method in a facial expression improvement task using real-world data and clarified its characteristics and effectiveness.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125395556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
Deep Representation for Abnormal Event Detection in Crowded Scenes
Proceedings of the 24th ACM international conference on Multimedia Pub Date : 2016-10-01 DOI: 10.1145/2964284.2967290
Y. Feng, Yuan Yuan, Xiaoqiang Lu
{"title":"Deep Representation for Abnormal Event Detection in Crowded Scenes","authors":"Y. Feng, Yuan Yuan, Xiaoqiang Lu","doi":"10.1145/2964284.2967290","DOIUrl":"https://doi.org/10.1145/2964284.2967290","url":null,"abstract":"Abnormal event detection is extremely important, especially for video surveillance. Nowadays, many detectors have been proposed based on hand-crafted features. However, it remains challenging to effectively distinguish abnormal events from normal ones. This paper proposes a deep representation based algorithm which extracts features in an unsupervised fashion. Specially, appearance, texture, and short-term motion features are automatically learned and fused with stacked denoising autoencoders. Subsequently, long-term temporal clues are modeled with a long short-term memory (LSTM) recurrent network, in order to discover meaningful regularities of video events. The abnormal events are identified as samples which disobey these regularities. Moreover, this paper proposes a spatial anomaly detection strategy via manifold ranking, aiming at excluding false alarms. Experiments and comparisons on real world datasets show that the proposed algorithm outperforms state of the arts for the abnormal event detection problem in crowded scenes.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129510693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 41
Learning Music Emotion Primitives via Supervised Dynamic Clustering
Proceedings of the 24th ACM international conference on Multimedia Pub Date : 2016-10-01 DOI: 10.1145/2964284.2967215
Yang Liu, Yan Liu, Xiang Zhang, Gong Chen, Ke-jun Zhang
{"title":"Learning Music Emotion Primitives via Supervised Dynamic Clustering","authors":"Yang Liu, Yan Liu, Xiang Zhang, Gong Chen, Ke-jun Zhang","doi":"10.1145/2964284.2967215","DOIUrl":"https://doi.org/10.1145/2964284.2967215","url":null,"abstract":"This paper explores a fundamental problem in music emotion analysis, i.e., how to segment the music sequence into a set of basic emotive units, which are named as emotion primitives. Current works on music emotion analysis are mainly based on the fixed-length music segments, which often leads to the difficulty of accurate emotion recognition. Short music segment, such as an individual music frame, may fail to evoke emotion response. Long music segment, such as an entire song, may convey various emotions over time. Moreover, the minimum length of music segment varies depending on the types of the emotions. To address these problems, we propose a novel method dubbed supervised dynamic clustering (SDC) to automatically decompose the music sequence into meaningful segments with various lengths. First, the music sequence is represented by a set of music frames. Then, the music frames are clustered according to the valence-arousal values in the emotion space. The clustering results are used to initialize the music segmentation. After that, a dynamic programming scheme is employed to jointly optimize the subsequent segmentation and grouping in the music feature space. Experimental results on standard dataset show both the effectiveness and the rationality of the proposed method.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128293471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
A Multimodal Gamified Platform for Real-Time User Feedback in Sports Performance
Proceedings of the 24th ACM international conference on Multimedia Pub Date : 2016-10-01 DOI: 10.1145/2964284.2973815
David S. Monaghan, Freddie Honohan, A. Ahmadi, T. McDaniel, Ramin Tadayon, Ajay Karpur, Kieran Moran, N. O’Connor, S. Panchanathan
{"title":"A Multimodal Gamified Platform for Real-Time User Feedback in Sports Performance","authors":"David S. Monaghan, Freddie Honohan, A. Ahmadi, T. McDaniel, Ramin Tadayon, Ajay Karpur, Kieran Moran, N. O’Connor, S. Panchanathan","doi":"10.1145/2964284.2973815","DOIUrl":"https://doi.org/10.1145/2964284.2973815","url":null,"abstract":"In this paper we introduce a novel platform that utilises multi-modal low-cost motion capture technology for the delivery of real-time visual feedback for sports performance. This platform supports the expansion to multi-modal interfaces that utilise haptic and audio feedback, which scales effectively with motor task complexity. We demonstrate an implementation of our platform within the field of sports performance. The platform includes low-cost motion capture through a fusion technique, combining a Microsoft Kinect V2 with two wrist inertial sensors, which make use of the accelerometer and gyroscope sensors, alongside a game-based Graphical User Interface (GUI) for instruction, visual feedback and gamified score tracking.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128534393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
A Domain Robust Approach For Image Dataset Construction
Proceedings of the 24th ACM international conference on Multimedia Pub Date : 2016-10-01 DOI: 10.1145/2964284.2967213
Yazhou Yao, Xiansheng Hua, Fumin Shen, Jian Zhang, Zhenmin Tang
{"title":"A Domain Robust Approach For Image Dataset Construction","authors":"Yazhou Yao, Xiansheng Hua, Fumin Shen, Jian Zhang, Zhenmin Tang","doi":"10.1145/2964284.2967213","DOIUrl":"https://doi.org/10.1145/2964284.2967213","url":null,"abstract":"There have been increasing research interests in automatically constructing image dataset by collecting images from the Internet. However, existing methods tend to have a weak domain adaptation ability, known as the \"dataset bias problem\". To address this issue, in this work, we propose a novel image dataset construction framework which can generalize well to unseen target domains. In specific, the given queries are first expanded by searching in the Google Books Ngrams Corpora (GBNC) to obtain a richer semantic description, from which the noisy query expansions are then filtered out. By treating each expansion as a \"bag\" and the retrieved images therein as \"instances\", we formulate image filtering as a multi-instance learning (MIL) problem with constrained positive bags. By this approach, images from different data distributions will be kept while with noisy images filtered out. Comprehensive experiments on two challenging tasks demonstrate the effectiveness of our proposed approach.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"223 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130493415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 38
Cross-batch Reference Learning for Deep Classification and Retrieval
Proceedings of the 24th ACM international conference on Multimedia Pub Date : 2016-10-01 DOI: 10.1145/2964284.2964324
Huei-Fang Yang, Kevin Lin, Chu-Song Chen
{"title":"Cross-batch Reference Learning for Deep Classification and Retrieval","authors":"Huei-Fang Yang, Kevin Lin, Chu-Song Chen","doi":"10.1145/2964284.2964324","DOIUrl":"https://doi.org/10.1145/2964284.2964324","url":null,"abstract":"Learning feature representations for image retrieval is essential to multimedia search and mining applications. Recently, deep convolutional networks (CNNs) have gained much attention due to their impressive performance on object detection and image classification, and the feature representations learned from a large-scale generic dataset (e.g., ImageNet) can be transferred to or fine-tuned on the datasets of other domains. However, when the feature representations learned with a deep CNN are applied to image retrieval, the performance is still not as good as they are used for classification, which restricts their applicability to relevant image search. To ensure the retrieval capability of the learned feature space, we introduce a new idea called cross-batch reference (CBR) to enhance the stochastic-gradient-descent (SGD) training of CNNs. In each iteration of our training process, the network adjustment relies not only on the training samples in a single batch, but also on the information passed by the samples in the other batches. This inter-batches communication mechanism is formulated as a cross-batch retrieval process based on the mean average precision (MAP) criterion, where the relevant and irrelevant samples are encouraged to be placed on top and rear of the retrieval list, respectively. The learned feature space is not only discriminative to different classes, but the samples that are relevant to each other or of the same class are also enforced to be centralized. To maximize the cross-batch MAP, we design a loss function that is an approximated lower bound of the MAP on the feature layer of the network, which is differentiable and easier for optimization. By combining the intra-batch classification and inter-batch cross-reference losses, the learned features are effective for both classification and retrieval tasks. Experimental results on various benchmarks demonstrate the effectiveness of our approach.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"126 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128671067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
Summary for AVEC 2016: Depression, Mood, and Emotion Recognition Workshop and Challenge
Proceedings of the 24th ACM international conference on Multimedia Pub Date : 2016-10-01 DOI: 10.1145/2964284.2980532
M. Valstar, J. Gratch, Björn Schuller, F. Ringeval, R. Cowie, M. Pantic
{"title":"Summary for AVEC 2016: Depression, Mood, and Emotion Recognition Workshop and Challenge","authors":"M. Valstar, J. Gratch, Björn Schuller, F. Ringeval, R. Cowie, M. Pantic","doi":"10.1145/2964284.2980532","DOIUrl":"https://doi.org/10.1145/2964284.2980532","url":null,"abstract":"The sixth Audio-Visual Emotion Challenge and workshop AVEC 2016 was held in conjunction ACM Multimedia'16. This year the AVEC series addresses two distinct sub-challenges, multi-modal emotion recognition and audio-visual depression detection. Both sub-challenges are in a way a return to AVEC's past editions: the emotion sub-challenge is based on the same dataset as the one used in AVEC 2015, and depression analysis was previously addressed in AVEC 2013/2014. In this summary, we mainly describe participation and its conditions.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126950581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 100