{"title":"Deep Multi-task Learning with Label Correlation Constraint for Video Concept Detection","authors":"Fotini Markatopoulou, V. Mezaris, I. Patras","doi":"10.1145/2964284.2967271","DOIUrl":"https://doi.org/10.1145/2964284.2967271","url":null,"abstract":"In this work we propose a method that integrates multi-task learning (MTL) and deep learning. Our method appends a MTL-like loss to a deep convolutional neural network, in order to learn the relations between tasks together at the same time, and also incorporates the label correlations between pairs of tasks. We apply the proposed method on a transfer learning scenario, where our objective is to fine-tune the parameters of a network that has been originally trained on a large-scale image dataset for concept detection, so that it be applied on a target video dataset and a corresponding new set of target concepts. We evaluate the proposed method for the video concept detection problem on the TRECVID 2013 Semantic Indexing dataset. Our results show that the proposed algorithm leads to better concept-based video annotation than existing state-of-the-art methods.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123188010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Digital World to Thrive In: How the Internet of Things Can Make the \"Invisible Hand\" Work","authors":"D. Helbing","doi":"10.1145/2964284.2984749","DOIUrl":"https://doi.org/10.1145/2964284.2984749","url":null,"abstract":"Managing data-rich societies wisely and reaching sustainable development are among the greatest challenges of the 21st century. We are faced with existential threats and huge opportunities. If we don't act now, large parts of our society will not be able to economically benefit from the digital revolution. This could lead to mass unemployment and social unrest. It is time to create the right framework for the digital society to come.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123853433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Morph: A Fast and Scalable Cloud Transcoding System","authors":"Guanyu Gao, Yonggang Wen","doi":"10.1145/2964284.2973792","DOIUrl":"https://doi.org/10.1145/2964284.2973792","url":null,"abstract":"Morph is an open source cloud transcoding system. It can leverage the scalability of the cloud infrastructure to encode and transcode video contents in fast speed, and dynamically provision the resources in cloud to accommodate the workload. The system is composed of a master node that performs the video file segmentation, concentration, and task scheduling operations; and multiple worker nodes that perform the transcoding for video blocks. Morph can transcode the video blocks of a video file on multiple workers in parallel to achieve fast speed, and automatically manage the data transfers and communications between the master node and the worker nodes. The worker nodes can join into or leave the transcoding cluster at any time for dynamic resource provisioning. The system is very modular, and all of the algorithms can be easily modified or replaced. We release the source code of Morph under MIT License, hoping that it can be shared among various research communities.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125189249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Visual Feedback Generation for Facial Expression Improvement with Multi-task Deep Neural Networks","authors":"Takuhiro Kaneko, Kaoru Hiramatsu, K. Kashino","doi":"10.1145/2964284.2967236","DOIUrl":"https://doi.org/10.1145/2964284.2967236","url":null,"abstract":"While many studies in computer vision and pattern recognition have been actively conducted to recognize people's current states, few studies have tackled the problem of generating feedback on how people can improve their states, although there are many real-world applications such as in sports, education, and health care. In particular, it has been challenging to develop such a system that can adaptively generate feedback for real-world situations, namely various input and target states, since it requires formulating various rules of feedback to do so. We propose a learning-based method to solve this problem. If we can obtain a large amount of feedback annotations, it is possible to explicitly learn the rules, but it is difficult to do so due to the subjective nature of the task. To mitigate this problem, our method implicitly learns the rules from training data consisting of input images, key-point annotations, and state annotations that do not require professional knowledge in feedback. Given such training data, we first learn a multi-task deep neural network with state recognition and key-point localization. Then, we apply a novel propagation method for extracting feedback information from the network. We evaluated our method in a facial expression improvement task using real-world data and clarified its characteristics and effectiveness.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125395556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Representation for Abnormal Event Detection in Crowded Scenes","authors":"Y. Feng, Yuan Yuan, Xiaoqiang Lu","doi":"10.1145/2964284.2967290","DOIUrl":"https://doi.org/10.1145/2964284.2967290","url":null,"abstract":"Abnormal event detection is extremely important, especially for video surveillance. Nowadays, many detectors have been proposed based on hand-crafted features. However, it remains challenging to effectively distinguish abnormal events from normal ones. This paper proposes a deep representation based algorithm which extracts features in an unsupervised fashion. Specially, appearance, texture, and short-term motion features are automatically learned and fused with stacked denoising autoencoders. Subsequently, long-term temporal clues are modeled with a long short-term memory (LSTM) recurrent network, in order to discover meaningful regularities of video events. The abnormal events are identified as samples which disobey these regularities. Moreover, this paper proposes a spatial anomaly detection strategy via manifold ranking, aiming at excluding false alarms. Experiments and comparisons on real world datasets show that the proposed algorithm outperforms state of the arts for the abnormal event detection problem in crowded scenes.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129510693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Music Emotion Primitives via Supervised Dynamic Clustering","authors":"Yang Liu, Yan Liu, Xiang Zhang, Gong Chen, Ke-jun Zhang","doi":"10.1145/2964284.2967215","DOIUrl":"https://doi.org/10.1145/2964284.2967215","url":null,"abstract":"This paper explores a fundamental problem in music emotion analysis, i.e., how to segment the music sequence into a set of basic emotive units, which are named as emotion primitives. Current works on music emotion analysis are mainly based on the fixed-length music segments, which often leads to the difficulty of accurate emotion recognition. Short music segment, such as an individual music frame, may fail to evoke emotion response. Long music segment, such as an entire song, may convey various emotions over time. Moreover, the minimum length of music segment varies depending on the types of the emotions. To address these problems, we propose a novel method dubbed supervised dynamic clustering (SDC) to automatically decompose the music sequence into meaningful segments with various lengths. First, the music sequence is represented by a set of music frames. Then, the music frames are clustered according to the valence-arousal values in the emotion space. The clustering results are used to initialize the music segmentation. After that, a dynamic programming scheme is employed to jointly optimize the subsequent segmentation and grouping in the music feature space. Experimental results on standard dataset show both the effectiveness and the rationality of the proposed method.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128293471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Multimodal Gamified Platform for Real-Time User Feedback in Sports Performance","authors":"David S. Monaghan, Freddie Honohan, A. Ahmadi, T. McDaniel, Ramin Tadayon, Ajay Karpur, Kieran Moran, N. O’Connor, S. Panchanathan","doi":"10.1145/2964284.2973815","DOIUrl":"https://doi.org/10.1145/2964284.2973815","url":null,"abstract":"In this paper we introduce a novel platform that utilises multi-modal low-cost motion capture technology for the delivery of real-time visual feedback for sports performance. This platform supports the expansion to multi-modal interfaces that utilise haptic and audio feedback, which scales effectively with motor task complexity. We demonstrate an implementation of our platform within the field of sports performance. The platform includes low-cost motion capture through a fusion technique, combining a Microsoft Kinect V2 with two wrist inertial sensors, which make use of the accelerometer and gyroscope sensors, alongside a game-based Graphical User Interface (GUI) for instruction, visual feedback and gamified score tracking.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128534393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Domain Robust Approach For Image Dataset Construction","authors":"Yazhou Yao, Xiansheng Hua, Fumin Shen, Jian Zhang, Zhenmin Tang","doi":"10.1145/2964284.2967213","DOIUrl":"https://doi.org/10.1145/2964284.2967213","url":null,"abstract":"There have been increasing research interests in automatically constructing image dataset by collecting images from the Internet. However, existing methods tend to have a weak domain adaptation ability, known as the \"dataset bias problem\". To address this issue, in this work, we propose a novel image dataset construction framework which can generalize well to unseen target domains. In specific, the given queries are first expanded by searching in the Google Books Ngrams Corpora (GBNC) to obtain a richer semantic description, from which the noisy query expansions are then filtered out. By treating each expansion as a \"bag\" and the retrieved images therein as \"instances\", we formulate image filtering as a multi-instance learning (MIL) problem with constrained positive bags. By this approach, images from different data distributions will be kept while with noisy images filtered out. Comprehensive experiments on two challenging tasks demonstrate the effectiveness of our proposed approach.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"223 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130493415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cross-batch Reference Learning for Deep Classification and Retrieval","authors":"Huei-Fang Yang, Kevin Lin, Chu-Song Chen","doi":"10.1145/2964284.2964324","DOIUrl":"https://doi.org/10.1145/2964284.2964324","url":null,"abstract":"Learning feature representations for image retrieval is essential to multimedia search and mining applications. Recently, deep convolutional networks (CNNs) have gained much attention due to their impressive performance on object detection and image classification, and the feature representations learned from a large-scale generic dataset (e.g., ImageNet) can be transferred to or fine-tuned on the datasets of other domains. However, when the feature representations learned with a deep CNN are applied to image retrieval, the performance is still not as good as they are used for classification, which restricts their applicability to relevant image search. To ensure the retrieval capability of the learned feature space, we introduce a new idea called cross-batch reference (CBR) to enhance the stochastic-gradient-descent (SGD) training of CNNs. In each iteration of our training process, the network adjustment relies not only on the training samples in a single batch, but also on the information passed by the samples in the other batches. This inter-batches communication mechanism is formulated as a cross-batch retrieval process based on the mean average precision (MAP) criterion, where the relevant and irrelevant samples are encouraged to be placed on top and rear of the retrieval list, respectively. The learned feature space is not only discriminative to different classes, but the samples that are relevant to each other or of the same class are also enforced to be centralized. To maximize the cross-batch MAP, we design a loss function that is an approximated lower bound of the MAP on the feature layer of the network, which is differentiable and easier for optimization. By combining the intra-batch classification and inter-batch cross-reference losses, the learned features are effective for both classification and retrieval tasks. Experimental results on various benchmarks demonstrate the effectiveness of our approach.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"126 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128671067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Summary for AVEC 2016: Depression, Mood, and Emotion Recognition Workshop and Challenge","authors":"M. Valstar, J. Gratch, Björn Schuller, F. Ringeval, R. Cowie, M. Pantic","doi":"10.1145/2964284.2980532","DOIUrl":"https://doi.org/10.1145/2964284.2980532","url":null,"abstract":"The sixth Audio-Visual Emotion Challenge and workshop AVEC 2016 was held in conjunction ACM Multimedia'16. This year the AVEC series addresses two distinct sub-challenges, multi-modal emotion recognition and audio-visual depression detection. Both sub-challenges are in a way a return to AVEC's past editions: the emotion sub-challenge is based on the same dataset as the one used in AVEC 2015, and depression analysis was previously addressed in AVEC 2013/2014. In this summary, we mainly describe participation and its conditions.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126950581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}