Proceedings of the 24th ACM international conference on Multimedia: Latest Publications

Local Diffusion Map Signature for Symmetry-aware Non-rigid Shape Correspondence
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2967277
M. Wang, Yi Fang
{"title":"Local Diffusion Map Signature for Symmetry-aware Non-rigid Shape Correspondence","authors":"M. Wang, Yi Fang","doi":"10.1145/2964284.2967277","DOIUrl":"https://doi.org/10.1145/2964284.2967277","url":null,"abstract":"Identifying accurate correspondences information among different shapes is of great importance in shape analysis such as shape registration, segmentation and retrieval. This paper aims to develop a paradigm to address the challenging issues posed by shape structural variation and symmetry ambiguity. Specifically, the proposed research developed a novel shape signature based on local diffusion map on 3D surface, which is used to identify the shape correspondence through graph matching process. The developed shape signature, named local diffusion map signature (LDMS), is obtained by projecting heat diffusion distribution on 3D surface into 2D images along the surface normal direction with orientation determined by gradients of heat diffusion field. The local diffusion map signature is able to capture the concise geometric essence that is deformation-insensitive and symmetry-aware. Experimental results on 3D shape correspondence demonstrate the superior performance of our proposed method over other state-of-the-art techniques in identifying correspondences for non-rigid shapes with symmetry ambiguity.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124319830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
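As a rough illustration of the heat-diffusion field that LDMS is derived from, the sketch below computes the diffusion distribution from a single source vertex on a triangle mesh via a truncated eigen-expansion of a graph Laplacian. This is a minimal sketch under stated assumptions: a uniform graph Laplacian stands in for the usual cotangent discretization, the paper's projection of the field into 2D images along surface normals is omitted, and `V` (n x 3 vertex array), `F` (m x 3 integer face array), `t` and `k` are illustrative.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def heat_diffusion_field(V, F, source, t=0.1, k=100):
    """Heat distribution k_t(source, .) over all vertices via eigen-expansion."""
    n = len(V)
    # Undirected edges from the triangle faces.
    edges = np.vstack([F[:, [0, 1]], F[:, [1, 2]], F[:, [2, 0]]])
    W = sp.coo_matrix((np.ones(len(edges)), (edges[:, 0], edges[:, 1])), shape=(n, n))
    W = ((W + W.T) > 0).astype(float)                    # symmetric 0/1 adjacency
    L = sp.diags(np.asarray(W.sum(axis=1)).ravel()) - W  # uniform graph Laplacian
    # Smallest eigenpairs; the tiny shift keeps the shift-invert solve nonsingular.
    lam, phi = eigsh((L + 1e-8 * sp.identity(n)).tocsc(), k=min(k, n - 2), sigma=0)
    # k_t(x, y) = sum_i exp(-lam_i * t) * phi_i(x) * phi_i(y)
    return (np.exp(-lam * t) * phi[source]) @ phi.T
```

Varying `t` trades locality for smoothness: small values keep the distribution concentrated near the source, which is what a local signature needs.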
Multimedia and Medicine: Teammates for Better Disease Detection and Survival
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2976760
M. Riegler, M. Lux, C. Griwodz, C. Spampinato, T. Lange, S. Eskeland, Konstantin Pogorelov, Wallapak Tavanapong, P. Schmidt, C. Gurrin, Dag Johansen, H. Johansen, P. Halvorsen
{"title":"Multimedia and Medicine: Teammates for Better Disease Detection and Survival","authors":"M. Riegler, M. Lux, C. Griwodz, C. Spampinato, T. Lange, S. Eskeland, Konstantin Pogorelov, Wallapak Tavanapong, P. Schmidt, C. Gurrin, Dag Johansen, H. Johansen, P. Halvorsen","doi":"10.1145/2964284.2976760","DOIUrl":"https://doi.org/10.1145/2964284.2976760","url":null,"abstract":"Health care has a long history of adopting technology to save lives and improve the quality of living. Visual information is frequently applied for disease detection and assessment, and the established fields of computer vision and medical imaging provide essential tools. It is, however, a misconception that disease detection and assessment are provided exclusively by these fields and that they provide the solution for all challenges. Integration and analysis of data from several sources, real-time processing, and the assessment of usefulness for end-users are core competences of the multimedia community and are required for the successful improvement of health care systems. We have conducted initial investigations into two use cases surrounding diseases of the gastrointestinal (GI) tract, where the detection of abnormalities provides the largest chance of successful treatment if the initial observation of disease indicators occurs before the patient notices any symptoms. Although such detection is typically provided visually by applying an endoscope, we are facing a multitude of new multimedia challenges that differ between use cases. In real-time assistance for colonoscopy, we combine sensor information about camera position and direction to aid in detecting, investigate means for providing support to doctors in unobtrusive ways, and assist in reporting. In the area of large-scale capsular endoscopy, we investigate questions of scalability, performance and energy efficiency for the recording phase, and combine video summarization and retrieval questions for analysis.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121883513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 51
Ad Recommendation for Sponsored Search Engine via Composite Long-Short Term Memory
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2967254
Dejiang Kong, Fei Wu, Siliang Tang, Yueting Zhuang
{"title":"Ad Recommendation for Sponsored Search Engine via Composite Long-Short Term Memory","authors":"Dejiang Kong, Fei Wu, Siliang Tang, Yueting Zhuang","doi":"10.1145/2964284.2967254","DOIUrl":"https://doi.org/10.1145/2964284.2967254","url":null,"abstract":"Search engine logs contain a large amount of users' click-through data that can be leveraged as implicit indicators of relevance. In this paper we address ad recommendation problem that finding and ranking the most relevant ads with respect to users' search queries. Due to the click sparsity, the conventional methods can hardly model the both inter- and intra-relations among users, queries and ads. We utilize the long-short term memory(LSTM) network to effectively encode two kinds of sequences: the (user, query) sequence and the query word sequence to represent users' query intention in a continuous vector space and decode them as distributions over ads respectively. Further more, we combine these two LSTM networks in an appropriate way to build up a more robust model referred as composite LSTM model(cLSTM) for ad recommendation. We evaluate the proposed cLSTM on real world click-through data set comparing with two baseline methods, the results demonstrate that our proposed model outperforms two baselines and mitigate the click sparsity problem to a certain degree.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"357 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125063011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
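To make the composite structure concrete, here is a minimal PyTorch sketch of the idea: one LSTM encodes the (user, query) session sequence, another encodes the query word sequence, and their final hidden states are fused into a distribution over ads. The vocabulary sizes, dimensions, and concatenation-based fusion are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class CompositeLSTM(nn.Module):
    def __init__(self, n_session_tokens, n_words, n_ads, dim=128):
        super().__init__()
        self.session_emb = nn.Embedding(n_session_tokens, dim)  # user/query ids
        self.word_emb = nn.Embedding(n_words, dim)              # query words
        self.session_lstm = nn.LSTM(dim, dim, batch_first=True)
        self.word_lstm = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(2 * dim, n_ads)                    # fused state -> ads

    def forward(self, session_seq, word_seq):
        _, (h_s, _) = self.session_lstm(self.session_emb(session_seq))
        _, (h_w, _) = self.word_lstm(self.word_emb(word_seq))
        fused = torch.cat([h_s[-1], h_w[-1]], dim=-1)
        return self.out(fused)  # logits; softmax gives the ad distribution

# Usage: logits = CompositeLSTM(10000, 50000, 2000)(sess_batch, word_batch)
```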
Image2Text: A Multimodal Image Captioner
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2973831
Chang Liu, Changhu Wang, F. Sun, Y. Rui
{"title":"Image2Text: A Multimodal Image Captioner","authors":"Chang Liu, Changhu Wang, F. Sun, Y. Rui","doi":"10.1145/2964284.2973831","DOIUrl":"https://doi.org/10.1145/2964284.2973831","url":null,"abstract":"In this work, we showcase the Image2Text system, which is a real-time captioning system that can generate human-level natural language description for any input image. We formulate the problem of image captioning as a multimodal translation task. Analogous to machine translation, we present a sequence-to-sequence recurrent neural networks (RNN) model for image caption generation. Different from most existing work where the whole image is represented by a convolutional neural networks (CNN) feature, we propose to represent the input image as a sequence of detected objects to serve as the source sequence of the RNN model. Based on the captioning framework, we develop a user-friendly system to automatically generated human-level captions for users. The system also enables users to detect salient objects in an image, and retrieve similar images and corresponding descriptions from a database.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122038447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
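The key design choice, representing the image as a sequence of detected objects rather than a single CNN vector, maps naturally onto an encoder-decoder RNN. Below is a minimal PyTorch sketch under assumed dimensions and teacher-forced decoding; the object detector and beam search are omitted.

```python
import torch
import torch.nn as nn

class ObjectSeqCaptioner(nn.Module):
    def __init__(self, obj_feat_dim, vocab_size, dim=256):
        super().__init__()
        self.encoder = nn.LSTM(obj_feat_dim, dim, batch_first=True)
        self.word_emb = nn.Embedding(vocab_size, dim)
        self.decoder = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, obj_feats, caption_in):
        # obj_feats: (B, n_objects, obj_feat_dim); caption_in: (B, T) token ids
        _, state = self.encoder(obj_feats)   # summarize the object sequence
        dec_out, _ = self.decoder(self.word_emb(caption_in), state)
        return self.out(dec_out)             # (B, T, vocab_size) logits
```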
Deep Convolutional Neural Network with Independent Softmax for Large Scale Face Recognition
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2984060
Yue Wu, Jun Yu Li, Yu Kong, Y. Fu
{"title":"Deep Convolutional Neural Network with Independent Softmax for Large Scale Face Recognition","authors":"Yue Wu, Jun Yu Li, Yu Kong, Y. Fu","doi":"10.1145/2964284.2984060","DOIUrl":"https://doi.org/10.1145/2964284.2984060","url":null,"abstract":"In this paper, we present our solution to the MS-Celeb-1M Challenge. This challenge aims to recognize 100k celebrities at the same time. The huge number of celebrities is the bottleneck for training a deep convolutional neural network of which the output is equal to the number of celebrities. To solve this problem, an independent softmax model is proposed to split the single classifier into several small classifiers. Meanwhile, the training data are split into several partitions. This decomposes the large scale training procedure into several medium training procedures which can be solved separately. Besides, a large model is also trained and a simple strategy is introduced to merge the two models. Extensive experiments on the MSR-Celeb-1M dataset demonstrate the superiority of the proposed method. Our solution ranks the first and second in two tracks of the final evaluation.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"169 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131497547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 54
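A minimal PyTorch sketch of the independent-softmax idea: the 100k-way classifier is split into several smaller heads, each trained only on its own label partition, and at test time each head's most confident prediction is mapped back to a global identity. The merging rule below (picking the single most confident head) is a simplifying assumption; the paper also trains a large model and merges the two, which is not reproduced here.

```python
import torch
import torch.nn as nn

class IndependentSoftmax(nn.Module):
    def __init__(self, feat_dim, n_classes, n_parts=4):
        super().__init__()
        # Assumes n_classes is divisible by n_parts, for brevity.
        self.part = n_classes // n_parts
        self.heads = nn.ModuleList(
            nn.Linear(feat_dim, self.part) for _ in range(n_parts))

    def forward(self, feats):
        # Each head yields probabilities over its own label partition only.
        return [head(feats).softmax(dim=-1) for head in self.heads]

    def predict(self, feats):
        confs, ids = [], []
        for i, probs in enumerate(self.forward(feats)):
            conf, idx = probs.max(dim=-1)
            confs.append(conf)
            ids.append(idx + i * self.part)   # map back to global identities
        confs, ids = torch.stack(confs), torch.stack(ids)
        winner = confs.argmax(dim=0)          # heuristic: most confident head
        return ids[winner, torch.arange(feats.shape[0])]
```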
A Fast 3D Retrieval Algorithm via Class-Statistic and Pair-Constraint Model
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2967194
Zan Gao, Deyu Wang, Hua Zhang, Yanbing Xue, Guangping Xu
{"title":"A Fast 3D Retrieval Algorithm via Class-Statistic and Pair-Constraint Model","authors":"Zan Gao, Deyu Wang, Hua Zhang, Yanbing Xue, Guangping Xu","doi":"10.1145/2964284.2967194","DOIUrl":"https://doi.org/10.1145/2964284.2967194","url":null,"abstract":"With the development of 3D technologies and devices, 3D model retrieval becomes a hot research topic where multi-view matching algorithms have demonstrated satisfying performance. However, exciting works overlook the common factors among objects in a single class, and they are time consuming in retrieval processing. In this paper, a class-statistics and pair-constraint model (CSPC) method is originally proposed for 3D model retrieval, which is composed of supervised class-based statistics model and pair-constraint object retrieval model. In our CSPC model, we firstly convert view-based distance measure into object-based distance measure without falling in performance, which will advance 3D model retrieval speed. Secondly, the generality of the distribution of each feature dimension in each class is computed to judge category information, and then we further adopt this distribution information to build class models. Finally, an object-based pairwise constraint is introduced on the base of the class-statistic measure, which can remove a lot of false alarm samples in retrieval. Experimental results on ETH, NTU-60, MVRED and PSB 3D datasets show that our method is fast, and its performance is also comparable with the-state-of-the-art algorithms.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115027387","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
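The first step of CSPC, replacing view-based distances with a single object-based distance, is where most of the speed-up comes from. Below is a minimal sketch under the assumption that mean-pooling the view descriptors is an acceptable object descriptor; the paper's class-statistics and pair-constraint stages are not reproduced.

```python
import numpy as np

def object_descriptor(view_feats):
    """view_feats: (n_views, d) descriptors of one object's rendered views."""
    return view_feats.mean(axis=0)   # one vector per object, not per view

def retrieve(query_views, gallery):
    """gallery: list of (object_id, (n_views, d) array). Returns ranked ids."""
    q = object_descriptor(query_views)
    # One distance per object instead of an all-pairs view comparison.
    dists = [(np.linalg.norm(q - object_descriptor(v)), oid) for oid, v in gallery]
    return [oid for _, oid in sorted(dists)]
```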
Synchronization among Groups of Spectators for Highlight Detection in Movies
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2967229
Michal Muszynski, Theodoros Kostoulas, Patrizia Lombardo, T. Pun, G. Chanel
{"title":"Synchronization among Groups of Spectators for Highlight Detection in Movies","authors":"Michal Muszynski, Theodoros Kostoulas, Patrizia Lombardo, T. Pun, G. Chanel","doi":"10.1145/2964284.2967229","DOIUrl":"https://doi.org/10.1145/2964284.2967229","url":null,"abstract":"Detection of emotional and aesthetic highlights is a challenge for the affective understanding of movies. Our assumption is that synchronized spectators' physiological and behavioral reactions occur during these highlights. We propose to employ the periodicity score to capture synchronization among groups of spectators' signals. To uncover the periodicity score's capabilities, we compare it with baseline synchronization measures, such as the nonlinear interdependence and the windowed mutual information. The results show that the periodicity score and the pairwise synchronization measures are able to capture different properties of spectators' synchronization, and they indicate the presence of some types of emotional and aesthetic highlights in a movie based on spectators' electro-dermal and acceleration signals.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130730692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
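Of the baselines mentioned, windowed mutual information is the easiest to sketch: slide a window over two spectators' signals and estimate MI per window, with peaks suggesting synchronized reactions. The histogram binning and window/step sizes below are illustrative assumptions; the periodicity score itself is not reproduced here.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def windowed_mi(x, y, win=256, step=64, bins=16):
    """Mutual information between equal-length 1D signals x and y, per window."""
    scores = []
    for start in range(0, len(x) - win + 1, step):
        xw, yw = x[start:start + win], y[start:start + win]
        # Discretize each window into histogram bins before estimating MI.
        xd = np.digitize(xw, np.histogram_bin_edges(xw, bins))
        yd = np.digitize(yw, np.histogram_bin_edges(yw, bins))
        scores.append(mutual_info_score(xd, yd))
    return np.array(scores)   # peaks suggest synchronized reactions
```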
Context-aware Image Tweet Modelling and Recommendation
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2964291
Tao Chen, Xiangnan He, Min-Yen Kan
{"title":"Context-aware Image Tweet Modelling and Recommendation","authors":"Tao Chen, Xiangnan He, Min-Yen Kan","doi":"10.1145/2964284.2964291","DOIUrl":"https://doi.org/10.1145/2964284.2964291","url":null,"abstract":"While efforts have been made on bridging the semantic gap in image understanding, the in situ understanding of social media images is arguably more important but has had less progress. In this work, we enrich the representation of images in image tweets by considering their social context. We argue that in the microblog context, traditional image features, e.g., low-level SIFT or high-level detected objects, are far from adequate in interpreting the necessary semantics latent in image tweets. To bridge this gap, we move from the images' pixels to their context and propose a context-aware image bf tweet modelling (CITING) framework to mine and fuse contextual text to model such social media images' semantics. We start with tweet's intrinsic contexts, namely, 1) text within the image itself and 2) its accompanying text; and then we turn to the extrinsic contexts: 3) the external web page linked to by the tweet's embedded URL, and 4) the Web as a whole. These contexts can be leveraged to benefit many fundamental applications. To demonstrate the effectiveness our framework, we focus on the task of personalized image tweet recommendation, developing a feature-aware matrix factorization framework that encodes the contexts as a part of user interest modelling. Extensive experiments on a large Twitter dataset show that our proposed method significantly improves performance. Finally, to spur future studies, we have released both the code of our recommendation model and our image tweet dataset.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131171050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 74
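A minimal sketch of feature-aware matrix factorization as used for the recommendation task: each image tweet is scored by matching a user's latent interest vector against a projection of the tweet's contextual text features (the intrinsic and extrinsic contexts mined by CITING). The dimensions and the plain dot-product scoring form are assumptions, not the paper's exact model.

```python
import torch
import torch.nn as nn

class FeatureAwareMF(nn.Module):
    def __init__(self, n_users, ctx_feat_dim, dim=64):
        super().__init__()
        self.user = nn.Embedding(n_users, dim)    # latent user interest vectors
        self.proj = nn.Linear(ctx_feat_dim, dim)  # context features -> latent space

    def forward(self, user_ids, ctx_feats):
        # Higher score = tweet's contextual semantics match the user's interests.
        return (self.user(user_ids) * self.proj(ctx_feats)).sum(dim=-1)

# Such a model would be trained with a ranking loss (e.g. BPR) on
# retweet/like feedback, scoring candidate tweets per user at serving time.
```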
What Makes a Good Movie Trailer?: Interpretation from Simultaneous EEG and Eyetracker Recording
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2967187
Sidi Liu, Jinglei Lv, Yimin Hou, Ting Shoemaker, Qinglin Dong, Kaiming Li, Tianming Liu
{"title":"What Makes a Good Movie Trailer?: Interpretation from Simultaneous EEG and Eyetracker Recording","authors":"Sidi Liu, Jinglei Lv, Yimin Hou, Ting Shoemaker, Qinglin Dong, Kaiming Li, Tianming Liu","doi":"10.1145/2964284.2967187","DOIUrl":"https://doi.org/10.1145/2964284.2967187","url":null,"abstract":"What makes a good movie trailer? It's a big challenge to answer this question because of the complexity of multimedia in both low level sensory features and high level semantic features. However, human perception and reactivity could be straightforward evidence for evaluation. Modern Electro-encephalography (EEG) technology provides measurement of consequential brain neural activity to external stimuli. Meanwhile, visual perception and attention could be captured and interpreted by Eye Tracking technology. Intuitively, simultaneous EEG and Eye Tracker recording of human audience with multimedia stimuli could bridge the gap between human comprehension and multimedia analysis, and provide a new way for movie trailer evaluation. In this paper, we propose a novel platform to simultaneously record EEG and eye movement for participants with video stimuli by integrating 256-channel EEG, Eye Tracker and video display device as a system. Based on the proposed system a novel experiment has been designed, in which independent and joint features of EEG and Eye tracking data were mined to evaluate the movie trailer. Our analysis has shown interesting features that are corresponding with trailer quality and video shoot changes.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134134140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
Multi-pose Facial Expression Recognition Using Transformed Dirichlet Process
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2967240
Feifei Zhang, Qi-rong Mao, Ming Dong, Yongzhao Zhan
{"title":"Multi-pose Facial Expression Recognition Using Transformed Dirichlet Process","authors":"Feifei Zhang, Qi-rong Mao, Ming Dong, Yongzhao Zhan","doi":"10.1145/2964284.2967240","DOIUrl":"https://doi.org/10.1145/2964284.2967240","url":null,"abstract":"Driven by recent advances in human-centered computing, Facial Expression Recognition (FER) has attracted significant attention in many applications. In this paper, we propose a novel graphical model, multi-level Transformed Dirichlet Process (ml-TDP), for multi-pose FER. In our approach, pose is explicitly introduced into ml-TDP so that separate training and parameter tuning for each pose is not required. In addition, ml-TDP can learn an intermediate facial expression representation subject to geometric constraints. By sharing the pool of spatially-coherent features over expressions and poses, we provide a scalable solution for multi-pose FER. Extensive experimental result on benchmark facial expression databases shows the superior performance of ml-TDP.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133580252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
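The full ml-TDP graphical model is beyond a short sketch, but its Dirichlet-process building block, which lets the pool of shared spatially coherent features grow with the data rather than being fixed per pose, is easy to illustrate with a truncated stick-breaking draw of mixture weights. The concentration parameter and truncation level below are illustrative assumptions.

```python
import numpy as np

def stick_breaking_weights(alpha=1.0, truncation=50, rng=None):
    """Sample mixture weights pi ~ GEM(alpha), truncated to a finite length."""
    if rng is None:
        rng = np.random.default_rng()
    betas = rng.beta(1.0, alpha, size=truncation)          # stick break points
    # pi_k = beta_k * prod_{j<k}(1 - beta_j): each break takes a fraction
    # of whatever stick length remains.
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas[:-1])])
    return betas * remaining                               # sums to ~1
```

Smaller `alpha` concentrates mass on a few components; larger `alpha` spreads it over many, which is how a DP adapts the effective number of feature groups instead of fixing it in advance.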