Proceedings of the 21st ACM International Conference on Multimedia: Latest Publications

Online multimodal deep similarity learning with application to image retrieval
Pub Date: 2013-10-21 | DOI: 10.1145/2502081.2502112
Pengcheng Wu, S. Hoi, Hao Xia, P. Zhao, Dayong Wang, C. Miao
Abstract: Recent years have witnessed extensive studies on distance metric learning (DML) for improving similarity search in multimedia information retrieval tasks. Despite their successes, most existing DML methods suffer from two critical limitations: (i) they typically attempt to learn a linear distance function on the input feature space, in which the assumption of linearity limits their capacity to measure similarity on complex patterns in real-world applications; (ii) they are often designed for learning distance metrics on uni-modal data, which may not effectively handle the similarity measures for multimedia objects with multimodal representations. To address these limitations, in this paper we propose a novel framework of online multimodal deep similarity learning (OMDSL), which aims to optimally integrate multiple deep neural networks pretrained with stacked denoising autoencoders. In particular, the proposed framework explores a unified two-stage online learning scheme that consists of (i) learning a flexible nonlinear transformation function for each individual modality, and (ii) learning to find the optimal combination of multiple diverse modalities simultaneously in a coherent process. We conduct an extensive set of experiments to evaluate the performance of the proposed algorithms for multimodal image retrieval tasks, in which the encouraging results validate the effectiveness of the proposed technique.
Citations: 169

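The two-stage scheme described in this abstract (a nonlinear embedding per modality, then an online-learned combination of modalities) can be illustrated with a short sketch. The cosine similarity, the multiplicative weight update, and all names and hyperparameters below are illustrative assumptions, not the authors' exact algorithm; the per-modality encoders stand in for the pretrained deep networks mentioned in the abstract.

```python
# Minimal sketch of the OMDSL idea: per-modality embeddings plus an
# online-learned combination of modality similarities. Hedge-style update
# and hyperparameters are assumptions for illustration only.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

class OnlineMultimodalSimilarity:
    def __init__(self, encoders, beta=0.9):
        self.encoders = encoders            # dict: modality name -> pretrained embedding fn
        self.w = np.ones(len(encoders)) / len(encoders)
        self.beta = beta                    # multiplicative-update rate

    def modality_sims(self, x, y):
        # x, y: dicts mapping modality name -> raw feature vector
        return np.array([cosine(f(x[m]), f(y[m])) for m, f in self.encoders.items()])

    def similarity(self, x, y):
        return float(self.w @ self.modality_sims(x, y))

    def update(self, query, positive, negative):
        # Online step on one triplet: down-weight modalities that rank the
        # negative above the positive (Hedge-style multiplicative update).
        s_pos = self.modality_sims(query, positive)
        s_neg = self.modality_sims(query, negative)
        losses = (s_neg > s_pos).astype(float)
        self.w *= self.beta ** losses
        self.w /= self.w.sum()
```
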
Stereotime: a wireless 2D and 3D switchable video communication system
Pub Date: 2013-10-21 | DOI: 10.1145/2502081.2502275
You Yang, Qiong Liu, Yue Gao, Binbin Xiong, Li Yu, Huanbo Luan, R. Ji, Q. Tian
Abstract: Mobile 3D video communication, especially with 2D/3D compatibility, is a new paradigm for both video communication and 3D video processing. Current techniques face challenges on mobile devices when bundled constraints such as computational resources and compatibility must be considered. In this work, we present a wireless 2D and 3D switchable video communication system, named Stereotime, to handle these challenges. Zig-Zag fast object segmentation, depth cue detection and merging, and texture-adaptive view generation are used for 3D scene reconstruction. We demonstrate the functionality and compatibility on 3D mobile devices in a WiFi network environment.
Citations: 5

Jiku director: a mobile video mashup system
Pub Date: 2013-10-21 | DOI: 10.1145/2502081.2502277
Duong-Trung-Dung Nguyen, M. Saini, Vu-Thanh Nguyen, Wei Tsang Ooi
Abstract: In this technical demonstration, we present a Web-based application called Jiku Director that automatically creates a mashup video from event videos uploaded by users. The system runs an algorithm that considers view quality (shakiness, tilt, occlusion), video quality (blockiness, contrast, sharpness, illumination, burned pixels), and spatial-temporal diversity (shot angles, shot lengths) to create a mashup video with smooth shot transitions while covering the event from different perspectives.
Citations: 8

Action recognition using invariant features under unexampled viewing conditions
Pub Date: 2013-10-21 | DOI: 10.1145/2502081.2508126
Litian Sun, K. Aizawa
Abstract: A great challenge in real-world applications of action recognition is the lack of sufficient label information because of variance in the recording viewpoint and differences between individuals. A system that can adapt itself according to these variances is required for practical use. We present a generic method for extracting view-invariant features from skeleton joints. These view-invariant features are further refined using a stacked, compact autoencoder. To model the challenge of real-world applications, two unexampled test settings (NewView and NewPerson) are used to evaluate the proposed method. Experimental results with these test settings demonstrate the effectiveness of our method.
Citations: 14

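To make "view-invariant features from skeleton joints" concrete, the sketch below uses pairwise joint distances, a common rotation- and translation-invariant skeleton representation. This is an illustrative stand-in under that assumption, not the paper's exact feature set or its autoencoder refinement.

```python
# Pairwise joint distances are unchanged by any rotation or translation of
# the camera, which is one simple way to obtain view-invariant features.
import numpy as np

def pairwise_joint_distances(joints):
    """joints: (J, 3) array of 3D joint positions for one frame."""
    diff = joints[:, None, :] - joints[None, :, :]        # (J, J, 3)
    dist = np.linalg.norm(diff, axis=-1)                   # (J, J)
    iu = np.triu_indices(len(joints), k=1)
    return dist[iu]                                         # (J*(J-1)/2,) view-invariant vector

# Sanity check: a random rotation of the skeleton leaves the feature unchanged.
rng = np.random.default_rng(0)
joints = rng.normal(size=(20, 3))
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))               # random orthogonal matrix
assert np.allclose(pairwise_joint_distances(joints),
                   pairwise_joint_distances(joints @ q.T), atol=1e-8)
```
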
Analysis and forecasting of trending topics in online media streams
Pub Date: 2013-10-21 | DOI: 10.1145/2502081.2502117
Tim Althoff, Damian Borth, Jörn Hees, A. Dengel
Abstract: Among the vast information available on the web, social media streams capture what people currently pay attention to and how they feel about certain topics. Awareness of such trending topics plays a crucial role in multimedia systems such as trend aware recommendation and automatic vocabulary selection for video concept detection systems. Correctly utilizing trending topics requires a better understanding of their various characteristics in different social media streams. To this end, we present the first comprehensive study across three major online and social media streams, Twitter, Google, and Wikipedia, covering thousands of trending topics during an observation period of an entire year. Our results indicate that depending on one's requirements one does not necessarily have to turn to Twitter for information about current events and that some media streams strongly emphasize content of specific categories. As our second key contribution, we further present a novel approach for the challenging task of forecasting the life cycle of trending topics in the very moment they emerge. Our fully automated approach is based on a nearest neighbor forecasting technique exploiting our assumption that semantically similar topics exhibit similar behavior. We demonstrate on a large-scale dataset of Wikipedia page view statistics that forecasts by the proposed approach are about 9-48k views closer to the actual viewing statistics compared to baseline methods and achieve a mean average percentage error of 45-19% for time periods of up to 14 days.
Citations: 48

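The forecasting idea (similar topics exhibit similar life cycles, so an emerging topic's curve can be predicted from its nearest historical neighbors) can be sketched as follows. The shape normalization, distance metric, and choice of k are assumptions for illustration, not the paper's exact procedure.

```python
# Nearest-neighbour life-cycle forecasting sketch: match the first observed
# values of a new topic against a library of historical topic curves and
# average the continuations of the k closest ones.
import numpy as np

def knn_forecast(prefix, history, horizon, k=5):
    """
    prefix : (p,)   observed page views of the emerging topic
    history: (n, T) view curves of past topics, with T >= len(prefix) + horizon
    """
    p = len(prefix)
    ref = history[:, :p]
    # scale-normalise so topics of different popularity can still match in shape
    scale = prefix.sum() / np.maximum(ref.sum(axis=1), 1e-9)
    dists = np.linalg.norm(ref * scale[:, None] - prefix, axis=1)
    nn = np.argsort(dists)[:k]
    # average the (rescaled) continuations of the nearest neighbours
    cont = history[nn, p:p + horizon] * scale[nn, None]
    return cont.mean(axis=0)
```
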
Tell me what happened here in history
Pub Date: 2013-10-21 | DOI: 10.1145/2502081.2502272
Jia Chen, Qin Jin, Weipeng Zhang, Shenghua Bao, Zhong Su, Yong Yu
Abstract: This demo shows our system that takes a landmark image as input, recognizes the landmark from the image and returns historical events of the landmark with related photos. Different from existing landmark-related research, we focus on the temporal dimension of a landmark. Our system automatically recognizes the landmark, shows historical events chronologically and provides detailed photos for the events. To build these functions, we fuse information from multiple online resources.
Citations: 2

Towards a comprehensive computational model for aesthetic assessment of videos
Pub Date: 2013-10-21 | DOI: 10.1145/2502081.2508119
Subhabrata Bhattacharya, Behnaz Nojavanasghari, Tao Chen, Dong Liu, Shih-Fu Chang, M. Shah
Abstract: In this paper we propose a novel aesthetic model emphasizing psycho-visual statistics extracted from multiple levels, in contrast to earlier approaches that rely only on descriptors suited for image recognition or based on photographic principles. At the lowest level, we determine dark-channel, sharpness and eye-sensitivity statistics over rectangular cells within a frame. At the next level, we extract Sentibank features (1,200 pre-trained visual classifiers) on a given frame, which invoke specific sentiments such as "colorful clouds", "smiling face", etc., and collect the classifier responses as frame-level statistics. At the topmost level, we extract trajectories from video shots. Using viewer's fixation priors, the trajectories are labeled as foreground or background/camera, on which statistics are computed. Additionally, spatio-temporal local binary patterns are computed that capture texture variations in a given shot. Classifiers are trained on individual feature representations independently. On thorough evaluation of 9 different types of features, we select the best features from each level -- dark channel, affect and camera motion statistics. Next, corresponding classifier scores are integrated in a sophisticated low-rank fusion framework to improve the final prediction scores. Our approach demonstrates strong correlation with human prediction on 1,000 broadcast quality videos released by NHK as an aesthetic evaluation dataset.
Citations: 67

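Of the low-level cues listed in this abstract, the dark-channel statistic is the easiest to make concrete: the dark channel of a pixel is the minimum intensity over the color channels within a local patch, averaged here per rectangular cell. The patch size and cell grid below are assumptions; the paper's sharpness and eye-sensitivity statistics are not reproduced.

```python
# Sketch of per-cell dark-channel statistics for one video frame.
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel_cell_stats(frame, patch=15, grid=(4, 4)):
    """frame: (H, W, 3) float image in [0, 1]."""
    per_pixel_min = frame.min(axis=2)                       # min over RGB
    dark = minimum_filter(per_pixel_min, size=patch)        # min over a local patch
    H, W = dark.shape
    gh, gw = grid
    stats = np.empty(grid)
    for i in range(gh):
        for j in range(gw):
            cell = dark[i * H // gh:(i + 1) * H // gh,
                        j * W // gw:(j + 1) * W // gw]
            stats[i, j] = cell.mean()                        # one statistic per cell
    return stats.ravel()                                     # frame-level feature vector
```
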
Multimedia framed
Pub Date: 2013-10-21 | DOI: 10.1145/2502081.2512088
E. Churchill
Abstract: Multimedia is the combination of several media forms; more typically, the word implies sound and full-motion video. While multimedia technologists concern themselves with the production and distribution of the multimedia artifacts themselves, information designers, educationalists and artists are more concerned with the reception of the artifact, and consider multimedia to be another representational format for multimodal information presentation. Such a perspective leads to questions such as: Is text, or audio or video, or a combination of all three, the best format for the message? Should another modality (e.g., haptics/touch, olfaction) be invoked instead or in addition? How does the setting affect perception/reception? Is the artifact interactive? Is it changed by audience members? Understanding how an artifact is perceived, received and interacted with is central to understanding what multimedia is, opening up possibilities and issuing technical challenges as we imagine new forms and formats of multimedia experience. In this talk, I will illustrate how content understanding is modulated by context, by the "framing" of the content. I will discuss audience participatory production of multimedia and multimodal experiences. I will conclude with some technical excitements, design/development challenges and experiential possibilities that lie ahead.
Citations: 0

Scalable training with approximate incremental Laplacian eigenmaps and PCA
Pub Date: 2013-10-21 | DOI: 10.1145/2502081.2508124
Eleni Mantziou, S. Papadopoulos, Y. Kompatsiaris
Abstract: The paper describes the approach, the experimental settings, and the results obtained by the proposed methodology at the ACM Yahoo! Multimedia Grand Challenge. Its main contribution is the use of fast and efficient features with a highly scalable semi-supervised learning approach, the Approximate Laplacian Eigenmaps (ALEs), and its extension, by computing the test set incrementally for learning concepts in time linear to the number of images (both labelled and unlabelled). A combination of two local visual features combined with the VLAD feature aggregation method and PCA is used to improve the efficiency and time complexity. Our methodology achieves somewhat better accuracy compared to the baseline (linear SVM) in small training sets, but improves the performance as the training data increase. Performing ALE fusion on a training set of 50K/concept resulted in a MiAP score of 0.4223, which was among the highest scores of the proposed approach.
Citations: 8

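The VLAD aggregation step mentioned in this abstract can be sketched as follows: each local descriptor is assigned to its nearest codebook centre, residuals are accumulated per centre, and the concatenated vector is normalized before dimensionality reduction. The codebook size, the normalizations, and the PCA dimensionality in the trailing comment are assumptions, not the paper's exact configuration.

```python
# Minimal VLAD encoding sketch for a set of local descriptors.
import numpy as np

def vlad_encode(descriptors, codebook):
    """descriptors: (n, d) local features; codebook: (k, d) cluster centres."""
    k, d = codebook.shape
    assign = np.argmin(
        ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1), axis=1)
    vlad = np.zeros((k, d))
    for c in range(k):
        if np.any(assign == c):
            vlad[c] = (descriptors[assign == c] - codebook[c]).sum(axis=0)
    vlad = vlad.ravel()
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))             # power normalisation
    return vlad / (np.linalg.norm(vlad) + 1e-12)              # L2 normalisation

# The VLAD vectors would then be reduced before learning, e.g. with
# sklearn.decomposition.PCA(n_components=128).fit_transform(train_vlads),
# where train_vlads (N, k*d) and n_components=128 are hypothetical values.
```
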
Spatio-temporal Fisher vector coding for surveillance event detection
Pub Date: 2013-10-21 | DOI: 10.1145/2502081.2502155
Qiang Chen, Yang Cai, L. Brown, A. Datta, Quanfu Fan, R. Feris, Shuicheng Yan, Alexander Hauptmann, Sharath Pankanti
Abstract: We present a generic event detection system evaluated in the Surveillance Event Detection (SED) task of TRECVID 2012. We investigate a statistical approach with spatio-temporal features applied to seven event classes, which were defined by the SED task. This approach is based on local spatio-temporal descriptors, called MoSIFT and generated by pair-wise video frames. A Gaussian Mixture Model (GMM) is learned to model the distribution of the low level features. Then for each sliding window, the Fisher vector encoding [improvedFV] is used to generate the sample representation. The model is learnt using a Linear SVM for each event. The main novelty of our system is the introduction of Fisher vector encoding into video event detection. Fisher vector encoding has demonstrated great success in image classification. The key idea is to model the low level visual features as a Gaussian Mixture Model and to generate an intermediate vector representation for bag of features. FV encoding uses higher order statistics in place of histograms in the standard BoW. FV has several good properties: (a) it can naturally separate the video specific information from the noisy local features and (b) we can use a linear model for this representation. We build an efficient implementation for FV encoding which can attain a 10 times speed-up over real-time. We also take advantage of non-trivial object localization techniques to feed into the video event detection, e.g. multi-scale detection and non-maximum suppression. This approach outperformed the results of all other teams' submissions in TRECVID SED 2012 on four of the seven event types.
Citations: 16

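A minimal sketch of the Fisher-vector step described here, taking gradients with respect to the GMM means only; the full improved-FV encoding also uses variance gradients and further normalizations. The GMM is assumed to be pre-trained with diagonal covariances on low-level descriptors such as MoSIFT, and the helper name is hypothetical.

```python
# Fisher-vector encoding sketch (mean gradients only) for one sliding window.
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector_means(X, gmm: GaussianMixture):
    """X: (n, d) local descriptors; gmm: pre-trained with covariance_type='diag'."""
    n, d = X.shape
    gamma = gmm.predict_proba(X)                    # (n, k) soft assignments
    mu = gmm.means_                                 # (k, d)
    sigma = np.sqrt(gmm.covariances_)               # (k, d) for diagonal covariances
    w = gmm.weights_                                 # (k,)
    # gradient of the log-likelihood w.r.t. each component mean
    diff = (X[:, None, :] - mu[None, :, :]) / sigma[None, :, :]      # (n, k, d)
    fv = (gamma[:, :, None] * diff).sum(axis=0) / (n * np.sqrt(w)[:, None])
    fv = fv.ravel()
    fv = np.sign(fv) * np.sqrt(np.abs(fv))          # power normalisation (improved FV)
    return fv / (np.linalg.norm(fv) + 1e-12)        # L2 normalisation
```
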