Proceedings of the 21st ACM International Conference on Multimedia: Latest Publications

Online multimodal deep similarity learning with application to image retrieval
Pub Date: 2013-10-21 | DOI: 10.1145/2502081.2502112
Pengcheng Wu, S. Hoi, Hao Xia, P. Zhao, Dayong Wang, C. Miao
Abstract: Recent years have witnessed extensive studies on distance metric learning (DML) for improving similarity search in multimedia information retrieval tasks. Despite their successes, most existing DML methods suffer from two critical limitations: (i) they typically attempt to learn a linear distance function on the input feature space, in which the assumption of linearity limits their capacity to measure similarity on complex patterns in real-world applications; (ii) they are often designed for learning distance metrics on uni-modal data, which may not effectively handle the similarity measures for multimedia objects with multimodal representations. To address these limitations, in this paper we propose a novel framework of online multimodal deep similarity learning (OMDSL), which aims to optimally integrate multiple deep neural networks pretrained with stacked denoising autoencoders. In particular, the proposed framework explores a unified two-stage online learning scheme that consists of (i) learning a flexible nonlinear transformation function for each individual modality, and (ii) learning to find the optimal combination of multiple diverse modalities simultaneously in a coherent process. We conduct an extensive set of experiments to evaluate the performance of the proposed algorithms for multimodal image retrieval tasks, in which the encouraging results validate the effectiveness of the proposed technique.
Citations: 169

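The two-stage scheme described in this abstract (a nonlinear embedding per modality, then an online-learned combination of modalities) can be illustrated with a short sketch. The cosine similarity, the multiplicative weight update, and all names and hyperparameters below are illustrative assumptions, not the authors' exact algorithm; the per-modality encoders stand in for the pretrained deep networks mentioned in the abstract.

```python
# Minimal sketch of the OMDSL idea: per-modality embeddings plus an
# online-learned combination of modality similarities. Hedge-style update
# and hyperparameters are assumptions for illustration only.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

class OnlineMultimodalSimilarity:
    def __init__(self, encoders, beta=0.9):
        self.encoders = encoders            # dict: modality name -> pretrained embedding fn
        self.w = np.ones(len(encoders)) / len(encoders)
        self.beta = beta                    # multiplicative-update rate

    def modality_sims(self, x, y):
        # x, y: dicts mapping modality name -> raw feature vector
        return np.array([cosine(f(x[m]), f(y[m])) for m, f in self.encoders.items()])

    def similarity(self, x, y):
        return float(self.w @ self.modality_sims(x, y))

    def update(self, query, positive, negative):
        # Online step on one triplet: down-weight modalities that rank the
        # negative above the positive (Hedge-style multiplicative update).
        s_pos = self.modality_sims(query, positive)
        s_neg = self.modality_sims(query, negative)
        losses = (s_neg > s_pos).astype(float)
        self.w *= self.beta ** losses
        self.w /= self.w.sum()
```
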
Stereotime: a wireless 2D and 3D switchable video communication system
Pub Date: 2013-10-21 | DOI: 10.1145/2502081.2502275
You Yang, Qiong Liu, Yue Gao, Binbin Xiong, Li Yu, Huanbo Luan, R. Ji, Q. Tian
Abstract: Mobile 3D video communication, especially with 2D/3D compatibility, is a new paradigm for both video communication and 3D video processing. Current techniques face challenges on mobile devices when bundled constraints such as computational resources and compatibility must be considered. In this work, we present a wireless 2D and 3D switchable video communication system, named Stereotime, to handle these challenges. Zig-Zag fast object segmentation, depth cue detection and merging, and texture-adaptive view generation are used for 3D scene reconstruction. We demonstrate the functionality and compatibility on 3D mobile devices in a WiFi network environment.
Citations: 5

Jiku director: a mobile video mashup system
Pub Date: 2013-10-21 | DOI: 10.1145/2502081.2502277
Duong-Trung-Dung Nguyen, M. Saini, Vu-Thanh Nguyen, Wei Tsang Ooi
Abstract: In this technical demonstration, we present a Web-based application called Jiku Director that automatically creates a mashup video from event videos uploaded by users. The system runs an algorithm that considers view quality (shakiness, tilt, occlusion), video quality (blockiness, contrast, sharpness, illumination, burned pixels), and spatial-temporal diversity (shot angles, shot lengths) to create a mashup video with smooth shot transitions while covering the event from different perspectives.
Citations: 8

Action recognition using invariant features under unexampled viewing conditions
Pub Date: 2013-10-21 | DOI: 10.1145/2502081.2508126
Litian Sun, K. Aizawa
Abstract: A great challenge in real-world applications of action recognition is the lack of sufficient label information because of variance in the recording viewpoint and differences between individuals. A system that can adapt itself according to these variances is required for practical use. We present a generic method for extracting view-invariant features from skeleton joints. These view-invariant features are further refined using a stacked, compact autoencoder. To model the challenge of real-world applications, two unexampled test settings (NewView and NewPerson) are used to evaluate the proposed method. Experimental results with these test settings demonstrate the effectiveness of our method.
Citations: 14

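To make "view-invariant features from skeleton joints" concrete, the sketch below uses pairwise joint distances, a common rotation- and translation-invariant skeleton representation. This is an illustrative stand-in under that assumption, not the paper's exact feature set or its autoencoder refinement.

```python
# Pairwise joint distances are unchanged by any rotation or translation of
# the camera, which is one simple way to obtain view-invariant features.
import numpy as np

def pairwise_joint_distances(joints):
    """joints: (J, 3) array of 3D joint positions for one frame."""
    diff = joints[:, None, :] - joints[None, :, :]        # (J, J, 3)
    dist = np.linalg.norm(diff, axis=-1)                   # (J, J)
    iu = np.triu_indices(len(joints), k=1)
    return dist[iu]                                         # (J*(J-1)/2,) view-invariant vector

# Sanity check: a random rotation of the skeleton leaves the feature unchanged.
rng = np.random.default_rng(0)
joints = rng.normal(size=(20, 3))
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))               # random orthogonal matrix
assert np.allclose(pairwise_joint_distances(joints),
                   pairwise_joint_distances(joints @ q.T), atol=1e-8)
```
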
Analysis and forecasting of trending topics in online media streams
Pub Date: 2013-10-21 | DOI: 10.1145/2502081.2502117
Tim Althoff, Damian Borth, Jörn Hees, A. Dengel
Abstract: Among the vast information available on the web, social media streams capture what people currently pay attention to and how they feel about certain topics. Awareness of such trending topics plays a crucial role in multimedia systems such as trend aware recommendation and automatic vocabulary selection for video concept detection systems. Correctly utilizing trending topics requires a better understanding of their various characteristics in different social media streams. To this end, we present the first comprehensive study across three major online and social media streams, Twitter, Google, and Wikipedia, covering thousands of trending topics during an observation period of an entire year. Our results indicate that depending on one's requirements one does not necessarily have to turn to Twitter for information about current events and that some media streams strongly emphasize content of specific categories. As our second key contribution, we further present a novel approach for the challenging task of forecasting the life cycle of trending topics in the very moment they emerge. Our fully automated approach is based on a nearest neighbor forecasting technique exploiting our assumption that semantically similar topics exhibit similar behavior. We demonstrate on a large-scale dataset of Wikipedia page view statistics that forecasts by the proposed approach are about 9-48k views closer to the actual viewing statistics compared to baseline methods and achieve a mean average percentage error of 45-19% for time periods of up to 14 days.
Citations: 48

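The forecasting idea (similar topics exhibit similar life cycles, so an emerging topic's curve can be predicted from its nearest historical neighbors) can be sketched as follows. The shape normalization, distance metric, and choice of k are assumptions for illustration, not the paper's exact procedure.

```python
# Nearest-neighbour life-cycle forecasting sketch: match the first observed
# values of a new topic against a library of historical topic curves and
# average the continuations of the k closest ones.
import numpy as np

def knn_forecast(prefix, history, horizon, k=5):
    """
    prefix : (p,)   observed page views of the emerging topic
    history: (n, T) view curves of past topics, with T >= len(prefix) + horizon
    """
    p = len(prefix)
    ref = history[:, :p]
    # scale-normalise so topics of different popularity can still match in shape
    scale = prefix.sum() / np.maximum(ref.sum(axis=1), 1e-9)
    dists = np.linalg.norm(ref * scale[:, None] - prefix, axis=1)
    nn = np.argsort(dists)[:k]
    # average the (rescaled) continuations of the nearest neighbours
    cont = history[nn, p:p + horizon] * scale[nn, None]
    return cont.mean(axis=0)
```
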
Tell me what happened here in history
Pub Date: 2013-10-21 | DOI: 10.1145/2502081.2502272
Jia Chen, Qin Jin, Weipeng Zhang, Shenghua Bao, Zhong Su, Yong Yu
Abstract: This demo shows our system that takes a landmark image as input, recognizes the landmark from the image and returns historical events of the landmark with related photos. Different from existing landmark-related research, we focus on the temporal dimension of a landmark. Our system automatically recognizes the landmark, shows historical events chronologically and provides detailed photos for the events. To build these functions, we fuse information from multiple online resources.
Citations: 2

Towards a comprehensive computational model for aesthetic assessment of videos
Pub Date: 2013-10-21 | DOI: 10.1145/2502081.2508119
Subhabrata Bhattacharya, Behnaz Nojavanasghari, Tao Chen, Dong Liu, Shih-Fu Chang, M. Shah
Abstract: In this paper we propose a novel aesthetic model emphasizing psycho-visual statistics extracted from multiple levels, in contrast to earlier approaches that rely only on descriptors suited for image recognition or based on photographic principles. At the lowest level, we determine dark-channel, sharpness and eye-sensitivity statistics over rectangular cells within a frame. At the next level, we extract Sentibank features (1,200 pre-trained visual classifiers) on a given frame, which invoke specific sentiments such as "colorful clouds", "smiling face", etc., and collect the classifier responses as frame-level statistics. At the topmost level, we extract trajectories from video shots. Using viewer's fixation priors, the trajectories are labeled as foreground or background/camera, on which statistics are computed. Additionally, spatio-temporal local binary patterns are computed that capture texture variations in a given shot. Classifiers are trained on individual feature representations independently. On thorough evaluation of 9 different types of features, we select the best features from each level -- dark channel, affect and camera motion statistics. Next, corresponding classifier scores are integrated in a sophisticated low-rank fusion framework to improve the final prediction scores. Our approach demonstrates strong correlation with human prediction on 1,000 broadcast quality videos released by NHK as an aesthetic evaluation dataset.
Citations: 67

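Of the low-level cues listed in this abstract, the dark-channel statistic is the easiest to make concrete: the dark channel of a pixel is the minimum intensity over the color channels within a local patch, averaged here per rectangular cell. The patch size and cell grid below are assumptions; the paper's sharpness and eye-sensitivity statistics are not reproduced.

```python
# Sketch of per-cell dark-channel statistics for one video frame.
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel_cell_stats(frame, patch=15, grid=(4, 4)):
    """frame: (H, W, 3) float image in [0, 1]."""
    per_pixel_min = frame.min(axis=2)                       # min over RGB
    dark = minimum_filter(per_pixel_min, size=patch)        # min over a local patch
    H, W = dark.shape
    gh, gw = grid
    stats = np.empty(grid)
    for i in range(gh):
        for j in range(gw):
            cell = dark[i * H // gh:(i + 1) * H // gh,
                        j * W // gw:(j + 1) * W // gw]
            stats[i, j] = cell.mean()                        # one statistic per cell
    return stats.ravel()                                     # frame-level feature vector
```
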
Multimedia framed
Pub Date: 2013-10-21 | DOI: 10.1145/2502081.2512088
E. Churchill
Abstract: Multimedia is the combination of several media forms; more typically, the word implies sound and full-motion video. While multimedia technologists concern themselves with the production and distribution of the multimedia artifacts themselves, information designers, educationalists and artists are more concerned with the reception of the artifact, and consider multimedia to be another representational format for multimodal information presentation. Such a perspective leads to questions such as: Is text, or audio or video, or a combination of all three, the best format for the message? Should another modality (e.g., haptics/touch, olfaction) be invoked instead or in addition? How does the setting affect perception/reception? Is the artifact interactive? Is it changed by audience members? Understanding how an artifact is perceived, received and interacted with is central to understanding what multimedia is, opening up possibilities and issuing technical challenges as we imagine new forms and formats of multimedia experience. In this talk, I will illustrate how content understanding is modulated by context, by the "framing" of the content. I will discuss audience participatory production of multimedia and multimodal experiences. I will conclude with some technical excitements, design/development challenges and experiential possibilities that lie ahead.
Citations: 0

Scalable training with approximate incremental Laplacian eigenmaps and PCA
Pub Date: 2013-10-21 | DOI: 10.1145/2502081.2508124
Eleni Mantziou, S. Papadopoulos, Y. Kompatsiaris
Abstract: The paper describes the approach, the experimental settings, and the results obtained by the proposed methodology at the ACM Yahoo! Multimedia Grand Challenge. Its main contribution is the use of fast and efficient features with a highly scalable semi-supervised learning approach, the Approximate Laplacian Eigenmaps (ALEs), and its extension, by computing the test set incrementally for learning concepts in time linear to the number of images (both labelled and unlabelled). A combination of two local visual features combined with the VLAD feature aggregation method and PCA is used to improve the efficiency and time complexity. Our methodology achieves somewhat better accuracy compared to the baseline (linear SVM) in small training sets, but improves the performance as the training data increase. Performing ALE fusion on a training set of 50K/concept resulted in a MiAP score of 0.4223, which was among the highest scores of the proposed approach.
Citations: 8

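The VLAD aggregation step mentioned in this abstract can be sketched as follows: each local descriptor is assigned to its nearest codebook centre, residuals are accumulated per centre, and the concatenated vector is normalized before dimensionality reduction. The codebook size, the normalizations, and the PCA dimensionality in the trailing comment are assumptions, not the paper's exact configuration.

```python
# Minimal VLAD encoding sketch for a set of local descriptors.
import numpy as np

def vlad_encode(descriptors, codebook):
    """descriptors: (n, d) local features; codebook: (k, d) cluster centres."""
    k, d = codebook.shape
    assign = np.argmin(
        ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1), axis=1)
    vlad = np.zeros((k, d))
    for c in range(k):
        if np.any(assign == c):
            vlad[c] = (descriptors[assign == c] - codebook[c]).sum(axis=0)
    vlad = vlad.ravel()
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))             # power normalisation
    return vlad / (np.linalg.norm(vlad) + 1e-12)              # L2 normalisation

# The VLAD vectors would then be reduced before learning, e.g. with
# sklearn.decomposition.PCA(n_components=128).fit_transform(train_vlads),
# where train_vlads (N, k*d) and n_components=128 are hypothetical values.
```
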
Spatio-temporal Fisher vector coding for surveillance event detection
Pub Date: 2013-10-21 | DOI: 10.1145/2502081.2502155
Qiang Chen, Yang Cai, L. Brown, A. Datta, Quanfu Fan, R. Feris, Shuicheng Yan, Alexander Hauptmann, Sharath Pankanti
Abstract: We present a generic event detection system evaluated in the Surveillance Event Detection (SED) task of TRECVID 2012. We investigate a statistical approach with spatio-temporal features applied to seven event classes, which were defined by the SED task. This approach is based on local spatio-temporal descriptors, called MoSIFT and generated by pair-wise video frames. A Gaussian Mixture Model (GMM) is learned to model the distribution of the low level features. Then for each sliding window, the Fisher vector encoding [improvedFV] is used to generate the sample representation. The model is learnt using a Linear SVM for each event. The main novelty of our system is the introduction of Fisher vector encoding into video event detection. Fisher vector encoding has demonstrated great success in image classification. The key idea is to model the low level visual features as a Gaussian Mixture Model and to generate an intermediate vector representation for bag of features. FV encoding uses higher order statistics in place of histograms in the standard BoW. FV has several good properties: (a) it can naturally separate the video specific information from the noisy local features and (b) we can use a linear model for this representation. We build an efficient implementation for FV encoding which can attain a 10 times speed-up over real-time. We also take advantage of non-trivial object localization techniques to feed into the video event detection, e.g. multi-scale detection and non-maximum suppression. This approach outperformed the results of all other teams' submissions in TRECVID SED 2012 on four of the seven event types.
Citations: 16

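A minimal sketch of the Fisher-vector step described here, taking gradients with respect to the GMM means only; the full improved-FV encoding also uses variance gradients and further normalizations. The GMM is assumed to be pre-trained with diagonal covariances on low-level descriptors such as MoSIFT, and the helper name is hypothetical.

```python
# Fisher-vector encoding sketch (mean gradients only) for one sliding window.
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector_means(X, gmm: GaussianMixture):
    """X: (n, d) local descriptors; gmm: pre-trained with covariance_type='diag'."""
    n, d = X.shape
    gamma = gmm.predict_proba(X)                    # (n, k) soft assignments
    mu = gmm.means_                                 # (k, d)
    sigma = np.sqrt(gmm.covariances_)               # (k, d) for diagonal covariances
    w = gmm.weights_                                 # (k,)
    # gradient of the log-likelihood w.r.t. each component mean
    diff = (X[:, None, :] - mu[None, :, :]) / sigma[None, :, :]      # (n, k, d)
    fv = (gamma[:, :, None] * diff).sum(axis=0) / (n * np.sqrt(w)[:, None])
    fv = fv.ravel()
    fv = np.sign(fv) * np.sqrt(np.abs(fv))          # power normalisation (improved FV)
    return fv / (np.linalg.norm(fv) + 1e-12)        # L2 normalisation
```
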