{"title":"OmniViewer: Enabling Multi-modal 3D DASH","authors":"Zhenhuan Gao, Chien-Nan Chen, K. Nahrstedt","doi":"10.1145/2733373.2807971","DOIUrl":"https://doi.org/10.1145/2733373.2807971","url":null,"abstract":"This paper presents OmniViewer, a multi-modal 3D video streaming system based on Dynamic Adaptive Streaming over HTTP (DASH) standard. OmniViewer allows users to view arbitrary side of a performer by choosing the view angle from 0° to 360°. Besides, according to the current available bandwidth, it can also adaptively change the bitrate of rendered 3D video for both smooth and high-quality view rendering. Finally, OmniViewer extends traditional DASH implementation to support multi-modal data streaming besides video and audio.","PeriodicalId":427170,"journal":{"name":"Proceedings of the 23rd ACM international conference on Multimedia","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116852602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Vocabulary Expansion Using Word Vectors for Video Semantic Indexing","authors":"Nakamasa Inoue, Koichi Shinoda","doi":"10.1145/2733373.2806347","DOIUrl":"https://doi.org/10.1145/2733373.2806347","url":null,"abstract":"We propose vocabulary expansion for video semantic indexing. From many semantic concept detectors obtained by using training data, we make detectors for concepts not included in training data. First, we introduce Mikolov's word vectors to represent a word by a low-dimensional vector. Second, we represent a new concept by a weighted sum of concepts in training data in the word vector space. Finally, we use the same weighting coefficients for combining detectors to make a new detector. In our experiments, we evaluate our methods on the TRECVID Video Semantic Indexing (SIN) Task. We train our models with Google News text documents and ImageNET images to generate new semantic detectors for SIN task. We show that our method performs as well as SVMs trained with 100 TRECVID ex- ample videos.","PeriodicalId":427170,"journal":{"name":"Proceedings of the 23rd ACM international conference on Multimedia","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116911740","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting Image Memorability by Multi-view Adaptive Regression","authors":"Houwen Peng, Kai Li, Bing Li, Haibin Ling, Weihua Xiong, Weiming Hu","doi":"10.1145/2733373.2806303","DOIUrl":"https://doi.org/10.1145/2733373.2806303","url":null,"abstract":"The images we encounter throughout our lives make different impressions on us: Some are remembered at first glance, while others are forgotten. This phenomenon is caused by the intrinsic memorability of images revealed by recent studies [5,6]. In this paper, we address the issue of automatically estimating the memorability of images by proposing a novel multi-view adaptive regression (MAR) model. The MAR model provides an effective mapping of visual features to memorability scores by taking advantage of robust feature selection and multiple feature integration. It consists of three major components: an adaptive loss function, an adaptive regularization and a multi-view modeling strategy. Moreover, we design an alternating direction method (ADM) optimization algorithm to solve the proposed objective function. Experimental results on the MIT benchmark dataset show the superiority of the proposed model compared with existing image memorability prediction methods.","PeriodicalId":427170,"journal":{"name":"Proceedings of the 23rd ACM international conference on Multimedia","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127061716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"About Events, Objects, and their Relationships: Human-centered Event Understanding from Multimedia","authors":"A. Scherp, V. Mezaris, B. Ionescu, F. D. Natale","doi":"10.1145/2733373.2806413","DOIUrl":"https://doi.org/10.1145/2733373.2806413","url":null,"abstract":"HuEvent'15 is a continuation of previous year's successful workshop on events in multimedia. It focuses on the human-centered aspects of understanding events from multimedia content. This includes the notion of objects and their relation to events. The workshop brings together researchers from the different areas in multimedia and beyond that are interested in understanding the concept of events.","PeriodicalId":427170,"journal":{"name":"Proceedings of the 23rd ACM international conference on Multimedia","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127130892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Semantic Correlation of Web Images and Text with Mixture of Local Linear Mappings","authors":"Youtian Du, Kai Yang","doi":"10.1145/2733373.2806331","DOIUrl":"https://doi.org/10.1145/2733373.2806331","url":null,"abstract":"This paper proposes a new approach, called mixture of local linear mappings (MLLM), to the modeling of semantic correlation between web images and text. We consider that close examples generally represent a uniform concept and can be supposed to be locally transformed based on a linear mapping into the feature space of another modality. Thus, we use a mixture of local linear transformations, each local component being constrained by a neighborhood model into a finite local space, instead of a more complex nonlinear one. To handle the sparseness of data representation, we introduce the constraints of sparseness and non-negativeness into the approach. MLLM is with good interpretability due to its explicit closed form and concept-related local components, and it avoids the determination of capacity that is often considered for nonlinear transformations. Experimental results demonstrate the effectiveness of the proposed approach.","PeriodicalId":427170,"journal":{"name":"Proceedings of the 23rd ACM international conference on Multimedia","volume":"131 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125052441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3D Printing and Camera Mapping: Dialectic of Virtual and Reality","authors":"He-Lin Luo, I-Chun Chen, Y. Hung","doi":"10.1145/2733373.2808105","DOIUrl":"https://doi.org/10.1145/2733373.2808105","url":null,"abstract":"Projection Mapping, the superimposing of virtual images upon actual objects, is already extensively used in performance arts. Applications of it are already quite mature, therefore, here we wish to achieve the opposite, or specifically speaking, the superimposing of actual objects into virtual images. This method of reverse superimposition is called \"camera mapping.\" Through cameras, camera mapping captures actual objects, and introduces them into a virtual world. Then using superimposition, this allows for actual objects to be rendered as virtual objects. However, the actual objects here must have refined shapes so that they may be superimposed back into the camera. Through the proliferation of 3D printing, virtual 3D models in computers can be created in reality, thereby providing a framework for the limits and demands of \"camera mapping.\" The new media artwork Digital Buddha combines 3D Printing and camera mapping. This work was created by 3-D deformable modeling through a computer, then transforming the model into a sculpture using 3D printing, and then remapping the materially produced sculpture back into the camera. Finally, it uses the already known algorithm to convert the model back into that of the original non-deformed sculpture. From this creation project, in the real world, audiences will see a deformed, abstract sculpture; and in the virtual world, through camera mapping, they will see a concrete sculpture (Buddha). In its representation, this piece of work pays homage to the work TV Buddha produced by video art master Nam June Paik. Using the influence television possesses over people, this work extends into the most important concepts of the digital era, \"coding\" and \"decoding,\" simultaneously addressing the shock and insecurity people in the digital era feel toward images.","PeriodicalId":427170,"journal":{"name":"Proceedings of the 23rd ACM international conference on Multimedia","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125086048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Real Time Rolling Shutter","authors":"David S. Monaghan, N. O’Connor, A. Cleary, D. Connolly","doi":"10.1145/2733373.2808110","DOIUrl":"https://doi.org/10.1145/2733373.2808110","url":null,"abstract":"From an early age children are often told either, you are creative you should do art but stay away from science and maths. Or that you are mathematical you should do science but you're not that creative. Compounding this there also exist some traditional barriers of artistic rhetoric that say, \"don't touch, don't think and don't be creative, we've already done that for you, you can just look...\". The Real Time Rolling Shutter is part of a collaborative Art/Science partnership whose core tenets are in complete contrast to this. The Art/Science exhibitions we have created have invited the public to become part of the exhibition by utilising augmented digital mirrors, Kinects, feed-back camera and projector systems and augmented reality perception helmets. The fundamental underlying principles we are trying to adhere to are to foster curiosity, intrigue, wonderment and amazement and we endeavour to draw the audience into the interactive nature of our exhibits and exclaim to everyone that you can be what ever you chose to be, and that everyone can be creative, everyone can be an artist, everyone can be a scientist... all it takes is an inquisitive mind, so come and explore the real-time rolling shutter and be creative.","PeriodicalId":427170,"journal":{"name":"Proceedings of the 23rd ACM international conference on Multimedia","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125174206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Panel 2","authors":"Yung-Hsiang Lu","doi":"10.1145/3257791","DOIUrl":"https://doi.org/10.1145/3257791","url":null,"abstract":"","PeriodicalId":427170,"journal":{"name":"Proceedings of the 23rd ACM international conference on Multimedia","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126064765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Deep Features For MSR-bing Information Retrieval Challenge","authors":"Qiang Song, Sixie Yu, Cong Leng, Jiaxiang Wu, Qinghao Hu, Jian Cheng","doi":"10.1145/2733373.2809928","DOIUrl":"https://doi.org/10.1145/2733373.2809928","url":null,"abstract":"Two tasks have been put forward in the MSR-bing Grand Challenge 2015. To address the information retrieval task, we raise and integrate a series of methods with visual features obtained by convolution neural network (CNN) models. In our experiments, we discover that the ranking strategies of Hierarchical clustering and PageRank methods are mutually complementary. Another task is fine-grained classification. In contrast to basic-level recognition, fine-grained classification aims to distinguish between different breeds or species or product models, and often requires distinctions that must be conditioned on the object pose for reliable identification. Current state-of-the-art techniques rely heavily upon the use of part annotations, while the bing datasets suffer both abundance of part annotations and dirty background. In this paper, we propose a CNN-based feature representation for visual recognition only using image-level information. Our CNN model is pre-trained on a collection of clean datasets and fine-tuned on the bing datasets. Furthermore, a multi-scale training strategy is adopted by simply resizing the input images into different scales and then merging the soft-max posteriors. We then implement our method into a unified visual recognition system on Microsoft cloud service. Finally, our solution achieved top performance in both tasks of the contest","PeriodicalId":427170,"journal":{"name":"Proceedings of the 23rd ACM international conference on Multimedia","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126071705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EventBuilder: Real-time Multimedia Event Summarization by Visualizing Social Media","authors":"R. Shah, A. Shaikh, Yi Yu, Wenjing Geng, Roger Zimmermann, Gangshan Wu","doi":"10.1145/2733373.2809932","DOIUrl":"https://doi.org/10.1145/2733373.2809932","url":null,"abstract":"Due to the ubiquitous availability of smartphones and digital cameras, the number of photos/videos online has increased rapidly. Therefore, it is challenging to efficiently browse multimedia content and obtain a summary of an event from a large collection of photos/videos aggregated in social media sharing platforms such as Flickr and Instagram. To this end, this paper presents the EventBuilder system that enables people to automatically generate a summary for a given event in real-time by visualizing different social media such as Wikipedia and Flickr. EventBuilder has two novel characteristics: (i) leveraging Wikipedia as event background knowledge to obtain more contextual information about an input event, and (ii) visualizing an interesting event in real-time with a diverse set of social media activities. According to our initial experiments on the YFCC100M dataset from Flickr, the proposed algorithm efficiently summarizes knowledge structures based on the metadata of photos/videos and Wikipedia articles.","PeriodicalId":427170,"journal":{"name":"Proceedings of the 23rd ACM international conference on Multimedia","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125398358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}