{"title":"Session details: Plenary talk 1","authors":"N. Babaguchi","doi":"10.1145/3246389","DOIUrl":"https://doi.org/10.1145/3246389","url":null,"abstract":"","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131380403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video saliency detection in the compressed domain","authors":"Yuming Fang, Weisi Lin, Zhenzhong Chen, Chia-Ming Tsai, Chia-Wen Lin","doi":"10.1145/2393347.2396290","DOIUrl":"https://doi.org/10.1145/2393347.2396290","url":null,"abstract":"Saliency detection is widely used to extract the regions of interest in images. Many saliency detection models have been proposed for videos in the uncompressed domain. However, videos are always stored in the compressed domain such as MPEG2, H.264, MPEG4 Visual, etc. In this study, we propose a video saliency detection model based on feature contrast in the compressed domain. Four features of luminance, color, texture and motion are extracted from DCT coefficients and motion vectors in the video bitstream. The static saliency map of video frames is calculated based on the luminance, color and texture features, while the motion saliency map for video frames is computed by motion feature. The final saliency map for video frames is obtained through combining the static saliency map and motion saliency map. Experimental results show good performance of the proposed video saliency detection model in the compressed domain.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"288 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132077290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MOGAT: a cloud-based mobile game system with auditory training for children with cochlear implants","authors":"Yinsheng Zhou, T. K. Monserrat, Ye Wang","doi":"10.1145/2393347.2396457","DOIUrl":"https://doi.org/10.1145/2393347.2396457","url":null,"abstract":"Musical auditory habilitation is an essential process in adapting cochlear implant recipients to the musical hearing context provided by cochlear implants. However, due to the cost and time limitation, it is impossible for hearing healthcare professionals to provide intensive and extensive musical auditory habilitation for every cochlear implant recipient. In order to provide an efficient and cost-effective musical auditory training for children with cochlear implants, we designed and developed MObile Games with Auditory Training (MOGAT) on off-the-shelf mobile devices. MOGAT includes three intuitive and interesting mobile games for training pitch perception and production, and a cloud-based web service for music therapists to support and evaluate individual habilitation. We demonstrate MOGAT for enhancing musical habilitation for children with cochlear implants.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132230179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dinner of Luciérnaga: an interactive play with iPhone app in theater","authors":"Yu-Chuan Tseng, Yi-Ching Huang, Kuan-Ying Wu, Chi-Ping Chin","doi":"10.1145/2393347.2393426","DOIUrl":"https://doi.org/10.1145/2393347.2393426","url":null,"abstract":"Interactive digital art in the field of performance is emerging as an increasingly important form of artistic expression in Taiwan. Dinner of Luciérnaga is an interdisciplinary project produced by more than ten talented members which include the director, dancer, choreographer, artist, interactive designer, sound designer, iPhone app engineer, image processing designer, stage designer and light designer. The goal of this project is to create new modes of interactive participation between the performers and audience through the use of an innovative iPhone application that links dancer to audience and audience to dancer. The application not only plays a key role in connecting the audience and dancer, but also uses an interesting sound generation application that enhances the spectators' experience. It creates and shares special interactive experiences. Dinner of Luciérnaga is a story about the relationship of light and human in the digital era. It is a stunning performance in visuals and interactive process with focus on new interface that are put into use in authentic environments for validation by audience. In this paper, we will discuss our artistic motivation, the development of the digitally interactive performances, and our process for creating the digitally interactive performance entitled Dinner of Luciérnaga.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134446514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Collective search and recommendation in social media","authors":"J. Sang","doi":"10.1145/2393347.2396509","DOIUrl":"https://doi.org/10.1145/2393347.2396509","url":null,"abstract":"This PhD thesis proposal is focused on proposing solutions to the problem of collective search and recommendation in social media. User and data are two fundamental elements under social media environment. To cope with the semantic gap between social media data and semantic meaning, and the complexity of user intent and requirements, we propose to conduct research on three stages: (1) multimedia content analysis; (2) user understanding and (3)collective search and recommendation. We address the large-scale, multi-modal and heterogeneous characteristics of social media analysis by developing methodology from factor analysis, generative topic model and collaborative filtering. Progresses and advances along the three research lines have been presented, with future directions and open discussions concluded in the end.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127556570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video hyperlinking: libraries and tools for threading and visualizing large video collection","authors":"Lei Pang, Wei Zhang, Hung-Khoon Tan, C. Ngo","doi":"10.1145/2393347.2396520","DOIUrl":"https://doi.org/10.1145/2393347.2396520","url":null,"abstract":"While HTML documents could be effortlessly hyperlinked by markup tags, creation of the hyperlinks for multimedia objects is by no means easy due to the involvement of various visual processing units and intensive computational overhead. This paper introduces an open source, named VIREO-VH, which provides end-to-end support for creating hyperlinks to thread and visualize collections of videos. The software components include video pre-processing, bag-of-words based inverted file indexing for scalable near-duplicate keyframe search, localization of partial near-duplicate segments, and galaxy visualization of video collection. The open source has been internally used by VIREO research team since 2007, and was evolved over years based on experiences through developing various multimedia applications.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132855733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reducing cross-group traffic with a cooperative streaming architecture","authors":"Zhijie Shen, Roger Zimmermann","doi":"10.1145/2393347.2396373","DOIUrl":"https://doi.org/10.1145/2393347.2396373","url":null,"abstract":"Cooperative approaches, such as P2P networks, have demonstrated their effectiveness in video delivery. However, with underlay structure considered, it is still possible to further improve traffic efficiency. In this paper, we discuss the problem of localizing the traffic traversal across peer groups, which are partitioned according to underlay characteristics. We first provide three concrete examples to demonstrate this common challenge, which we theoretically formulate afterwards. Finally, we propose a ring overlay approach, which performs excellently to solve the problem, while tolerating peer dynamics and supporting peer heterogeneity.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"238 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132682685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributional semantics with eyes: using image analysis to improve computational representations of word meaning","authors":"Elia Bruni, J. Uijlings, Marco Baroni, N. Sebe","doi":"10.1145/2393347.2396422","DOIUrl":"https://doi.org/10.1145/2393347.2396422","url":null,"abstract":"The current trend in image analysis and multimedia is to use information extracted from text and text processing techniques to help vision-related tasks, such as automated image annotation and generating semantically rich descriptions of images. In this work, we claim that image analysis techniques can \"return the favor\" to the text processing community and be successfully used for a general-purpose representation of word meaning. We provide evidence that simple low-level visual features can enrich the semantic representation of word meaning with information that cannot be extracted from text alone, leading to improvement in the core task of estimating degrees of semantic relatedness between words, as well as providing a new, perceptually-enhanced angle on word semantics. Additionally, we show how distinguishing between a concept and its context in images can improve the quality of the word meaning representations extracted from images.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115471020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Right buddy makes the difference: an early exploration of social relation analysis in multimedia applications","authors":"J. Sang, Changsheng Xu","doi":"10.1145/2393347.2393358","DOIUrl":"https://doi.org/10.1145/2393347.2393358","url":null,"abstract":"Social media is becoming popular these days, where user necessarily interacts with each other to form social networks. Influence network, as one special case of social network, has been recognized as significantly impacting social activities and user decisions. We emphasize in this paper that the inter-user influence is essentially topic-sensitive, as for different tasks users tend to trust different influencers and be influenced most by them. While existing research focuses on global influence modeling and applies to text-based networks, this work investigates the problem of topic-sensitive influence modeling in the multimedia domain. We propose a multi-modal probabilistic model, considering both users' textual annotation and uploaded visual image. This model is capable of simultaneously extracting user topic distributions and topic-sensitive influence strengths. By identifying the topic-sensitive influencer, we are able to conduct applications like collective search and collaborative recommendation. A risk minimization-based general framework for personalized image search is further presented, where the image search task is transferred to measure the distance of image and personalized query language models. The framework considers the noisy tag issue and enables easy incorporation of social influence. We have conducted experiments on a large-scale Flickr dataset. Qualitative as well as quantitative evaluation results have validated the effectiveness of the topic-sensitive influencer mining model, and demonstrated the advantage of incorporating topic-sensitive influence in personalized image search and topic-based image recommendation.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115722931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video object segmentation with shortest path","authors":"Bao Zhang, Handong Zhao, Xiaochun Cao","doi":"10.1145/2393347.2396316","DOIUrl":"https://doi.org/10.1145/2393347.2396316","url":null,"abstract":"Unsupervised video object segmentation is to automatically segment the foreground object in the video without any prior knowledge. This paper proposes an object-level method to segment foreground object, while existing methods are normally based on low level information. We firstly find all the object-like regions. Then based on the corresponding map between the successive frames, the video segmentation problem is converted to graph model one. Rather than adopting TRW-S which might result in a local optimal solution, a shortest path algorithm is explored to get a globally optimum solution. Compared with the state-of-the-art object-level method, our method not only guarantees the continuity of segmentation result but also works well even under the big disturbance of fast motion object in the background. The experimental results on two open datasets (SegTrack and Berkeley Motion Segmentation Dataset) and video sequences captured by ourselves demonstrate the effectiveness of our method.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114490016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}