{"title":"Knowledge adaptation for ad hoc multimedia event detection with few exemplars","authors":"Zhigang Ma, Yi Yang, Yang Cai, N. Sebe, Alexander Hauptmann","doi":"10.1145/2393347.2393414","DOIUrl":"https://doi.org/10.1145/2393347.2393414","url":null,"abstract":"Multimedia event detection (MED) has a significant impact on many applications. Though video concept annotation has received much research effort, video event detection remains largely unaddressed. Current research mainly focuses on sports and news event detection or abnormality detection in surveillance videos. Our research on this topic is capable of detecting more complicated and generic events. Moreover, the curse of reality, i.e., precisely labeled multimedia content is scarce, necessitates the study on how to attain respectable detection performance using only limited positive examples. Research addressing these two aforementioned issues is still in its infancy. In light of this, we explore Ad Hoc MED, which aims to detect complicated and generic events by using few positive examples. To the best of our knowledge, our work makes the first attempt on this topic. As the information from these few positive examples is limited, we propose to infer knowledge from other multimedia resources to facilitate event detection. Experiments are performed on real-world multimedia archives consisting of several challenging events. The results show that our approach outperforms several other detection algorithms. 
Most notably, our algorithm outperforms SVM in Average Precision by 43% and 14% when using Gaussian and χ2 kernels, respectively.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125636179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Context-aware mobile music recommendation for daily activities","authors":"Xinxi Wang, David S. Rosenblum, Ye Wang","doi":"10.1145/2393347.2393368","DOIUrl":"https://doi.org/10.1145/2393347.2393368","url":null,"abstract":"Existing music recommendation systems rely on collaborative filtering or content-based technologies to satisfy users' long-term music playing needs. Given the popularity of mobile music devices with rich sensing and wireless communication capabilities, we present in this paper a novel approach to employ contextual information collected with mobile devices for satisfying users' short-term music playing needs. We present a probabilistic model to integrate contextual information with music content analysis to offer music recommendation for daily activities, and we present a prototype implementation of the model. Finally, we present evaluation results demonstrating good accuracy and usability of the model and prototype.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122546177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
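The probabilistic integration of context with music playing described above can be sketched as a simple Bayes ranking over observed (activity, song) listening events. This is a minimal illustration, not the authors' model; the activities, song labels, and add-one smoothing below are all invented for the example:

```python
from collections import Counter

# Hypothetical listening log: (activity, song) pairs collected on-device.
events = [
    ("running", "upbeat_pop"), ("running", "upbeat_pop"),
    ("running", "electronic"), ("studying", "ambient"),
    ("studying", "ambient"), ("studying", "classical"),
]

songs = sorted({s for _, s in events})
song_counts = Counter(s for _, s in events)
pair_counts = Counter(events)
total = len(events)

def recommend(activity, top_k=1):
    """Rank songs by P(song | activity) ∝ P(activity | song) * P(song),
    estimated by counting with add-one smoothing."""
    scores = {}
    for song in songs:
        p_song = song_counts[song] / total
        p_act = (pair_counts[(activity, song)] + 1) / (song_counts[song] + len(songs))
        scores[song] = p_song * p_act
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

With the toy log above, `recommend("studying")` ranks "ambient" first because it co-occurs most with that activity; a real system would replace the raw counts with audio-content features and richer sensor context.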
{"title":"Correlation-based burstiness for logo retrieval","authors":"Jérôme Revaud, Matthijs Douze, C. Schmid","doi":"10.1145/2393347.2396358","DOIUrl":"https://doi.org/10.1145/2393347.2396358","url":null,"abstract":"Detecting logos in photos is challenging. A reason is that logos locally resemble patterns frequently seen in random images. We propose to learn a statistical model for the distribution of incorrect detections output by an image matching algorithm. It results in a novel scoring criterion in which the weight of correlated keypoint matches is reduced, penalizing irrelevant logo detections. In experiments on two very different logo retrieval benchmarks, our approach largely improves over the standard matching criterion as well as other state-of-the-art approaches.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122740773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
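The idea of reducing the weight of correlated keypoint matches can be illustrated with a classic burstiness discount, where a group of n matches on the same visual word contributes sqrt(n) rather than n. This is a generic normalization for illustration, not the learned statistical model of the paper, and the visual-word ids are hypothetical:

```python
import math
from collections import Counter

def burstiness_score(match_words):
    """Score a set of keypoint matches, discounting correlated groups.

    match_words: one visual-word id per match; matches sharing a word are
    treated as correlated, and a group of n contributes n/sqrt(n) = sqrt(n).
    """
    groups = Counter(match_words)
    return sum(count / math.sqrt(count) for count in groups.values())

# Ten matches spread over ten distinct words outscore ten matches that all
# fire on the same repetitive pattern (10.0 vs. sqrt(10) ≈ 3.16), which is
# the effect that penalizes logo-like false detections.
diverse = burstiness_score(list(range(10)))
bursty = burstiness_score([7] * 10)
```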
{"title":"Browse-to-search","authors":"Shiyang Lu, Tao Mei, Jingdong Wang, Jian Zhang, Zhiyong Wang, D. Feng, Jian-Tao Sun, Shipeng Li","doi":"10.1145/2393347.2396465","DOIUrl":"https://doi.org/10.1145/2393347.2396465","url":null,"abstract":"This demonstration presents a novel interactive online shopping application based on visual search technologies. When users want to buy something on a shopping site, they often need to look up related information on other websites, and therefore must switch between the web page being browsed and the sites that provide search results. The proposed application enables users to search naturally for products of interest while browsing a web page, satisfying even casual purchase intent with ease. The interactive shopping experience is characterized by: 1) in session---users specify their purchase intent within the browsing session, instead of leaving the current page and navigating to other websites; 2) in context---the browsed web page provides implicit context that helps infer user purchase preferences; 3) in focus---users specify their search interest with gestures on touch devices rather than formulating queries in a search box; 4) natural---gesture inputs and vision-based search provide a natural shopping experience. 
The system is evaluated against a data set consisting of several million commercial product images.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"417 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122907994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Coulda, woulda, shoulda: 20 years of multimedia opportunities","authors":"K. Nahrstedt, M. Slaney","doi":"10.1145/2393347.2393351","DOIUrl":"https://doi.org/10.1145/2393347.2393351","url":null,"abstract":"The ACM Special Interest Group on Multimedia (SIGMM) is celebrating the 20th anniversary of establishing its premier conference, the ACM International Conference on Multimedia (ACM Multimedia). The panel \"Coulda, Woulda, Shoulda\" is part of the celebration at the ACM Multimedia 2012. The panelists and the audience will discuss the 20 years of multimedia opportunities that our community has seen, took upon and pushed forward to advance the state of the art.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"203 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131546047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mobile multimedia presentation in self-forming mobile device groups: ad-hoc networks in practice","authors":"K. Collins, N. O’Connor, Gabriel-Miro Muntean","doi":"10.1145/2393347.2396444","DOIUrl":"https://doi.org/10.1145/2393347.2396444","url":null,"abstract":"This demo exhibits a new application of mobile ad-hoc networks, where a group of mobile devices is connected to allow synchronized presentation of multimedia content. The demo takes the form of an interactive tour: each participant has a mobile device, and the tour is led by a guide who takes the group on an informative tour of a locale. The tour is augmented with the presentation of multimedia content on the devices, highlighting points of interest. Content presentation is controlled by the guide and synchronized using the ad-hoc network. This is an edutainment application, but the underlying technology could be applied elsewhere, including educational and entertainment settings.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"163 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127578798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Full paper session 5: person and face analysis","authors":"N. Sebe","doi":"10.1145/3246398","DOIUrl":"https://doi.org/10.1145/3246398","url":null,"abstract":"","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127746294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sketch-based image retrieval on mobile devices using compact hash bits","authors":"Kai-Yu Tseng, Yen-Liang Lin, Yu-Hsiu Chen, Winston H. Hsu","doi":"10.1145/2393347.2396345","DOIUrl":"https://doi.org/10.1145/2393347.2396345","url":null,"abstract":"The advent of touch panels in mobile devices has provided a good platform for mobile sketch search. However, most previous sketch-based image retrieval systems adopt an inverted index structure over a large-scale image database, which is infeasible to operate within the limited memory of mobile devices. In this paper, we propose a novel approach to address these challenges. First, we utilize distance transform (DT) features to bridge the gap between query sketches and natural images. These high-dimensional DT features are then projected to more compact binary hash bits. The experimental results show that our method achieves retrieval performance very competitive with the MindFinder approach [3] while requiring far less memory (e.g., only 3% of MindFinder's total memory storage on 2.1 million images). Owing to its low memory consumption, the whole system can operate independently on mobile devices.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132714427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
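Projecting high-dimensional features to compact binary hash bits, as described above, can be sketched with sign bits of random hyperplane projections (classic locality-sensitive hashing). The dimensions below are arbitrary, and this generic projection stands in for whatever learned projection the paper applies to its distance-transform features:

```python
import random

random.seed(0)
DIM, BITS = 64, 16  # feature dimension and code length (illustrative)
planes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(BITS)]

def hash_bits(vec):
    """Project vec onto each random hyperplane and keep the sign as one bit."""
    return tuple(1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
                 for plane in planes)

def hamming(a, b):
    """Number of differing bits; cheap enough to scan a database on-device."""
    return sum(x != y for x, y in zip(a, b))

# Each image is reduced to BITS bits instead of inverted-index postings, so a
# multi-million-image database can fit in a phone's memory; similar feature
# vectors tend to agree on most bits.
query = [random.gauss(0, 1) for _ in range(DIM)]
code = hash_bits(query)
```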
{"title":"XKin -: eXtendable hand pose and gesture recognition library for kinect","authors":"Fabrizio Pedersoli, N. Adami, Sergio Benini, R. Leonardi","doi":"10.1145/2393347.2396521","DOIUrl":"https://doi.org/10.1145/2393347.2396521","url":null,"abstract":"In this work we provide an open-source framework for Kinect that enables more natural and intuitive hand-gesture communication between humans and computing devices. The software package is endowed with useful tools for training the system to work with user-defined postures and gestures. The XKin project is fully implemented in C and freely available at https://github.com/fpeder/XKin under the FreeBSD License. Our goal is to encourage contributions from other researchers and developers in building an open and effective system that empowers a natural modality for human-machine interaction.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133691161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Motch: an automatic motion type characterization system for sensor-rich videos","authors":"Guanfeng Wang, Beomjoo Seo, Roger Zimmermann","doi":"10.1145/2393347.2396462","DOIUrl":"https://doi.org/10.1145/2393347.2396462","url":null,"abstract":"Camera motion information facilitates higher-level semantic description inference in many video applications, e.g., video retrieval. However, an efficient and accurate methodology for annotating videos with camera motion information is still an elusive goal. In our recent work we have investigated the fusion of captured video with a continuous stream of sensor meta-data. For these so-called sensor-rich videos we present a system, called Motch, which precisely partitions a video document into subshots, automatically characterizes the camera motions and provides video subshot browsing based on an interactive, map-based interface. Moreover, the system computes and presents motion type statistics for each video in real time and renders different subshots distinctively on the map synchronously with the video playback.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"107 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130319591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}