{"title":"Eventscapes: visualizing events over time with emotive facets","authors":"Brett Adams, Dinh Q. Phung, S. Venkatesh","doi":"10.1145/2072298.2072044","DOIUrl":"https://doi.org/10.1145/2072298.2072044","url":null,"abstract":"The scale and dynamicity of social media, and interaction between traditional news sources and online communities, has created challenges to information retrieval approaches. Users may have no clear information need or be unable to express it in the appropriate idiom, requiring instead to be oriented in an unfamiliar domain, to explore and learn. We present a novel data-driven visualization, termed Eventscape, that combines time, visual media, mood, and controversy. Formative evaluation highlights the value of emotive facets for rapid evaluation of mixed news and social media topics, and a role for such visualizations as pre-cursors to deeper search.","PeriodicalId":318758,"journal":{"name":"Proceedings of the 19th ACM international conference on Multimedia","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115212975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding images with natural sentences","authors":"Y. Ushiku, T. Harada, Y. Kuniyoshi","doi":"10.1145/2072298.2072417","DOIUrl":"https://doi.org/10.1145/2072298.2072417","url":null,"abstract":"We propose a novel system which generates sentential captions for general images. For people to use numerous images effectively on the web, technologies must be able to explain image contents and must be capable of searching for data that users need. Moreover, images must be described with natural sentences based not only on the names of objects contained in an image but also on their mutual relations. The proposed system uses general images and captions available on the web as training data to generate captions for new images. Furthermore, because the learning cost is independent from the amount of data, the system has scalability, which makes it useful with large-scale data.","PeriodicalId":318758,"journal":{"name":"Proceedings of the 19th ACM international conference on Multimedia","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115701166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An audio-driven virtual dance-teaching assistant","authors":"S. Essid, Yves Grenier, M. Maazaoui, G. Richard, R. Tournemenne","doi":"10.1145/2072298.2072416","DOIUrl":"https://doi.org/10.1145/2072298.2072416","url":null,"abstract":"This work addresses the Huawei/3Dlife Grand challenge proposing a set of audio tools for a virtual dance-teaching assistant. These tools are meant to help the dance student develop a sense of rhythm to correctly synchronize his/her movements and steps to the musical timing of the choreographies to be executed. They consist of three main components, namely a music (beat) analysis module, a source separation and remastering module and a dance step segmentation module. These components enable to create augmented tutorial videos highlighting the rhythmic information using, for instance, a synthetic dance teacher voice, but also videos highlighting the steps executed by a student to help in the evaluation of his/her performance.","PeriodicalId":318758,"journal":{"name":"Proceedings of the 19th ACM international conference on Multimedia","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125220928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"OpenIMAJ and ImageTerrier: Java libraries and tools for scalable multimedia analysis and indexing of images","authors":"Jonathon S. Hare, Sina Samangooei, D. Dupplaw","doi":"10.1145/2072298.2072421","DOIUrl":"https://doi.org/10.1145/2072298.2072421","url":null,"abstract":"OpenIMAJ and ImageTerrier are recently released open-source libraries and tools for experimentation and development of multimedia applications using Java-compatible programming languages. OpenIMAJ (the Open toolkit for Intelligent Multimedia Analysis in Java) is a collection of libraries for multimedia analysis. The image libraries contain methods for processing images and extracting state-of-the-art features, including SIFT. The video and audio libraries support both cross-platform capture and processing. The clustering and nearest-neighbour libraries contain efficient, multi-threaded implementations of clustering algorithms. The clustering library makes it possible to easily create BoVW representations for images and videos. OpenIMAJ also incorporates a number of tools to enable extremely-large-scale multimedia analysis using distributed computing with Apache Hadoop. ImageTerrier is a scalable, high-performance search engine platform for content-based image retrieval applications using features extracted with the OpenIMAJ library and tools. The ImageTerrier platform provides a comprehensive test-bed for experimenting with image retrieval techniques. The platform incorporates a state-of-the-art implementation of the single-pass indexing technique for constructing inverted indexes and is capable of producing highly compressed index data structures.","PeriodicalId":318758,"journal":{"name":"Proceedings of the 19th ACM international conference on Multimedia","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121089042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning heterogeneous data for hierarchical web video classification","authors":"Xianming Liu, H. Yao, R. Ji, Pengfei Xu, Xiaoshuai Sun, Q. Tian","doi":"10.1145/2072298.2072355","DOIUrl":"https://doi.org/10.1145/2072298.2072355","url":null,"abstract":"Web videos such as YouTube are hard to obtain sufficient precisely labeled training data and analyze due to the complex ontology. To deal with these problems, we present a hierarchical web video classification framework by learning heterogeneous web data, and construct a bottom-up semantic forest of video concepts by learning from meta-data. The main contributions are two-folds: firstly, analysis about middle-level concepts' distribution is taken based on data collected from web communities, and a concepts redistribution assumption is made to build effective transfer learning algorithm. Furthermore, an AdaBoost-Like transfer learning algorithm is proposed to transfer the knowledge learned from Flickr images to YouTube video domain and thus it facilitates video classification. Secondly, a group of hierarchical taxonomies named Semantic Forest are mined from YouTube and Flickr tags which reflect better user intention on the semantic level. A bottom-up semantic integration is also constructed with the help of semantic forest, in order to analyze video content hierarchically in a novel perspective. A group of experiments are performed on the dataset collected from Flickr and YouTube. Compared with state-of-the-arts, the proposed framework is more robust and tolerant to web noise.","PeriodicalId":318758,"journal":{"name":"Proceedings of the 19th ACM international conference on Multimedia","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123746301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ztitch: a mobile phone application for 3D scene creation, navigation, and sharing","authors":"Andrew Au, Jie Liang","doi":"10.1145/2072298.2072460","DOIUrl":"https://doi.org/10.1145/2072298.2072460","url":null,"abstract":"Modern smartphones provide an excellent platform for creating 3D scenes from photos. While there already exists many mobile applications that can stitch a set of photos to create a single, panoramic landscape photo, this paper proposes the creation of panoramic scenes where multiple photos are projected in a 3D space using the pinhole camera model, so that a realistic perspective of the scene is maintained. Our application allows users to automatically create a panoramic 3D scene in real-time using the live images from the phone's camera, and to manually fine-tune each photo's position via the touchscreen if the default position is inaccurate. The application enables scenes to be easily shared to other users, and was developed using the Silverlight framework so that it can run across multiple platforms.","PeriodicalId":318758,"journal":{"name":"Proceedings of the 19th ACM international conference on Multimedia","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122483274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Keyframe presentation for browsing of user-generated videos on map interfaces","authors":"Jia Hao, Guanfeng Wang, Beomjoo Seo, Roger Zimmermann","doi":"10.1145/2072298.2071926","DOIUrl":"https://doi.org/10.1145/2072298.2071926","url":null,"abstract":"To present user-generated videos that relate to geographic areas for easy access and browsing it is often natural to use maps as interfaces. A common approach is to place thumbnail images of video keyframes in appropriate locations. Here we consider the challenge of determining which keyframes to select and where to place them on the map. Our proposed technique leverages sensor-collected meta-data which are automatically acquired as a continuous stream together with the video. Our approach is able to detect interesting regions and objects (hotspots) and their distances from the camera in a fully automated way. Meaningful keyframes are adaptively selected based on the popularity of the hotspots. Our experiments show very promising results and demonstrate excellent utility for the users.","PeriodicalId":318758,"journal":{"name":"Proceedings of the 19th ACM international conference on Multimedia","volume":"146 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122611151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ImagiLight: a vision approach to lighting scene setting","authors":"T. Gritti, G. Monaci","doi":"10.1145/2072298.2071995","DOIUrl":"https://doi.org/10.1145/2072298.2071995","url":null,"abstract":"The advent of integrated lighting installations, consisting of individually controllable lamps with advanced rendering capabilities, is fundamentally transforming lighting. This brings a need for an intuitive control capable of fully exploiting the rendering capabilities of the complete lighting infrastructure. In this paper we present a new method to automatically create lighting atmospheres in any type of environment, that allows for an natural interaction with the lighting system and generates unique, suggestive effects. To prove the effectiveness and versatility of the proposed solution, we deploy the system in several application scenarios, and discuss the results.","PeriodicalId":318758,"journal":{"name":"Proceedings of the 19th ACM international conference on Multimedia","volume":"449 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122825880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cognitive intervention in autism using multimedia stimulus","authors":"S. Venkatesh, S. Greenhill, Dinh Q. Phung, Brett Adams","doi":"10.1145/2072298.2072448","DOIUrl":"https://doi.org/10.1145/2072298.2072448","url":null,"abstract":"We demonstrate an open multimedia-based system for delivering early intervention therapy for autism. Using flexible multi-touch interfaces together with principled ways to access rich content and tasks, we show how a syllabus can be translated into stimulus sets for early intervention. Media stimuli are able to be presented agnostic to language and media modality due to a semantic network of concepts and relations that are fundamental to language and cognitive development, which enable stimulus complexity to be adjusted to child performance. Being open, the system is able to assemble enough media stimuli to avoid children over-learning, and is able to be customised to a specific child which aids with engagement. Computer-based delivery enables automation of session logging and reporting, a fundamental and time-consuming part of therapy.","PeriodicalId":318758,"journal":{"name":"Proceedings of the 19th ACM international conference on Multimedia","volume":"251 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114246815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Expanding the point: automatic enlargement of presentation video elements","authors":"Q. Tung, R. Swaminathan, A. Efrat, Kobus Barnard","doi":"10.1145/2072298.2071913","DOIUrl":"https://doi.org/10.1145/2072298.2071913","url":null,"abstract":"We present a system that assists users in viewing videos of lectures on small screen devices, such as cell phones. It automatically identifies semantic units on the slides, such as bullets, groups of bullets, and images. As the participant views the lecture, the system magnifies the appropriate semantic unit while it is the focus of the discussion. The system makes this decision based on cues from laser pointer gestures and spoken words that are read off the slide. It then magnifies the semantic element using the slide image and the homography between the slide image and the video frame. Experiments suggest that the semantic units of laser-based events identified by our algorithm closely match those identified by humans. In the case of identifying bullets through spoken words, results are more limited but are a good starting point for more complex methods. Finally, we show that this kind of magnification has potential for improving learning of technical content from video lectures when the resolution of the video is limited, such as when being viewed on hand held devices.","PeriodicalId":318758,"journal":{"name":"Proceedings of the 19th ACM international conference on Multimedia","volume":"125 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114594219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}