{"title":"A smart assistant for shooting virtual cinematography with motion-tracked cameras","authors":"Christophe Lino, M. Christie, R. Ranon, William H. Bares","doi":"10.1145/2072298.2072481","DOIUrl":"https://doi.org/10.1145/2072298.2072481","url":null,"abstract":"This demonstration shows how an automated assistant encoded with knowledge of cinematography practice can offer suggested viewpoints to a filmmaker operating a hand-held motion-tracked virtual camera device. Our system, called Director's Lens, uses an intelligent cinematography engine to compute, at the request of the filmmaker, a set of suitable camera placements for starting a shot that represent semantically and cinematically distinct choices for visualizing the current narrative. Editing decisions and hand-held camera compositions made by the user in turn influence the system's suggestions for subsequent shots. The result is a novel virtual cinematography workflow that enhances the filmmaker's creative potential by enabling efficient exploration of a wide range of computer-suggested cinematographic possibilities.","PeriodicalId":318758,"journal":{"name":"Proceedings of the 19th ACM international conference on Multimedia","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121894807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Asymmetric hamming embedding: taking the best of our bits for large scale image search","authors":"Mihir Jain, H. Jégou, P. Gros","doi":"10.1145/2072298.2072035","DOIUrl":"https://doi.org/10.1145/2072298.2072035","url":null,"abstract":"This paper proposes an asymmetric Hamming Embedding scheme for large scale image search based on local descriptors. The comparison of two descriptors relies on an vector-to-binary code comparison, which limits the quantization error associated with the query compared with the original Hamming Embedding method. The approach is used in combination with an inverted file structure that offers high efficiency, comparable to that of a regular bag-of-features retrieval system. The comparison is performed on two popular datasets. Our method consistently improves the search quality over the symmetric version. The trade-off between memory usage and precision is evaluated, showing that the method is especially useful for short binary signatures.","PeriodicalId":318758,"journal":{"name":"Proceedings of the 19th ACM international conference on Multimedia","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122047881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Socially relevant simulation games: a design study","authors":"Ramin Tadayon, Ashish Amresh, W. Burleson","doi":"10.1145/2072298.2071908","DOIUrl":"https://doi.org/10.1145/2072298.2071908","url":null,"abstract":"Socially Relevant Simulation Games (SRSG), a new medium for social interaction, based on real-world skills and skill development, creates a single gaming framework that connects both serious and casual players. Through a detailed case study this paper presents a design process and framework for SRSG, in the context of mixed-reality golf swing simulations. The SRSG, entitled \"World of Golf\", utilizes a real-time expert system to capture, analyze, and evaluate golf swing metrics. The game combines swing data with players' backgrounds, e.g., handicaps, to form individual profiles. These profiles are then used to implement a golf simulation game using artificially controlled agents who inherit the skill levels of their corresponding human users. The simulation and assessment modules provide the serious player with tools to build golf skills while allowing casual players to engage within a simulated social world. A framework that incorporates simulated golf competitions among these social agents is presented and validated by comparing the usage statistics of 10 PGA Golf Management (PGM) students with 10 non-professional students.","PeriodicalId":318758,"journal":{"name":"Proceedings of the 19th ACM international conference on Multimedia","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129865786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bag-of-colors for improved image search","authors":"C. Wengert, Matthijs Douze, H. Jégou","doi":"10.1145/2072298.2072034","DOIUrl":"https://doi.org/10.1145/2072298.2072034","url":null,"abstract":"This paper investigates the use of color information when used within a state-of-the-art large scale image search system. We introduce a simple yet effective and efficient color signature generation procedure. It is used either to produce global or local descriptors. As a global descriptor, it outperforms several state-of-the-art color description methods, in particular the bag-of-words method based on color SIFT. As a local descriptor, our signature is used jointly with SIFT descriptors (no color) to provide complementary information. This significantly improves the recognition rate, outperforming the state of the art on two image search benchmarks. We provide an open source package of our signature (http://www.kooaba.com/en/learnmore/labs/).","PeriodicalId":318758,"journal":{"name":"Proceedings of the 19th ACM international conference on Multimedia","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130533313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast detection of noisy GPS and magnetometer tags in wide-baseline multi-views","authors":"Aveek Shankar Brahmachari, Sudeep Sarkar","doi":"10.1145/2072298.2071922","DOIUrl":"https://doi.org/10.1145/2072298.2071922","url":null,"abstract":"We propose an algorithm for detection of noisy GPS and magnetometer tags in wide-baseline camera views. Our algorithm neither needs densely sampled views nor does it need a single visually connected path through all the views in the dataset. We use vision-based estimates of mutual rotation and translation between cameras to compute a measure of confidence on the correctness of the associated GPS and magnetometer tags. The vision algorithm can find the epipolar geometry between two wide-baseline images without needing pre-specified correspondences. We have two versions of our approach; one that requires geometric pose estimation between all pairs of images and a faster version that uses a pre-filter based on photometric comparison of images to quickly reject non-overlapping views from further geometric consideration. We show qualitative and quantitative results on the Nokia Grand Challenge 2010 Dataset. We find that magnetometer readings are more accurate than GPS readings.","PeriodicalId":318758,"journal":{"name":"Proceedings of the 19th ACM international conference on Multimedia","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130543965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Photo search by face positions and facial attributes on touch devices","authors":"Yu-Heng Lei, Yan-Ying Chen, Lime Iida, Bor-Chun Chen, Hsiao-Hang Su, Winston H. Hsu","doi":"10.1145/2072298.2072410","DOIUrl":"https://doi.org/10.1145/2072298.2072410","url":null,"abstract":"With the explosive growth of camera devices, people can freely take photos to capture moments of life, especially the ones accompanied with friends and family. Therefore, a better solution to organize the increasing number of personal or group photos is highly required. In this paper, we propose a novel way to search for face images according facial attributes and face similarity of the target persons. To better match the face layout in mind, our system allows the user to graphically specify the face positions and sizes on a query \"canvas,\" where each attribute or identity is defined as an \"icon\" for easier representation. Moreover, we provide aesthetics filtering to enhance visual experience by removing candidates of poor photographic qualities. The scenario has been realized on a touch device with an intuitive user interface. With the proposed block-based indexing approach, we can achieve near real-time retrieval (0.1 second on average) in a large-scale dataset (more than 200k faces in Flickr images).","PeriodicalId":318758,"journal":{"name":"Proceedings of the 19th ACM international conference on Multimedia","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130611672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tutorial on multimedia music signal processing","authors":"G. Richard","doi":"10.1145/2072298.2072402","DOIUrl":"https://doi.org/10.1145/2072298.2072402","url":null,"abstract":"The enormous amount of unstructured audio data available nowadays and the spread of its use as a data source in many applications are introducing new challenges to researchers in information and multimedia signal processing. Automatic analysis of audio documents (music, radio broadcast audio streams,...) gathers several research directions including audio indexing and transcription (extraction of informative features leading to audio content recognition or to the estimation of high level concepts such as melody, rhythm, instrumentation or harmony,...), audio classification (grouping by similarity, by music genre or by audio events categories) and content-based retrieval (such as query by example or query by humming approaches). In this context, the general field of Music signal Processing is receiving a growing interest and becomes more relevant and more visible in the audio community. Nevertheless, if much work is tackled in audio and music signal processing it is somewhat often presented only in specialized music or audio signal processing conferences. In the multimedia community, the focus of interest is often on the image or video signal with less emphasis on the audio signal and its potential for analyzing or interpreting a multimedia scene. The aim of the proposed tutorial is then to provide a general introduction of audio signal processing which should be of broad interest for the multimedia community, to review the state of the art in music signal processing (this will be largely based on [1]) and to highlight with some examples the potential of music signal processing for multimedia streams.","PeriodicalId":318758,"journal":{"name":"Proceedings of the 19th ACM international conference on Multimedia","volume":" 8","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120828685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast frame-rate up-conversion of depth video via video coding","authors":"Yanjie Li, Lifeng Sun, Tianfan Xue","doi":"10.1145/2072298.2072003","DOIUrl":"https://doi.org/10.1145/2072298.2072003","url":null,"abstract":"Recent development of depth sensors has facilitated the progress of 2D-plus-depth methods for 3D video representation, for which frame-rate up-conversion (FRUC) of depth video is a critical step. However, due to the computational cost of state-of-the-art FRUC methods, real time applications of 2D-plus-depth is still limited. In this paper, we present a method of speeding up the FRUC of the depth video by treating it as part of a video coding process, combined with a novel color-mapping algorithm is adopted to improve the quality of temporal upsampling. Experiments show that the proposed systems saves up to 99.5% of the frame interpolation time, while achieving virtually identical reconstructed depth video as state-of-the-art methods.","PeriodicalId":318758,"journal":{"name":"Proceedings of the 19th ACM international conference on Multimedia","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126342399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatial pooling for transformation invariant image representation","authors":"Xia Li, Yan Song, Yijuan Lu, Q. Tian","doi":"10.1145/2072298.2072052","DOIUrl":"https://doi.org/10.1145/2072298.2072052","url":null,"abstract":"Spatial Pyramid Matching (SPM) [2] has been proposed to extend the Bag-of-Word (BoW) model for object classification. By re-serving the finer level information, it makes image matching more accurate. However, for not well-aligned images, where the object is rotated, flipped or translated, SPM may lose its discrimination power. To tackle this problem, we propose novel spatial pooling layouts to address various transformations, and generate a more general image representation. To evaluate the effectiveness of the proposed approach, we conduct extensive experiments on three transformation emphasized datasets for object classification task. Experimental results demonstrate its superiority over the state-of-the-arts. Besides, the proposed image representation is compact and consistent with the BoW model, which makes it applicable to image retrieval task as well.","PeriodicalId":318758,"journal":{"name":"Proceedings of the 19th ACM international conference on Multimedia","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125794796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Color-plus-depth level-of-detail in 3D tele-immersive video: a psychophysical approach","authors":"Wanmin Wu, A. Arefin, G. Kurillo, Pooja Agarwal, K. Nahrstedt, R. Bajcsy","doi":"10.1145/2072298.2072302","DOIUrl":"https://doi.org/10.1145/2072298.2072302","url":null,"abstract":"This paper presents a psychophysical study that measures the perceptual thresholds of a new factor called Color-plus-Depth Level-of-Detail peculiar to polygon-based 3D tele-immersive video. The results demonstrate the existence of Just Noticeable Degradation and Just Unacceptable Degradation thresholds on the factor. In light of the results, we describe the design and implementation of a real-time perception-based quality adaptor for 3D tele-immersive video. Our experimental results show that the adaptation scheme can reduce resource usage while considerably enhancing the overall perceived visual quality.","PeriodicalId":318758,"journal":{"name":"Proceedings of the 19th ACM international conference on Multimedia","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130076256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}