{"title":"Interactive segmentation and tracking of video objects","authors":"Xavier Giró-i-Nieto, Manel Martos","doi":"10.1109/WIAMIS.2012.6226749","DOIUrl":"https://doi.org/10.1109/WIAMIS.2012.6226749","url":null,"abstract":"This paper describes a mechanism to interactively segment objects from a sequence of video frames. The extracted object can be later embedded in a different background, associated to local scale metadata or used to train an automatic object detector. The workflow requires the interaction of the user at two stages: the temporal segmentation of the frames containing the object and the generation of an object mask to initialize a video tracker. The mask is defined as a combination of regions generated by an image segmentation algorithm. This framework has been integrated in an annotation tool available to the public.","PeriodicalId":346777,"journal":{"name":"2012 13th International Workshop on Image Analysis for Multimedia Interactive Services","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123573614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using a 3D cylindrical interface for image browsing to improve visual search performance","authors":"Klaus Schöffmann, David Ahlström","doi":"10.1109/WIAMIS.2012.6226759","DOIUrl":"https://doi.org/10.1109/WIAMIS.2012.6226759","url":null,"abstract":"In this paper we evaluate a 3D cylindrical interface that arranges image thumbnails by visual similarity for the purpose of image browsing. Through a user study we compare the performance of this interface to the performance of a common scrollable 2D list of thumbnails in a grid arrangement. Our evaluation shows that the 3D Cylinder interface enables significantly faster visual search and is the preferred search interface for the majority of tested users.","PeriodicalId":346777,"journal":{"name":"2012 13th International Workshop on Image Analysis for Multimedia Interactive Services","volume":"2009 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130949636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visual saliency estimation for video","authors":"Matthew Oakes, G. Abhayaratne","doi":"10.1109/WIAMIS.2012.6226751","DOIUrl":"https://doi.org/10.1109/WIAMIS.2012.6226751","url":null,"abstract":"The most eye catching regions within an image or video can be captured by exploiting characteristics within the human visual system. In this paper we propose a novel method for modeling the visual saliency information in a video sequence. The proposed method incorporates wavelet decomposition and the modeling of the human visual system to capture spatiotemporal saliency information. A unique approach to capture and combine salient motion data with spatial intensity and orientation contrasts in the sequence, is presented. The proposed method shows a superior performance compared to the state-of-the-art existing methods. The fast algorithm can be simply implemented and is useful for many wavelet based applications such as watermarking, compression and fusion.","PeriodicalId":346777,"journal":{"name":"2012 13th International Workshop on Image Analysis for Multimedia Interactive Services","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132229768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visualisation of tennis swings for coaching","authors":"Philip Kelly, N. O’Connor","doi":"10.1109/WIAMIS.2012.6226750","DOIUrl":"https://doi.org/10.1109/WIAMIS.2012.6226750","url":null,"abstract":"As a proficient tennis swing is a key element of success in tennis, many amateur tennis players spend a considerable amount of time and effort perfecting their tennis stroke mechanics, hoping to create more accuracy, consistency and power in their swing. In order to achieve these aims effectively a number of independent aspects of technique need to be addressed, including forming a correct racket grip, shot timing, body orientation and precise follow-through. Outside of a one-to-one coaching scenario, where constant professional feedback on technique can be provided, keeping all aspects of technique in mind can overwhelm amateur players. In this work, we have developed a set of visualisation tools to augment the development of amateur tennis players between dedicated one-to-one coaching sessions in the area of technique, timing and body posture. Our approach temporally aligns an amateur player's swing dynamics with that of an elite athlete, allowing direct technique comparison using augmented reality techniques.","PeriodicalId":346777,"journal":{"name":"2012 13th International Workshop on Image Analysis for Multimedia Interactive Services","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131460066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Why did you record this video? An exploratory study on user intentions for video production","authors":"M. Lux, Jochen Huber","doi":"10.1109/WIAMIS.2012.6226758","DOIUrl":"https://doi.org/10.1109/WIAMIS.2012.6226758","url":null,"abstract":"Why do people record videos and share them? While the question seems to be simple, user intentions have not yet been investigated for video production and sharing. A general taxonomy would lead to adapted information systems and multimedia interfaces tailored to the users' intentions. We contribute (1) an exploratory user study with 20 participants, examining the various facets of user intentions for video production and sharing in detail and (2) a novel set of user intention clusters for video production, grounded empirically in our study results. We further reflect existing work in specialized domains (i.e. video blogging and mobile phone cameras) and show that prevailing models used in other multimedia fields (e.g. photography) cannot be used as-is to reason about video recording and sharing intentions.","PeriodicalId":346777,"journal":{"name":"2012 13th International Workshop on Image Analysis for Multimedia Interactive Services","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131749493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stereo video completion for rig and artefact removal","authors":"F. Raimbault, François Pitié, A. Kokaram","doi":"10.1109/WIAMIS.2012.6226762","DOIUrl":"https://doi.org/10.1109/WIAMIS.2012.6226762","url":null,"abstract":"Video reconstruction has become an important tool for rig and artefact removal in cinema postproduction. In this paper we are concerned with reconstructing stereo video material. We propose a method that builds on existing exemplar-based video inpainting techniques and includes a dedicated view consistency constraint. Within a constrained texture synthesis framework, we use reconstructed motion and inter-frame disparity vectors as guides for finding appropriate example source patches from parts of the sequence that minimise spatial and stereo discrepancies. We then introduce coherent patch sewing to reconstruct the missing region by stitching the source patches together. Compared to previous methods our results show increased spatial and view consistency.","PeriodicalId":346777,"journal":{"name":"2012 13th International Workshop on Image Analysis for Multimedia Interactive Services","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126829559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved B-slices DIRECT mode coding using motion side information","authors":"Xiem HoangVan, J. Ascenso, F. Pereira","doi":"10.1109/WIAMIS.2012.6226757","DOIUrl":"https://doi.org/10.1109/WIAMIS.2012.6226757","url":null,"abstract":"The so-called DIRECT coding mode plays an important role in the RD performance of predictive video coding such as the H.264/AVC and MPEG-4 standards because there is typically a large probability that the DIRECT mode is selected in B-slices by the rate-distortion optimization (RDO) process. Although the current H.264/AVC DIRECT coding procedure exploits the motion vectors (MV) obtained from the reference frames in a rather effective way, it may still be improved by considering better motion information such as motion data derived by the side information (SI) creation process typical of distributed video coding. Therefore, this paper proposes an improved DIRECT coding mode for B-slices by efficiently exploiting some motion side information available at both the encoder and decoder. Experimental results show that the proposed improved DIRECT coding mode provides up to 8% bitrate saving or 0.46 dB PSNR improvement.","PeriodicalId":346777,"journal":{"name":"2012 13th International Workshop on Image Analysis for Multimedia Interactive Services","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131307542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Geo-tagging online videos using semantic expansion and visual analysis","authors":"Xavier Sevillano, T. Piatrik, K. Chandramouli, Qianni Zhang, E. Izquierdo","doi":"10.1109/WIAMIS.2012.6226764","DOIUrl":"https://doi.org/10.1109/WIAMIS.2012.6226764","url":null,"abstract":"The association of geographical tags to multimedia resources enables browsing and searching online multimedia repositories using geographical criteria, but millions of already online but non geo-tagged videos and images remain invisible to the eyes of this type of systems. This situation calls for the development of automatic geo-tagging techniques capable of estimating the location where a video or image was taken. This paper presents a bimodal geo-tagging system for online videos based on extracting and expanding the geographical information contained in the textual metadata and on visual similarity criteria. The performance of the proposed system is evaluated on the MediaEval 2011 Placing task data set, and compared against the participants in that workshop.","PeriodicalId":346777,"journal":{"name":"2012 13th International Workshop on Image Analysis for Multimedia Interactive Services","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121049074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An intelligent depth-based obstacle detection system for visually-impaired aid applications","authors":"Chia-Hsiang Lee, Yu-Chi Su, Liang-Gee Chen","doi":"10.1109/WIAMIS.2012.6226753","DOIUrl":"https://doi.org/10.1109/WIAMIS.2012.6226753","url":null,"abstract":"In this paper, we present a robust depth-based obstacle detection system in computer vision. The system aims to assist the visually-impaired in detecting obstacles with distance information for safety. With analysis of the depth map, segmentation and noise elimination are adopted to distinguish different objects according to the related depth information. Obstacle extraction mechanism is proposed to capture obstacles by various object proprieties revealing in the depth map. The proposed system can also be applied to emerging vision-based mobile applications, such as robots, intelligent vehicle navigation, and dynamic surveillance systems. Experimental results demonstrate the proposed system achieves high accuracy. In the indoor environment, the average detection rate is above 96.1%. Even in the outdoor environment or in complete darkness, 93.7% detection rate is achieved on average.","PeriodicalId":346777,"journal":{"name":"2012 13th International Workshop on Image Analysis for Multimedia Interactive Services","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128444515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Content-based analysis for accessing audiovisual archives: Alternatives for concept-based indexing and search","authors":"T. Tuytelaars","doi":"10.1109/WIAMIS.2012.6226770","DOIUrl":"https://doi.org/10.1109/WIAMIS.2012.6226770","url":null,"abstract":"Huge amounts of audiovisual material have been digitized recently, resulting in a great source of information relevant both from a cultural and historical point of view. However, in spite of millions of man hours spent on manual annotation and recent advances in (semi-)automatic metadata generation, accessing these archives and retrieving relevant information from them remains a difficult task. Up to recently, the main paradigm to open up archives by automatic tools for audiovisual analysis has been a concept-based indexing and retrieval oriented approach. However, this approach has its limitations, in that it does not scale well, it requires strong supervision, and does not really match well to the user's needs. In this paper, we discuss some upcoming alternative approaches that try to overcome or circumvent some of these issues. This includes i) the use of knowledge modeling to bridge the semantic gap; ii) on-the-fly learning of new, user-defined concepts; and iii) weakly supervised methods that learn from associated text data. We also discuss what we consider important open issues at this time that deserve more attention from the research community.","PeriodicalId":346777,"journal":{"name":"2012 13th International Workshop on Image Analysis for Multimedia Interactive Services","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114569855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}