{"title":"How different kinds of sound in videos can influence gaze","authors":"Guanghan Song, D. Pellerin, L. Granjon","doi":"10.1109/WIAMIS.2012.6226776","DOIUrl":"https://doi.org/10.1109/WIAMIS.2012.6226776","url":null,"abstract":"This paper presents an analysis of the effect of thirteen different kinds of sound on visual gaze when looking freely at videos to help to predict eye positions. First, an audio-visual experiment was designed with two groups of participants, with audio-visual (AV) and visual (V) conditions, to test the sound effect. Then, an audio experiment was designed to validate the classification of sound we proposed. We observed that the sound effect is different depending on the kind of sound, and that the classes with human voice (speech, singer, human noise and singers) have the greatest effect. Finally, a comparison of eye positions with a visual saliency model was carried out, which proves that adding sound to video decreases the accuracy of prediction of the visual saliency model.","PeriodicalId":346777,"journal":{"name":"2012 13th International Workshop on Image Analysis for Multimedia Interactive Services","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123750230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A skeleton based binarization approach for video text recognition","authors":"Haojin Yang, B. Quehl, Harald Sack","doi":"10.1109/WIAMIS.2012.6226754","DOIUrl":"https://doi.org/10.1109/WIAMIS.2012.6226754","url":null,"abstract":"Text in video data comes in different resolutions and with heterogeneous background resulting in difficult contrast ratios that most times prohibit valid OCR (Optical Character Recognition) results. Therefore, the text has to be separated from its background before applying standard OCR process. This pre-processing task can be achieved by a suitable image binarization procedure. In this paper, we propose a novel binarization method for video text images with complex background. The proposed method is based on a seed-region growing strategy. First, the text gradient direction is approximated by analyzing the content distribution of image skeleton maps. Then, the text seed-pixels are selected by calculating the average grayscale value of skeleton pixels. And finally, an automated seed region growing algorithm is applied to obtain the text pixels. The accuracy of the proposed approach is shown by evaluation.","PeriodicalId":346777,"journal":{"name":"2012 13th International Workshop on Image Analysis for Multimedia Interactive Services","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133096349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A machine learning approach to determining tag relevance in geotagged Flickr imagery","authors":"Mark Hughes, N. O’Connor, G. Jones","doi":"10.1109/WIAMIS.2012.6226774","DOIUrl":"https://doi.org/10.1109/WIAMIS.2012.6226774","url":null,"abstract":"We present a novel machine learning based approach to determining the semantic relevance of community contributed image annotations for the purposes of image retrieval. Current large scale community image retrieval systems typically rely on human annotated tags which are subjectively assigned and may not provide useful or semantically meaningful labels to the images. Homogeneous tags which fail to distinguish between are a common occurrence, which can lead to poor search effectiveness on this data. We described a method to improve text based image retrieval systems by eliminating generic or non relevant image tags. To classify tag relevance, we propose a novel feature set based on statistical information available for each tag within a collection of geotagged images harvested from Flickr. Using this feature set machine learning models are trained to classify the relevance of each tag to its associated image. The goal of this process is to allow for rich and accurate captioning of these images, with the objective of improving the accuracy of text based image retrieval systems. A thorough evaluation is carried out using a human annotated benchmark collection of Flickr tags.","PeriodicalId":346777,"journal":{"name":"2012 13th International Workshop on Image Analysis for Multimedia Interactive Services","volume":"33 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134333575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}