{"title":"Who are the users of a video search system? Classifying a heterogeneous group with a profile matrix","authors":"Max Kemman, M. Kleppe, Henri Beunders","doi":"10.1109/WIAMIS.2012.6226765","DOIUrl":"https://doi.org/10.1109/WIAMIS.2012.6226765","url":null,"abstract":"Formulating requirements for a video search system can be a challenging task when everyone is a possible user. This paper explores the possibilities of classifying users by creating a Profile Matrix, placing users on two axes: experience and goal-directedness. This enables us to describe the characteristics of the subgroups and investigate differences between the different groups. We created Profile Matrices by classifying 850 respondents of a survey regarding a requirements study for a video search system. We conclude that the Profile Matrix indeed enables us to classify subgroups of users and describe their characteristics. The current research is limited to descriptions of subgroups and analysis of differences between these subgroups. In the future, we want to research what these differences mean with regard to the users' performance and acceptance of a video search system and explore the use of a profile matrix for other types of search systems.","PeriodicalId":346777,"journal":{"name":"2012 13th International Workshop on Image Analysis for Multimedia Interactive Services","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114604051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Social event discovery by topic inference","authors":"Xueliang Liu, B. Huet","doi":"10.1109/WIAMIS.2012.6226752","DOIUrl":"https://doi.org/10.1109/WIAMIS.2012.6226752","url":null,"abstract":"With people's keen interest in social media sharing websites, the multimedia research community faces new challenges and compelling opportunities. In this paper, we address the problem of automatically discovering specific events from social media data. Our proposed approach assumes that events are joint distributions over the latent topics in a given place. Based on this assumption, topics are learned from large amounts of automatically collected social data using an LDA model. Then, the event distribution over topics is estimated using least mean squares optimization. We evaluate our method at locations scattered around the world and show via our experimental results that the proposed framework offers promising performance for detecting events based on social media.","PeriodicalId":346777,"journal":{"name":"2012 13th International Workshop on Image Analysis for Multimedia Interactive Services","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130880808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Social recommendation using speech recognition: Sharing TV scenes in social networks","authors":"Daniel Schneider, Sebastian Tschöpel, J. Schwenninger","doi":"10.1109/WIAMIS.2012.6226755","DOIUrl":"https://doi.org/10.1109/WIAMIS.2012.6226755","url":null,"abstract":"We describe a novel system which simplifies recommendation of video scenes in social networks, thereby attracting a new audience for existing video portals. Users can select interesting quotes from a speech recognition transcript, and share the corresponding video scene with their social circle with minimal effort. The system has been designed in close cooperation with the largest German public broadcaster (ARD), and was deployed at the broadcaster's public video portal. A twofold adaptation strategy adapts our speech recognition system to the given use case. First, a database of speaker-adapted acoustic models for the most important speakers in the corpus is created. We use spectral speaker identification for detecting whether one of these speakers is speaking, and select the corresponding model accordingly. Second, we apply language model adaptation by exploiting prior knowledge about the video category.","PeriodicalId":346777,"journal":{"name":"2012 13th International Workshop on Image Analysis for Multimedia Interactive Services","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125540778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Codebook-free exemplar models for object detection","authors":"Jan Hendrik Becker, T. Tuytelaars, L. Gool","doi":"10.1109/WIAMIS.2012.6226768","DOIUrl":"https://doi.org/10.1109/WIAMIS.2012.6226768","url":null,"abstract":"Traditional bag-of-features approaches often vector-quantise the features into a visual codebook. This process inevitably causes loss of information. Recently codebook-free methods that avoid the vector-quantisation step have become more popular. Used in conjunction with nearest-neighbour approaches these methods have shown remarkable classification performance. In this paper we show how to exploit the concept of nearest neighbour based classification for object detection. Our codebook-free exemplar model combines the classification power of nearest neighbour methods with a detection concept based on exemplar models. We demonstrate the performance of our proposed system on a real-world dataset of images of motorbikes.","PeriodicalId":346777,"journal":{"name":"2012 13th International Workshop on Image Analysis for Multimedia Interactive Services","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123368671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On-the-fly specific person retrieval","authors":"Omkar M. Parkhi, A. Vedaldi, Andrew Zisserman","doi":"10.1109/WIAMIS.2012.6226775","DOIUrl":"https://doi.org/10.1109/WIAMIS.2012.6226775","url":null,"abstract":"We describe a method of visual search for finding people in large video datasets. The novelty is that the person of interest can be specified at run time by a text query, and a discriminative classifier for that person is then learnt on-the-fly using images downloaded from Google Image search. The performance of the method is evaluated on a ground truth dataset of episodes of Scrubs, and results are also shown for retrieval on the TRECVid 2011 IACC.1.B dataset of over 8k videos. The entire process from specifying the query to receiving the ranked results takes only a matter of seconds.","PeriodicalId":346777,"journal":{"name":"2012 13th International Workshop on Image Analysis for Multimedia Interactive Services","volume":"138 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131753266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving identification by pruning: A case study on face recognition and body soft biometric","authors":"Carmelo Velardo, J. Dugelay","doi":"10.1109/WIAMIS.2012.6226747","DOIUrl":"https://doi.org/10.1109/WIAMIS.2012.6226747","url":null,"abstract":"We investigate the capability of body soft biometrics to prune a hard biometrics database, improving both retrieval speed and accuracy. Our pre-classification step, based on anthropometric measures, is developed on a large-scale medical dataset to guarantee the statistical significance of the results, and is tested in conjunction with a face recognition algorithm. Our assumptions are verified by testing our system on a chimera dataset. We clearly identify the trade-off among pruning, accuracy, and measurement error of an anthropometry-based system. Even in the worst case of ±10% biased anthropometric measures, our approach improves recognition accuracy while guaranteeing that only half the database has to be considered.","PeriodicalId":346777,"journal":{"name":"2012 13th International Workshop on Image Analysis for Multimedia Interactive Services","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131062049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recognizing numbers from the low-resolution patterns in digital images","authors":"E. Tsomko, Hyoung-Joong Kim","doi":"10.1109/WIAMIS.2012.6226777","DOIUrl":"https://doi.org/10.1109/WIAMIS.2012.6226777","url":null,"abstract":"In this paper, we propose a new method for recognizing numbers from low-resolution (LR) patterns in digital images. Given digital photographs of cars, for example, we can recognize the numbers on the license plates even if the plate patterns remain unclear after applying super-resolution. The proposed method is based on statistical analysis and on relating the features of a single LR digit pattern to the features of a high-resolution (HR) one.","PeriodicalId":346777,"journal":{"name":"2012 13th International Workshop on Image Analysis for Multimedia Interactive Services","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129502554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Alignment of 2D objects for shape interpretation","authors":"O. Mzoughi, Itheri Yahiaoui, N. Boujemaa","doi":"10.1109/WIAMIS.2012.6226769","DOIUrl":"https://doi.org/10.1109/WIAMIS.2012.6226769","url":null,"abstract":"Humans usually describe objects along a certain direction, called the intuitive direction; in other words, they place them the way they are commonly seen in their surroundings. In computer vision, intuitive alignment may be very useful for object interpretation and semantic classification. For example, it may facilitate the extraction of characteristic points such as the base and apex of plant leaves, or the eyes and tail of fishes, which greatly enrich the representation of their shapes. While it is still an open challenge to automatically align objects along their intuitive orientation, this paper examines opportunities to determine it for objects that have somewhat symmetric shapes. Inspired by an idea related to 3D alignment, our approach is based on two types of symmetry: reflectional symmetry and local translational symmetry. Experimental results carried out on the MPEG-7 dataset show that our method detects an alignment that corresponds to users' intuition for most objects that have approximately or partially symmetric shapes.","PeriodicalId":346777,"journal":{"name":"2012 13th International Workshop on Image Analysis for Multimedia Interactive Services","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115668655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A variational statistical framework for clustering human action videos","authors":"Wentao Fan, N. Bouguila","doi":"10.1109/WIAMIS.2012.6226748","DOIUrl":"https://doi.org/10.1109/WIAMIS.2012.6226748","url":null,"abstract":"In this paper, we present an unsupervised learning method, based on the finite Dirichlet mixture model and the bag-of-visual words representation, for categorizing human action videos. The proposed Bayesian model is learned through a principled variational framework. A variational form of the Deviance Information Criterion (DIC) is incorporated within the proposed statistical framework for evaluating the correctness of the model complexity (i.e. number of mixture components). The effectiveness of the proposed model is illustrated through empirical results.","PeriodicalId":346777,"journal":{"name":"2012 13th International Workshop on Image Analysis for Multimedia Interactive Services","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123892297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The AXES-lite video search engine","authors":"Shu Chen, Kevin McGuinness, Robin Aly, N. O’Connor, F. D. Jong","doi":"10.1109/WIAMIS.2012.6226778","DOIUrl":"https://doi.org/10.1109/WIAMIS.2012.6226778","url":null,"abstract":"The aim of AXES is to develop tools that provide various types of users with new engaging ways to interact with audiovisual libraries, helping them discover, browse, navigate, search, and enrich archives. This paper describes the initial (lite) version of the AXES search engine, which is targeted at professional users such as media professionals and archivists. We describe the overall system design, the user interface, and the results of our experiments at TRECVid 2011.","PeriodicalId":346777,"journal":{"name":"2012 13th International Workshop on Image Analysis for Multimedia Interactive Services","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116020865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}