{"title":"Learning people co-occurrence relations by using relevance feedback for retrieving group photos","authors":"K. Shimizu, Naoko Nitta, N. Babaguchi","doi":"10.1145/1991996.1992053","DOIUrl":"https://doi.org/10.1145/1991996.1992053","url":null,"abstract":"This paper proposes an image retrieval method which retrieves images of a specific person from group photos. Many query-by-example methods have focused only on the visual features of the queried person. However, since socially related people such as family and friends are often taken photos together, their co-occurrence relations can be useful information. Thus, we propose an image retrieval method which uses the visual features of not only the queried person but also those who co-occur with the queried person in the same images. Relevance feedback is used to learn who co-occur with the queried person, their faces, and how strong their co-occurrence relations are. When retrieving the images of 19 persons in total from 158 images, after five feedback iterations, the recall rate of 50% was obtained by considering the people co-occurrence relations, as against 33% when considering only the queried person. With human errors in giving relevance feedback, the recall rate still improved to 40%.","PeriodicalId":390933,"journal":{"name":"Proceedings of the 1st ACM International Conference on Multimedia Retrieval","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115459341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A parallel cross-modal search engine over large-scale multimedia collections with interactive relevance feedback","authors":"Marc von Wyl, Hisham Mohamed, E. Bruno, S. Marchand-Maillet","doi":"10.1145/1991996.1992069","DOIUrl":"https://doi.org/10.1145/1991996.1992069","url":null,"abstract":"Indexing web-scale multimedia is only possible by distributing storage and computing efforts. Existing large-scale content-based indexing services mostly do not offer interactive relevance feedback. Here, we propose a running demonstrator of our Cross-Modal Search Engine (CMSE) implementing a query-by-example search strategy with relevance feedback and distributed over a cluster of 20 Dual core machines using MPI. We present the performance gain in terms of interactivity (search time) using a part of the Image-Net collection containing more than one million images as base example.","PeriodicalId":390933,"journal":{"name":"Proceedings of the 1st ACM International Conference on Multimedia Retrieval","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127337358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Social media driven image retrieval","authors":"Adrian Daniel Popescu, G. Grefenstette","doi":"10.1145/1991996.1992029","DOIUrl":"https://doi.org/10.1145/1991996.1992029","url":null,"abstract":"People often try to find an image using a short query and images are usually indexed using short annotations. Matching the query vocabulary with the indexing vocabulary is a difficult problem when little text is available. Textual user generated content in Web 2.0 platforms contains a wealth of data that can help solve this problem. Here we describe how to use Wikipedia and Flickr content to improve this match. The initial query is launched in Flickr and we create a query model based on co-occurring terms. We also calculate nearby concepts using Wikipedia and use these to expand the query. The final results are obtained by ranking the results for the expanded query using the similarity between their annotation and the Flickr model. Evaluation of these expansion and ranking techniques, over the Image CLEF 2010 Wikipedia Collection containing 237,434 images and their multilingual textual annotations, shows that a consistent improvement compared to state of the art methods.","PeriodicalId":390933,"journal":{"name":"Proceedings of the 1st ACM International Conference on Multimedia Retrieval","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125891884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Instant Bag-of-Words served on a laptop","authors":"J. Uijlings, O. D. Rooij, Daan Odijk, A. Smeulders, M. Worring","doi":"10.1145/1991996.1992065","DOIUrl":"https://doi.org/10.1145/1991996.1992065","url":null,"abstract":"This demo showcases our realtime implementation of concept classification using the Bag-of-Words method embedded within MediaTable, our interactive categorization tool for large multimedia collections. MediaTable allows the users to open images from disk or download these directly from the internet. Each image is then processed using the Bag-of-Words method, which computes classification scores for 20 distinct concepts classes on the fly. These are then seamlessly displayed in the interface.","PeriodicalId":390933,"journal":{"name":"Proceedings of the 1st ACM International Conference on Multimedia Retrieval","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115072402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Harmonic style-based song retrieval using N-gram","authors":"C. Chuan","doi":"10.1145/1991996.1992013","DOIUrl":"https://doi.org/10.1145/1991996.1992013","url":null,"abstract":"N-gram models have been successfully applied to harmonic analysis for differentiating a composer's style based on all the pieces in a large corpus of the composer. In this paper, we focus on each individual song and explore the effectiveness of the n-gram model when it is applied to a different but equally important musical task: harmonic style-based song retrieval. A chord profile is generated for a song by using the n-gram model as the descriptor of the song's harmonic features. The system retrieves songs based on the similarity between the chord profile in the query and that of the songs in the database. The retrieval result is evaluated in terms of a style-based retrieval score as well as traditional information retrieval metrics such as precision and recall. Finally, we list the most common pop-rock chord patterns from the most frequently retrieved songs, and compare the patterns with those described in previous works.","PeriodicalId":390933,"journal":{"name":"Proceedings of the 1st ACM International Conference on Multimedia Retrieval","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128430563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TV program segmentation using multi-modal information fusion","authors":"Hongliang Bai, Lezi Wang, Gang Qin, Jiwei Zhang, Kun Tao, Xiaofu Chang, Yuan Dong","doi":"10.1145/1991996.1992007","DOIUrl":"https://doi.org/10.1145/1991996.1992007","url":null,"abstract":"A TV program segmentation algorithm is presented by the fusion of the multi-modal information in the large-scale videos. As \"Inter-Programs\" are generally inserted into the TV videos repeatedly, the macro structures of the videos can be effectively and automatically generated by identifying the video-audio features of the special sequences. The Electronic Program Guide (EPG) is used to organize the structures into the programs. Three sections are included in the algorithm, namely, the video-based non-supervised duplicate sequence detection, the audio-based special clip retrieval and the EPG-based 24-hour program segmentation. The algorithm has been tested in 60-day different-type TV videos. The F-measures of the multi-modal fusion and video-based duplicated sequence detection achieve the rates of over 98% and 96% respectively. These results show that the proposed method is highly efficient and effective for the TV Program segmentation.","PeriodicalId":390933,"journal":{"name":"Proceedings of the 1st ACM International Conference on Multimedia Retrieval","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131111834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph-based methods for the automatic annotation and retrieval of art prints","authors":"G. Carneiro","doi":"10.1145/1991996.1992028","DOIUrl":"https://doi.org/10.1145/1991996.1992028","url":null,"abstract":"The analysis of images taken from cultural heritage artifacts is an emerging area of research in the field of information retrieval. Current methodologies are focused on the analysis of digital images of paintings for the tasks of forgery detection and style recognition. In this paper, we introduce a graph-based method for the automatic annotation and retrieval of digital images of art prints. Such method can help art historians analyze printed art works using an annotated database of digital images of art prints. The main challenge lies in the fact that art prints generally have limited visual information. The results show that our approach produces better results in a weakly annotated database of art prints in terms of annotation and retrieval performance compared to state-of-the-art approaches based on bag of visual words.","PeriodicalId":390933,"journal":{"name":"Proceedings of the 1st ACM International Conference on Multimedia Retrieval","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133473429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Face alignment via joint-AAM","authors":"T. Xiong, Yong Ma, Y. Zou","doi":"10.1145/1991996.1992018","DOIUrl":"https://doi.org/10.1145/1991996.1992018","url":null,"abstract":"In this paper, a joint active appearance model (joint-AAM) framework is proposed for face alignment. The object function consists of more than one active appearance model and some constraint items. It can be optimized through the efficient project-out inverse compositional (POIC) fitting algorithm. By transferring the low dimensional parameter space to the high one, the facial shape can be converged to the acceptable solution easier by joint-AAM comparing to single AAM, especially if the initial solutions locate on each side of the optimal solution. In multi-view case, different AAMs are jointed if the true view is far from the initial views. In single view case, different initial solutions of one AAM can be jointed to handle poor initialization or exaggerative expressions. Alternatively, 3D shape model is employed to impose stronger shape constraints on joint-AAM. A geometrical explanation is given to describe the reason of the robustness of the joint-AAM. The experiments demonstrate its accuracy, robustness and efficiency. The acronyms in this paper are listed in Tab. 1.","PeriodicalId":390933,"journal":{"name":"Proceedings of the 1st ACM International Conference on Multimedia Retrieval","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133710057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large-scale music exploration in hierarchically organized landscapes using prototypicality information","authors":"M. Schedl, Christian Höglinger, Peter Knees","doi":"10.1145/1991996.1992004","DOIUrl":"https://doi.org/10.1145/1991996.1992004","url":null,"abstract":"We present a novel user interface that offers a fun way to explore music collections in virtual landscapes in a game-like manner. Extending previous work, special attention is paid to scalability and user interaction. In this vein, the ever growing size of today's music collections is addressed in two ways that allow for visualizing and browsing nearly arbitrarily sized music repositories. First, the proposed user interface deepTune employs a hierarchical version of the Self-Organizing Map (SOM) to cluster similar pieces of music using multiple, hierarchically aligned layers. Second, to facilitate orientation in the landscape by presenting well-known anchor points to the user, a combination of Web-based and audio signal-based information extraction techniques to determine cluster prototypes (songs) is proposed. Selecting representative and well-known prototypes -- the former is ensured by using signal-based features, the latter by using Web-based data -- is crucial for browsing large music collections. We further report on results of an evaluation carried out to assess the quality of the proposed cluster prototype ranking.","PeriodicalId":390933,"journal":{"name":"Proceedings of the 1st ACM International Conference on Multimedia Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133624081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Relevance feedback strategies for artistic image collections tagging","authors":"C. Grana, Daniele Borghesani, R. Cucchiara","doi":"10.1145/1991996.1992041","DOIUrl":"https://doi.org/10.1145/1991996.1992041","url":null,"abstract":"This paper provides an analysis on relevance feedback techniques in a multimedia system designed for the interactive exploration and annotation of artistic collections, in particular illuminated manuscripts. The relevance feedback is presented not only as a very effective technique to improve the performance of the system, but also as a clever way to increase the user experience, mixing the interactive surfing through the artistic content with the possibility to gather valuable information from the user, and consequently improving his retrieval satisfaction. We compare a modification of the Mean-Shift Feature Space Warping algorithm, as representative of the standard RF procedures, and a learning-based technique based on transduction, considered in order to overcome some limitation of the previous technique. Experiments are reported regarding the adopted visual features based on covariance matrices.","PeriodicalId":390933,"journal":{"name":"Proceedings of the 1st ACM International Conference on Multimedia Retrieval","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131884264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}