{"title":"ShotTagger: tag location for internet videos","authors":"Guangda Li, Meng Wang, Yantao Zheng, Haojie Li, Zhengjun Zha, Tat-Seng Chua","doi":"10.1145/1991996.1992033","DOIUrl":"https://doi.org/10.1145/1991996.1992033","url":null,"abstract":"Social video sharing websites allow users to annotate videos with descriptive keywords called tags, which greatly facilitate video search and browsing. However, many tags describe only part of the video content, without any temporal indication of when the tag actually appears. Currently, there is very little research on automatically assigning tags to shot-level segments of a video. In this paper, we leverage users' tags as a source for analyzing the content within the video and develop a novel system named ShotTagger to assign tags at the shot level. Tag localization at the shot level proceeds in two steps. The first is to estimate the distribution of tags within the video, based on a multiple instance learning framework. The second is to exploit the semantic correlation of a tag with other tags in the video within an optimization framework, imposing temporal smoothness across adjacent video shots to refine the shot-level tagging results. We present different applications to demonstrate the usefulness of the tag location scheme in searching and browsing videos. A series of experiments conducted on a set of YouTube videos demonstrates the feasibility and effectiveness of our approach.","PeriodicalId":390933,"journal":{"name":"Proceedings of the 1st ACM International Conference on Multimedia Retrieval","volume":"5 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116796382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Botanical data retrieval system supporting discovery learning","authors":"T. Kajiyama","doi":"10.1145/1991996.1992032","DOIUrl":"https://doi.org/10.1145/1991996.1992032","url":null,"abstract":"We constructed a botanical data retrieval system applying our proposed search interface, named 'Concentric Ring View', for multi-faceted metadata. This system allows users to search flexibly and intuitively by combining attributes with simple operations. The attributes used as search keys are visual and botanical features such as flower color, leaf shape, and blooming season. Users can create their own dynamic knowledge hierarchies by selecting these attributes and adjusting attribute values. We consider that this system enables users not only to search for plant names but also to learn the morphological features and taxonomy of plants, and we performed usability testing. We confirmed that users could find the correct plant names even when each user selected different attributes for searching, and that users could identify the correct plants by grasping visual features even when the images shown focused only on flowers and leaves. In discovery learning, users could learn plant features and botanical properties by discovering common properties in the relationships between attributes and attribute values, and between attribute values and retrieved results.","PeriodicalId":390933,"journal":{"name":"Proceedings of the 1st ACM International Conference on Multimedia Retrieval","volume":"187 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134232491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Social negative bootstrapping for visual categorization","authors":"Xirong Li, Cees G. M. Snoek, M. Worring, A. Smeulders","doi":"10.1145/1991996.1992008","DOIUrl":"https://doi.org/10.1145/1991996.1992008","url":null,"abstract":"To learn classifiers for many visual categories, obtaining labeled training examples in an efficient way is crucial. Since a classifier tends to misclassify negative examples that are visually similar to positive examples, inclusion of such informative negatives should be stressed in the learning process. However, they are unlikely to be hit by random sampling, the de facto standard in the literature. In this paper, we go beyond random sampling by introducing a novel social negative bootstrapping approach. Given a visual category and a few positive examples, the proposed approach adaptively and iteratively harvests informative negatives from a large amount of social-tagged images. To label negative examples without human interaction, we design an effective virtual labeling procedure based on simple tag reasoning. Virtual labeling, in combination with adaptive sampling, enables us to select the most misclassified negatives as the informative samples. Learning from the positive set and the informative negative sets results in visual classifiers with higher accuracy. Experiments on two present-day image benchmarks employing 650K virtually labeled negative examples show the viability of the proposed approach. On a popular visual categorization benchmark, our precision at 20 increases by 34% compared to baselines trained on randomly sampled negatives. We achieve more accurate visual categorization without manually labeling any negatives.","PeriodicalId":390933,"journal":{"name":"Proceedings of the 1st ACM International Conference on Multimedia Retrieval","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132965093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large-scale EMM identification based on geometry-constrained visual word correspondence voting","authors":"Xin Yang, Qiong Liu, Chunyuan Liao, K. Cheng, Andreas Girgensohn","doi":"10.1145/1991996.1992031","DOIUrl":"https://doi.org/10.1145/1991996.1992031","url":null,"abstract":"We present a large-scale Embedded Media Marker (EMM) identification system which allows users to retrieve relevant dynamic media associated with a static paper document via camera phones. The user supplies a query image by capturing an EMM-signified patch of a paper document with a camera phone. The system recognizes the query and in turn retrieves and plays the corresponding media on the phone. Accurate image matching is crucial for a positive user experience in this application. To address the challenges posed by large datasets and variation in camera-phone-captured query images, we introduce a novel image matching scheme based on geometrically consistent correspondences. A hierarchical scheme, combined with two constraining methods, is designed to detect geometrically constrained correspondences between images. A spatial neighborhood search approach is further proposed to address challenging cases of query images with a large translational shift. Experimental results on a 200k+ dataset show that our solution achieves high accuracy with low memory and time complexity and outperforms the baseline bag-of-words approach.","PeriodicalId":390933,"journal":{"name":"Proceedings of the 1st ACM International Conference on Multimedia Retrieval","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126756468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimedia retrieval in social networks for photo book creation","authors":"Mohammad Rabbath, Philipp Sandhaus, Susanne CJ Boll","doi":"10.1145/1991996.1992068","DOIUrl":"https://doi.org/10.1145/1991996.1992068","url":null,"abstract":"Social networks such as Facebook witness an explosively growing number of shared photos. People enjoy taking photos of each other at public and private events, uploading them, tagging each other, and commenting on each other's photos. These photos are great assets for many users who would like to create photo books from them, so that they can share their experiences and special events with friends and family. However, finding the required photos is not an easy task, because they can be distributed over different friends and albums with different tags and annotations, and they can be of varying importance to the user. In this paper we propose a software prototype which employs multimedia retrieval methods to allow easy creation of photo books from social networks. In our system, photos are first clustered into events which can be distributed over the social network, then selected based on their importance, and finally laid out over the pages of a photo book. The functionalities of our system are implemented and deployed as web services, and our presentation layer is a wizard-like Facebook application.","PeriodicalId":390933,"journal":{"name":"Proceedings of the 1st ACM International Conference on Multimedia Retrieval","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123372707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A comparison of extended fingerprint hashing and locality sensitive hashing for binary audio fingerprints","authors":"K. Moravec, I. Cox","doi":"10.1145/1991996.1992027","DOIUrl":"https://doi.org/10.1145/1991996.1992027","url":null,"abstract":"Hash tables have been proposed for the indexing of high-dimensional binary vectors, specifically for the identification of media by fingerprints. In this paper we develop a new model to predict the performance of a hash-based method (Fingerprint Hashing) under varying levels of noise. We show that by adjusting two parameters, robustness to a higher level of noise is achieved. We extend Fingerprint Hashing to a multi-table range search (Extended Fingerprint Hashing) and show that this approach also increases robustness to noise. We then show the relationship between Extended Fingerprint Hashing and Locality Sensitive Hashing and investigate design choices for dealing with higher noise levels. If index size must be held constant, Extended Fingerprint Hashing is the superior method. We also show that, to achieve similar performance at a given level of noise, Locality Sensitive Hashing requires nearly a six-fold increase in index size, which is likely to be impractical for many applications.","PeriodicalId":390933,"journal":{"name":"Proceedings of the 1st ACM International Conference on Multimedia Retrieval","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124824347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pairwise weak geometric consistency for large scale image search","authors":"Hongtao Xie, Ke Gao, Yongdong Zhang, Jintao Li, Yizhi Liu","doi":"10.1145/1991996.1992038","DOIUrl":"https://doi.org/10.1145/1991996.1992038","url":null,"abstract":"State-of-the-art image search systems mostly build on the bag-of-features (BOF) representation. As BOF ignores geometric relationships among local features, geometric consistency constraints have been proposed to improve search precision. However, exploiting full geometric constraints is too computationally expensive. Weak geometric constraints rest on strong assumptions and can only deal with uniform transformations. To handle viewpoint changes and nonrigid deformations, in this paper we present a novel pairwise weak geometric consistency constraint (P-WGC) method. It utilizes the local similarity characteristic of deformations and measures the pairwise geometric similarity of matches between two sets of local features. Experiments performed on four well-known datasets and a dataset of one million images show a significant improvement due to P-WGC, as well as its efficiency. Further improvement in search accuracy is obtained when it is combined with full geometric verification.","PeriodicalId":390933,"journal":{"name":"Proceedings of the 1st ACM International Conference on Multimedia Retrieval","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128318703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interpretable visual models for human perception-based object retrieval","authors":"A. Rebai, A. Joly, N. Boujemaa","doi":"10.1145/1991996.1992017","DOIUrl":"https://doi.org/10.1145/1991996.1992017","url":null,"abstract":"Understanding the results returned by automatic visual concept detectors is often a tricky task, making users uncomfortable with these technologies. In this paper we attempt to build humanly interpretable visual models, allowing the user to visually understand the underlying semantics. We therefore propose a supervised multiple instance learning algorithm that selects as few discriminant local features as possible for a given object category. The method finds its roots in lasso theory, where an L1-regularization term is introduced to constrain the loss function and subsequently produce sparser solutions. Efficient resolution of the lasso path is achieved through a boosting-like procedure inspired by the BLasso algorithm. Quantitatively, our method achieves performance similar to the current state of the art; qualitatively, it allows users to construct their own model from the original set of learned patches, thus allowing for more compound semantic queries.","PeriodicalId":390933,"journal":{"name":"Proceedings of the 1st ACM International Conference on Multimedia Retrieval","volume":"242 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121328747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Water reflection recognition via minimizing reflection cost based on motion blur invariant moments","authors":"S. Zhong, Yan Liu, Ling Shao, K. F. Chung","doi":"10.1145/1991996.1992001","DOIUrl":"https://doi.org/10.1145/1991996.1992001","url":null,"abstract":"Water reflection, a typical imperfect reflection symmetry problem, plays an important role in image content analysis. However, existing techniques for symmetry recognition cannot recognize water reflection images correctly because of the complex and varied distortions caused by water waves. To address this difficulty, we construct a novel feature space composed of motion blur invariant moments. Moreover, we propose an efficient detection algorithm to determine the reflection axis in images with water reflection. Through experiments on a real image dataset with different tasks, the proposed techniques demonstrate impressive results in water reflection image classification, reflection axis detection, and the retrieval of images with water reflection.","PeriodicalId":390933,"journal":{"name":"Proceedings of the 1st ACM International Conference on Multimedia Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129943783","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The hyper media news system for multimodal and personalised fruition of informative content","authors":"A. Messina, M. Montagnuolo, Riccardo Di Massa, Andrea Elia","doi":"10.1145/1991996.1992060","DOIUrl":"https://doi.org/10.1145/1991996.1992060","url":null,"abstract":"Despite numerous efforts, managing content for media production and distribution can still be considered a time-consuming and difficult task. From this derives the need for effective data documentation and retrieval systems. Hyper Media News provides a set of integrated tools for large-scale acquisition, analysis, indexing, browsing, and recommendation of news content from both television and the Web.","PeriodicalId":390933,"journal":{"name":"Proceedings of the 1st ACM International Conference on Multimedia Retrieval","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134437733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}