"Synthetically trained multi-view object class and viewpoint detection for advanced image retrieval"
Johannes Schels, Joerg Liebelt, K. Schertler, R. Lienhart
Proceedings of the 1st ACM International Conference on Multimedia Retrieval, April 18, 2011. DOI: 10.1145/1991996.1991999
Abstract: This paper proposes a novel approach to multi-view object class and viewpoint detection for the retrieval of images showing one or several objects from a given viewpoint, a viewpoint range or any viewpoint in image databases. All detectors are trained exclusively on a few synthetic 3D models without any manual bounding-box, viewpoint or part annotation, making object class and viewpoint detection a scalable learning task. Previous work on this topic relies on the detection of object parts for each individual viewpoint, ignoring the responses of part detectors specific to other viewpoints. Instead, we explicitly exploit appearance ambiguities caused by spurious detections of parts under more than one viewpoint by combining all detector responses in a joint spatial pyramid encoding. We achieve state-of-the-art results in multi-view object class detection and viewpoint determination on current benchmarking data sets and demonstrate increased robustness to partial occlusion.

"Instant video summarization during shooting with mobile phone"
Xiao Zeng, Xiaohui Xie, Kongqiao Wang
Proceedings of the 1st ACM International Conference on Multimedia Retrieval, April 18, 2011. DOI: 10.1145/1991996.1992036
Abstract: To facilitate review and management of home videos captured by mobile phones, we propose a novel instant summarization method that is applied while users are shooting. Segment boundaries and key frames are extracted without any delay, which means that the extracted frames strictly synchronize with the scene being captured. Partial context is the major challenge of this method, since only the frames captured so far are available when summarization is applied. The limited computational resources of mobile phones are a further constraint for such video analysis, especially when video compression runs concurrently. Several frame features are utilized for segmentation and key frame extraction, and an original key-frame updating strategy is presented to optimize the selected representative frames under this partial context. Experimental results demonstrate that the proposed method performs well in terms of low computational complexity, high effectiveness and good user experience.

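The partial-context constraint described in the abstract can be illustrated with a minimal online key-frame selector: it sees only frames captured so far and keeps a frame whenever its histogram drifts far enough from the last key frame. This is a generic sketch, not the paper's method; the intensity-histogram feature, the total-variation distance and the threshold value are all illustrative assumptions.

```python
import numpy as np

def frame_histogram(frame, bins=16):
    """Grayscale intensity histogram, normalized to sum to 1."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist / max(hist.sum(), 1)

def online_keyframes(frames, threshold=0.4, bins=16):
    """Emit key-frame indices as frames arrive; no future frames are needed,
    so the selection strictly synchronizes with the scene being captured."""
    keyframes = []
    last_hist = None
    for i, frame in enumerate(frames):
        h = frame_histogram(frame, bins)
        # Total-variation distance in [0, 1] to the last key frame's histogram.
        if last_hist is None or np.abs(h - last_hist).sum() / 2 > threshold:
            keyframes.append(i)   # scene changed enough: start a new segment
            last_hist = h
    return keyframes
```

A real system would also implement the paper's key-frame *updating* step, revising earlier choices as more context arrives; the sketch above only shows the forward pass.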
"NV-Tree: nearest neighbors at the billion scale"
Herwig Lejsek, B. Jónsson, L. Amsaleg
Proceedings of the 1st ACM International Conference on Multimedia Retrieval, April 18, 2011. DOI: 10.1145/1991996.1992050
Abstract: This paper presents the NV-Tree (Nearest Vector Tree). It addresses the specific, yet important, problem of efficiently and effectively finding the approximate k-nearest neighbors within a collection of a few billion high-dimensional data points. The NV-Tree is a very compact index, as only six bytes are kept in the index for each high-dimensional descriptor. It thus scales extremely well when indexing large collections of high-dimensional descriptors. The NV-Tree efficiently produces results of good quality, even at a scale where the indices can no longer be kept entirely in main memory. We demonstrate this with extensive experiments using a collection of 2.5 billion SIFT (Scale Invariant Feature Transform) descriptors.

"Exploiting contextual spaces for image re-ranking and rank aggregation"
D. C. G. Pedronette, R. Torres
Proceedings of the 1st ACM International Conference on Multimedia Retrieval, April 18, 2011. DOI: 10.1145/1991996.1992009
Abstract: The objective of Content-based Image Retrieval (CBIR) systems is to return the most similar images given an image query. In this scenario, accurately ranking collection images is of great relevance. In general, CBIR systems consider only pairwise image analysis, that is, they compute similarity measures over pairs of images, ignoring the rich information encoded in the relations among several images. This paper presents a novel re-ranking approach based on contextual spaces, aiming to improve the effectiveness of CBIR tasks by exploiting relations among images. In our approach, information encoded in both the distances among images and the ranked lists computed by CBIR systems is used to analyze contextual information. The re-ranking method can also be applied to other tasks, such as: (i) combining ranked lists obtained using different image descriptors (rank aggregation); and (ii) combining post-processing methods. We conducted several experiments involving shape, color, and texture descriptors, with comparisons to other post-processing methods. Experimental results demonstrate the effectiveness of our method.

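As a point of reference for the rank-aggregation task mentioned in the abstract, a minimal baseline is Borda-count aggregation of several ranked lists. The paper's contextual-spaces method is considerably more elaborate; this sketch only fixes the interface such a combiner has (several ranked lists of image ids in, one fused ranking out).

```python
from collections import defaultdict

def borda_aggregate(ranked_lists):
    """Fuse several ranked lists of image ids into one ranking.

    Each list awards points inversely to rank position (top item in a list
    of n gets n points); items absent from a list get nothing from it.
    Plain Borda count -- a baseline, not the paper's contextual re-ranking.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        n = len(ranking)
        for pos, item in enumerate(ranking):
            scores[item] += n - pos
    # Sort by descending score; break ties deterministically by id.
    return sorted(scores, key=lambda item: (-scores[item], item))
```

For example, fusing the rankings produced by a shape descriptor and a color descriptor is a one-liner: `borda_aggregate([shape_ranking, color_ranking])`.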
"RetrievalLab: a programming tool for content based retrieval"
Ard A. J. Oerlemans, M. Lew
Proceedings of the 1st ACM International Conference on Multimedia Retrieval, April 18, 2011. DOI: 10.1145/1991996.1992067
Abstract: In this paper we present RetrievalLab, a content based retrieval tool that was designed for both educational and research purposes. It is a tool to facilitate the testing of new features, segmentations, machine learning approaches, and evaluation methods, by presenting a MATLAB-like programming interface which illuminates the fundamental processes and algorithms in content based retrieval.

"Component-based track inspection using machine-vision technology"
Y. Li, Charles Otto, N. Haas, Yuichi Fujiki, Sharath Pankanti
Proceedings of the 1st ACM International Conference on Multimedia Retrieval, April 18, 2011. DOI: 10.1145/1991996.1992056
Abstract: In this paper, we present our latest research engagement with a railroad company to apply machine-vision technologies to automate the inspection and condition monitoring of railroad tracks. Specifically, we have proposed a complete architecture including an imaging setup for capturing multiple video streams, detection of important rail components such as tie plates, spikes, anchors and joint bar bolts, defect identification such as raised spikes, defect severity analysis and temporal condition analysis, and long-term predictive assessment. This paper particularly presents the various video analytics we have developed to detect rail components, which form the building block of the entire framework. Our preliminary performance study achieved an average 98.2% detection rate, a 1.57% false-positive rate and a 1.78% false-negative rate for component detection. Finally, given the lack of sufficient representative data and annotations to evaluate system performance on exception detection at both the sequence and compliance levels, we propose a mathematical modeling approach to calculate the probabilities of detecting such exceptions. This analysis shows that there is still substantial room to improve our approaches in order to achieve the desired false-positive and missed-detection rates at the sequence level.

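The sequence-level probability modeling mentioned in the abstract could, in its simplest form, look like the following toy model: an exception spans k rail components, each detected independently at the reported 98.2% component rate, and the exception is flagged when at least m of its components are detected. The independence assumption and the at-least-m rule are illustrative assumptions, not the authors' actual model.

```python
from math import comb

def p_exception_detected(k, m, p_detect=0.982):
    """Probability that an exception spanning k components is flagged,
    assuming independent per-component detections at rate p_detect and
    a rule that fires when at least m components are detected.

    p_detect defaults to the 98.2% component detection rate reported in
    the paper; the binomial model itself is only an illustration."""
    return sum(comb(k, i) * p_detect**i * (1 - p_detect)**(k - i)
               for i in range(m, k + 1))
```

Such a model makes the gap between component-level and sequence-level performance explicit: requiring all components of a long exception to be detected (m = k) compounds the per-component miss rate, while a more tolerant rule (small m) trades it against sequence-level false positives.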
"Indexing the signature quadratic form distance for efficient content-based multimedia retrieval"
C. Beecks, Jakub Lokoč, T. Seidl, T. Skopal
Proceedings of the 1st ACM International Conference on Multimedia Retrieval, April 18, 2011. DOI: 10.1145/1991996.1992020
Abstract: The Signature Quadratic Form Distance has been introduced as an adaptive similarity measure coping with flexible content representations of various multimedia data. Although the Signature Quadratic Form Distance has shown good retrieval performance in terms of both effectiveness and efficiency, its applicability to index structures remains a challenging issue due to its dynamic nature. In this paper, we investigate the indexability of the Signature Quadratic Form Distance with regard to metric access methods. We show how the distance's inherent parameters determine indexability and analyze the relationship between effectiveness and efficiency on numerous image databases.

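The Signature Quadratic Form Distance itself has a compact closed form: a signature is a set of representatives (e.g. cluster centroids of local features) with weights, the two signatures' weights are concatenated as (w_q, -w_o), and the distance is the square root of the quadratic form over a similarity matrix built from all representatives. A minimal sketch with a Gaussian similarity kernel, whose alpha is one of the "inherent parameters" the abstract says govern indexability:

```python
import numpy as np

def sqfd(cent_q, w_q, cent_o, w_o, alpha=1.0):
    """Signature Quadratic Form Distance between two feature signatures.

    cent_q, cent_o: (n, d) and (m, d) arrays of representatives.
    w_q, w_o:       matching weight vectors (each summing to 1, typically).
    The similarity matrix uses a Gaussian kernel exp(-alpha * d^2);
    the kernel choice and alpha are illustrative assumptions.
    """
    cents = np.vstack([cent_q, cent_o])
    w = np.concatenate([w_q, -w_o])             # concatenated (w_q, -w_o)
    d2 = ((cents[:, None, :] - cents[None, :, :]) ** 2).sum(-1)
    A = np.exp(-alpha * d2)                     # similarity matrix
    # Clamp at 0 to absorb floating-point noise before the square root.
    return float(np.sqrt(max(w @ A @ w, 0.0)))
```

Because the signatures of two objects may differ in size and content, the similarity matrix is rebuilt per comparison; that per-query dynamism is exactly what makes indexing the distance with metric access methods non-trivial.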
"Fusing heterogeneous modalities for video and image re-ranking"
Hung-Khoon Tan, C. Ngo
Proceedings of the 1st ACM International Conference on Multimedia Retrieval, April 18, 2011. DOI: 10.1145/1991996.1992011
Abstract: Multimedia documents in popular image and video sharing websites such as Flickr and YouTube are heterogeneous documents with diverse representations and rich user-supplied information. In this paper, we investigate how the agreement among heterogeneous modalities can be exploited to guide data fusion. The problem of fusion is cast as the simultaneous mining of agreement from different modalities and adaptation of fusion weights to construct a fused graph from these modalities. An iterative framework based on agreement-fusion optimization is thus proposed. We plug two well-known algorithms, random walk and semi-supervised learning, into this framework to illustrate how agreement is incorporated and conflict compromised in the cases of uniform and adaptive fusion. Experimental results on web video and image re-ranking demonstrate that, with a proper fusion strategy rather than simple linear fusion, performance improvement on search can generally be expected.

"Spatial codebooks for image categorization"
Eugene Mbanya, S. Gerke, P. Ndjiki-Nya
Proceedings of the 1st ACM International Conference on Multimedia Retrieval, April 18, 2011. DOI: 10.1145/1991996.1992046
Abstract: Currently, bag-of-words approaches for image categorization are very popular due to their relative simplicity, robustness and high efficiency. However, they lack the ability to represent the spatial composition of an image. This drawback has been addressed by several approaches, with spatial pyramids being the most popular. Spatial pyramids divide an image into smaller blocks, producing a feature vector for each block; these block vectors are concatenated to form the feature vector of the whole image. This increases the dimension of the whole image's feature vector by a factor corresponding to the number of blocks, with a proportional increase in computation time. We propose an extension of the image feature vector by spatial features, which results in a descriptor of similar size to the standard bag-of-words approach. The classification performance, however, is similar to that of spatial pyramids, which use a significantly larger feature vector and are therefore more computationally expensive.

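The dimensionality blow-up of spatial pyramids that motivates the paper is easy to see in code: concatenating per-block bag-of-words histograms multiplies the descriptor size by the total number of blocks across levels. A minimal pyramid builder over normalized keypoint positions (the grid layout and parameter names are illustrative):

```python
import numpy as np

def spatial_pyramid_bow(keypoints, words, vocab_size, levels=2):
    """Concatenated per-block BoW histograms over the unit square.

    keypoints: (N, 2) array of normalized (x, y) positions in [0, 1).
    words:     length-N sequence of visual-word ids in [0, vocab_size).
    Level l splits the image into 2^l x 2^l blocks, so the final vector has
    vocab_size * sum(4^l for l in range(levels)) dimensions -- the blow-up
    relative to a plain vocab_size-dimensional bag of words.
    """
    parts = []
    for level in range(levels):
        g = 2 ** level
        hists = np.zeros((g, g, vocab_size))
        for (x, y), w in zip(keypoints, words):
            hists[min(int(y * g), g - 1), min(int(x * g), g - 1), w] += 1
        parts.append(hists.ravel())
    return np.concatenate(parts)
```

With a 1000-word vocabulary and two levels, the descriptor already grows from 1000 to 5000 dimensions; the paper's spatial codebooks aim to retain the discriminative benefit without this growth.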
"Lost in binarization: query-adaptive ranking for similar image search with compact codes"
Yu-Gang Jiang, Jun Wang, Shih-Fu Chang
Proceedings of the 1st ACM International Conference on Multimedia Retrieval, April 18, 2011. DOI: 10.1145/1991996.1992012
Abstract: With the proliferation of images on the Web, fast search of visually similar images has attracted significant attention. State-of-the-art techniques often embed high-dimensional visual features into a low-dimensional Hamming space, where search can be performed in real time based on the Hamming distance of compact binary codes. Unlike traditional metrics (e.g., Euclidean) on raw image features that produce continuous distances, Hamming distances are discrete integer values. In practice, a large number of images often share equal Hamming distance to a query, which is a critical issue for image search, where ranking is very important. In this paper, we propose a novel approach that facilitates query-adaptive ranking for images with equal Hamming distance. We achieve this goal by first learning, offline, bit weights of the binary codes for a diverse set of predefined semantic concept classes. The weight learning process is formulated as a quadratic programming problem that minimizes intra-class distance while preserving the inter-class relationship in the original raw image feature space. Query-adaptive weights are then rapidly computed by evaluating the proximity between a query and the concept categories. With the adaptive bit weights, the returned images can be ordered by weighted Hamming distance at a finer-grained binary-code level rather than at the original integer Hamming-distance level. Experimental results on a Flickr image dataset show clear improvements from our query-adaptive ranking approach.
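The core ranking idea reads naturally as a two-stage comparison: plain Hamming distance produces coarse integer ties, and per-bit weights break them. A minimal sketch over integer-packed codes (the weights below are arbitrary stand-ins for the query-adaptive weights the paper learns via quadratic programming):

```python
def hamming(a, b):
    """Plain integer Hamming distance between two equal-length bit codes."""
    return bin(a ^ b).count("1")

def weighted_hamming(a, b, weights):
    """Sum of per-bit weights over the bits where the codes differ.

    weights[i] is the weight of bit i (index 0 = least significant bit).
    In the paper these weights are query-adaptive; here they are given.
    """
    x = a ^ b
    return sum(w for i, w in enumerate(weights) if (x >> i) & 1)
```

Two database codes at the same integer Hamming distance from a query are thus separated by which bits disagree: disagreement on highly weighted (query-relevant) bits pushes an image down the ranking, while disagreement on low-weight bits barely costs it anything.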