Joint statistical analysis of images and keywords with applications in semantic image enhancement
Albrecht J. Lindner, Appu Shaji, Nicolas Bonnier, S. Süsstrunk
Proceedings of the 20th ACM International Conference on Multimedia, 2012. DOI: 10.1145/2393347.2393417
Abstract: With the advent of social image-sharing communities, millions of images with associated semantic tags are now available online for free and allow us to exploit this abundant data in new ways. We present a fast non-parametric statistical framework designed to analyze a large data corpus of images and semantic tag pairs and find correspondences between image characteristics and semantic concepts. We learn the relevance of different image characteristics for thousands of keywords from one million annotated images. We demonstrate the framework's effectiveness with three different examples of semantic image enhancement: we adapt the gray-level tone-mapping, emphasize semantically relevant colors, and perform a defocus magnification for an image based on its semantic context. The performance of our algorithms is validated with psychophysical experiments.
{"title":"Session details: Full paper session 9: presentation and organization","authors":"Heng Tao Shen","doi":"10.1145/3246402","DOIUrl":"https://doi.org/10.1145/3246402","url":null,"abstract":"","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126687135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image annotation by semantic sparse recoding of visual content","authors":"Zhiwu Lu, Yuxin Peng","doi":"10.1145/2393347.2393418","DOIUrl":"https://doi.org/10.1145/2393347.2393418","url":null,"abstract":"This paper presents a new semantic sparse recoding method to generate more descriptive and robust representation of visual content for image annotation. Although the visual bag-of-words (BOW) representation has been reported to achieve promising results in image annotation, its visual codebook is completely learnt from low-level visual features using quantization techniques and thus the so-called semantic gap remains unbridgeable. To handle such challenging issue, we utilize both the annotations of training images and the predicted annotations of test images to improve the original visual BOW representation. This is further formulated as a sparse coding problem so that the noise issue induced by the inaccurate quantization of visual features can also be handled to some extent. By developing an efficient sparse coding algorithm, we successfully generate a new visual BOW representation for image annotation. Since such sparse coding has actually incorporated the high-level semantic information into the original visual codebook, we thus consider it as semantic sparse recoding of the visual content. Although the predicted annotations of test images are also used as inputs by the traditional image annotation refinement, we focus on the visual BOW representation refinement for image annotation in this paper. The experimental results on two benchmark datasets show the superior performance of our semantic sparse recoding method in image annotation.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127032903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SymCity: feature selection by symmetry for large scale image retrieval","authors":"Giorgos Tolias, Yannis Kalantidis, Yannis Avrithis","doi":"10.1145/2393347.2393379","DOIUrl":"https://doi.org/10.1145/2393347.2393379","url":null,"abstract":"Many problems, including feature selection, vocabulary learning, location and landmark recognition, structure from motion and 3d reconstruction, rely on a learning process that involves wide-baseline matching on multiple views of the same object or scene. In practical large scale image retrieval applications however, most images depict unique views where this idea does not apply. We exploit self-similarities, symmetries and repeating patterns to select features within a single image. We achieve the same performance compared to the full feature set with only a small fraction of its index size on a dataset of unique views of buildings or urban scenes, in the presence of one million distractors of similar nature. Our best solution is linear in the number of correspondences, with practical running times of just a few milliseconds.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123487105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal semi-supervised metric learning for image retrieval","authors":"Kun Zhao, W. Liu, Jianzhuang Liu","doi":"10.1145/2393347.2396340","DOIUrl":"https://doi.org/10.1145/2393347.2396340","url":null,"abstract":"In a typical content-based image retrieval (CBIR) system, images are represented as vectors and similarities between images are measured by a specified distance metric. However, the traditional Euclidean distance cannot always deliver satisfactory performance, so an effective metric sensible to the input data is desired. Tremendous recent works on metric learning have exhibited promising performance, but most of them suffer from limited label information and expensive training costs. In this paper, we propose two novel metric learning approaches, Optimal Semi-Supervised Metric Learning and its kernelized version. In the proposed approaches, we incorporate information from both labeled and unlabeled data to design a convex and computationally tractable learning framework which results in a globally optimal solution to the target metric of much lower rank than the original data dimension. Experiments on several image benchmarks demonstrate that our approaches lead to consistently better distance metrics than the state-of-the-arts in terms of accuracy for image retrieval.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114534421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Trajectory signature for action recognition in video","authors":"Nicolas Ballas, Bertrand Delezoide, F. Prêteux","doi":"10.1145/2393347.2396511","DOIUrl":"https://doi.org/10.1145/2393347.2396511","url":null,"abstract":"Bag-of-Words representation based on trajectory local features and taking into account the spatio-temporal context through static segmentation grids is currently the leading paradigm to perform action annotation.While providing a coarse localization of low-level features, those approaches tend to be limited by the grid rigidity. In this work we propose two contributions on trajectory based signatures. First, we extend a local trajectory feature to characterize the acceleration in videos, leading to invariance to camera constant motion. We also introduce two new adaptive segmentation grids, namely Adaptive Grid (AG) and Deformable Adaptive Grid (DAG). AG is learnt from videos data, to fit a given dataset and overcome static grid rigidity. DAG is also learnt from video data. Moreover, it can be adapted to a specific video through a deformation operation. Our adaptive grids are then exploited by a Bag-of-Words model at the aggregation step for action recognition. Our proposal is evaluated on 4 publicly available datasets.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124542792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Image colorization using similar images
Raj Kumar Gupta, A. Chia, D. Rajan, Ee Sin Ng, Zhiyong Huang
Proceedings of the 20th ACM International Conference on Multimedia, 2012. DOI: 10.1145/2393347.2393402
Abstract: We present a new example-based method to colorize a gray image. As input, the user needs only to supply a reference color image that is semantically similar to the target image. We extract features from these images at the resolution of superpixels, and exploit these features to guide the colorization process. Our use of a superpixel representation speeds up the colorization process. More importantly, it also yields much higher spatial consistency in the colorization than working with independent pixels. We adopt a fast cascade feature matching scheme to automatically find correspondences between superpixels of the reference and target images. Each correspondence is assigned a confidence based on the feature matching costs computed at different steps in the cascade, and high-confidence correspondences are used to assign an initial set of chromatic values to the target superpixels. To further enforce the spatial coherence of these initial color assignments, we develop an image-space voting framework that draws evidence from neighboring superpixels to identify and correct invalid color assignments. Experimental results and a user study on a broad range of images demonstrate that our method, with a fixed set of parameters, yields better colorization results than existing methods.
{"title":"Constraint-optimized keypoint inhibition/insertion attack: security threat to scale-space image feature extraction","authors":"Chun-Shien Lu, Chao-Yung Hsu","doi":"10.1145/2393347.2393434","DOIUrl":"https://doi.org/10.1145/2393347.2393434","url":null,"abstract":"Scale-space image feature extraction (SSIFE) has been widely adopted in broad areas due to its powerful resilience to attacks. However, the security threat to SSIFE-based applications, which will be addressed in this paper, is relatively unexplored. The security threat to SSIFT (called ST-SSIFE), composed of a constrained-optimization keypoint inhibition attack (KIHA) and a keypoint insertion attack (KISA), is specifically designed in this paper for scale-space feature extraction methods, such as SIFT and SURF. In ST-SSIFE, KIHA aims at making a fool of feature extraction protocols in that the detection rules are purposely violated so as to suppress the existence of a local maximum around a local region. We show that KIHA can be accomplished quickly via Lagrange multiplier but the resultant new keypoint generation (NKG) problem can be solved via Karush Kuhn Tucker (KKT) conditions. In order to leverage among keypoint removal with minimum distortion, suppression of NKG, and complexity, we further present a hybrid scheme of integrating Lagrange multiplier and KKT conditions. On the other hand, KISA is designed via an efficient coarse-to-fine descriptor matching strategy to yield fake feature points so as to create false positives. Experiments, conducted on keypoint removal rate evaluation and an image copy detection method operating on a web-scale image database as a case study, demonstrate the feasibility of our method.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125622746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quicktoon: a real-time video stylization and sharing system on general processors","authors":"Hongsheng Yang, Huanliang Sun, Jiangbo Lu","doi":"10.1145/2393347.2396428","DOIUrl":"https://doi.org/10.1145/2393347.2396428","url":null,"abstract":"We present a video stylization and sharing system named QuickToon, which supports generating a variety of useful and pleasing visual effects and allows easy use in video chat or sharing via social networking services (SNS). Based on our highly efficient edge-preserving smoothing filter, non-photo-realistic video rendering effects can be generated on the fly such as skin beautification, cartoon-like rendition, object outline, pencil sketch, and color stroke. Without requiring any GPUs that would otherwise seriously limits portability, the QuickToon system runs comfortably in real time for VGA- or HD-sized videos on general processors such as CPUs. The system can take in video frames either from a live webcam feed or any photos/videos from the local store, while the transformed imagery can be used in a live Skype video call, or saved locally, or uploaded and shared to SNS with one click. We demonstrate with concrete examples the functionality of this system, and underline its utility in video communication and photo/video sharing.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"10 8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115827481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Empathetic heartbeat","authors":"H. Ando, J. Watanabe, Masahiko Sato","doi":"10.1145/2393347.2396530","DOIUrl":"https://doi.org/10.1145/2393347.2396530","url":null,"abstract":"Empathy is important for our society and individuals, since it is what facilitates our interactions and connections to the other people around us. Our experience-based installation \"empathetic heartbeat\" aims to have the participant remind the existence of his/her heart in the internal body and recognize that our bodies are medium for feeling empathy with others. Here we describe the concept of the installation and participant's experience.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"253 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115974266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}