{"title":"Sketch2Tag: automatic hand-drawn sketch recognition","authors":"Zhenbang Sun, Changhu Wang, Liqing Zhang, Lei Zhang","doi":"10.1145/2393347.2396429","DOIUrl":"https://doi.org/10.1145/2393347.2396429","url":null,"abstract":"In this work, we introduce the Sketch2Tag system for hand-drawn sketch recognition. Due to large variations presented in hand-drawn sketches, most of existing work was limited to a particular domain or limited predefined classes. Different from existing work, Sketch2Tag is a general sketch recognition system, towards recognizing any semantically meaningful object that a child can recognize. This system enables a user to draw a sketch on the query panel, and then provides real-time recognition results. To increase the recognition coverage, a web-scale clipart image collection is leveraged as the knowledge base of the recognition system. Better understanding a user's drawing will be of great value to a variety of applications, such as, improving the sketch-based image search by combining the recognition results as textual queries.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"210 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123968638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint semantic segmentation by searching for compatible-competitive references","authors":"Ping Luo, Xiaogang Wang, Liang Lin, Xiaoou Tang","doi":"10.1145/2393347.2396310","DOIUrl":"https://doi.org/10.1145/2393347.2396310","url":null,"abstract":"This paper presents a framework for semantically segmenting a target image without tags by searching for references in an image database, where all the images are unsegmented but annotated with tags. We jointly segment the target image and its references by optimizing both semantic consistencies within individual images and correspondences between the target image and each of its references. In our framework, we first retrieve two types of references with a semantic-driven scheme: i) the compatible references which share similar global appearance with the target image; and ii) the competitive references which have distinct appearance to the target image but similar tags with one of the compatible references. The two types of references have complementary information for assisting the segmentation of the target image. Then we construct a novel graphical representation, in which the vertices are superpixels extracted from the target image and its references. The segmentation problem is posed as labeling all the vertices with the semantic tags obtained from the references. The method is able to label images without the pixel-level annotation and classifier training, and it outperforms the state-of-the-arts approaches on the MSRC-21 database.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123973698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring and browsing photos through characteristic geographic tag regions","authors":"B. Thomee, Adam Rae","doi":"10.1145/2393347.2396438","DOIUrl":"https://doi.org/10.1145/2393347.2396438","url":null,"abstract":"We present a system that supports zoomable browsing and exploration of photos taken across the globe. Our system is based on a novel algorithm that automatically uncovers the colloquial boundaries of regions that are characteristic for individual tags used in a large collection of geo-referenced photos. We first model the data using scale-space theory, which allows us to represent it simultaneously across different scales as a family of increasingly smoothed density distributions, after which we derive the region boundaries by applying image analysis techniques to the scale-space representation of each tag. The interface visualizes the shape and size of the resulting boundaries for each tag along the dimensions of space and time across multiple scales, giving the user the ability to explore the world as patchwork of dynamic characterizing geographic tag regions and to browse through their associated photos.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"341 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124217316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interactive data-driven discovery of temporal behavior models from events in media streams","authors":"Chreston A. Miller, Francis K. H. Quek","doi":"10.1145/2393347.2393413","DOIUrl":"https://doi.org/10.1145/2393347.2393413","url":null,"abstract":"This paper investigates a technique for the discovery of temporal behavior models within multimedia event data. Advancements in both technology and the marketplace present us the opportunity for research in analysis of situated human behavior using video and other sensor data (media streams). By situated analysis, we mean the study of behavior in time as opposed to looking at behavior in the form of aggregated data divorced from how they occur in context. Human and social scientists seek to model behavior captured in media, and these data may be represented in a multi-dimensional event data space derived from media streams. The knowledge of these scientists (experts) is a valuable resource which can be leveraged to search this space. We propose a solution that incorporates the expert in an iteratively, interactive data-driven discovery process to evolve a desired behavior model. We test our solution's accuracy on a multimodal meeting corpus with a progressive three tiered approach.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125871850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Poster Session 3","authors":"S. Panchanathan","doi":"10.1145/3246412","DOIUrl":"https://doi.org/10.1145/3246412","url":null,"abstract":"","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124797001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Privacy concerns in multimedia and their solutions","authors":"G. Friedland","doi":"10.1145/2393347.2396551","DOIUrl":"https://doi.org/10.1145/2393347.2396551","url":null,"abstract":"This article summarizes the corresponding 3-hour tutorial at ACM Multimedia 2012.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130899302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the music content authentication","authors":"Wei Li, Bilei Zhu, Zhurong Wang","doi":"10.1145/2393347.2396393","DOIUrl":"https://doi.org/10.1145/2393347.2396393","url":null,"abstract":"Digital audio has been ubiquitous over the past decade. Since it can be easily modified by editing tools, there has been a strong need to protect its content for secure multimedia applications. Existing audio authentication algorithms are mainly focused on either human speech or general audio with music as part of the test data, while special research on music authentication has been somewhat neglected. In this article, we propose a novel algorithm to protect the integrity and authenticity of music signals. Its main contributions include: (1) Music is segmented into beat-based frames, which not only endows the authentication units with more semantic meaning but also perfectly resolves the challenging synchronization problem; (2) Robust hashes are generated from Chroma-based mid-level audio feature which can appropriately characterize the music content, and integrated with an encryption procedure to ensure the security against malicious block-wise vector quantization attack; (3) Fuzzy logic is adopted to make the authentication decision in light of three measures defined on bit errors, coinciding with the inherent blurred nature of authentication. Experiments exhibit good discriminative ability between admissible and malicious operations.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130076902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Content is dead: long-live content!","authors":"Lexing Xie, David A. Shamma, Cees G. M. Snoek","doi":"10.1145/2393347.2393355","DOIUrl":"https://doi.org/10.1145/2393347.2393355","url":null,"abstract":"Panel Overview Multimedia content analysis has always held a major research role in the ACM Multimedia research community. Ten years ago, at ACM MM 2002, a panel debated on “Media Semantics: Who Needs It and Why?” [2] Today, the answer is obvious. Multimedia content analysis has burgeoned on a foundation of machine learning and data-intensive algorithms, and has influenced many recent applications, from finding faces to augmenting reality.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121921084","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discovering areas of interest with geo-tagged images and check-ins","authors":"Jiajun Liu, Zi Huang, Lei Chen, Heng Tao Shen, Zhixian Yan","doi":"10.1145/2393347.2393429","DOIUrl":"https://doi.org/10.1145/2393347.2393429","url":null,"abstract":"Geo-tagged image is an ideal source for the discovery of popular travel places. However, the aspects of popular venues for daily-life purposes like dining and shopping are often missing in the mined locations from geo-tagged images. Fortunately check-in websites provide us a unique opportunity of analyzing people's preferences in their daily lives to complement the knowledge mined from geo-tagged images. This paper presents a novel approach for the discovery of Areas of Interest (AoI). By analyzing both geo-tagged images and check-ins, the approach exploits travelers' flavors as well as the preferences of daily-life activities of local residents to find AoI in a city. The proposed approach consists of two major steps. Firstly, we devise a density-based clustering method to discover AoI, mainly based on the image densities but also reinforced by the secondary densities from the images' neighboring venues. Then we propose a novel joint authority analysis framework to rank AoI. The framework simultaneously considers both the location-location transitions, and the user-location relations. An interactive presentation interface for visualizing AoI is also presented. The approach is tested with very large datasets for Shanghai city. They consist of 49,460 geo-tagged images from Panoramio.com, and 1,361,547 check-ins from the check-in website Qieke.com. By evaluating the ranking accuracy and quality of AoI, we demonstrate great improvements of our method over compared methods.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122011761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimedia recommendation","authors":"Jialie Shen, Meng Wang, Shuicheng Yan, Peng Cui","doi":"10.1145/2393347.2396554","DOIUrl":"https://doi.org/10.1145/2393347.2396554","url":null,"abstract":"Due to the rapid growth of online multimedia information, the problem of information overload has become more and more serious in recent decades. To address this problem, various multimedia recommendation technologies have been developed by different research communities (e.g., multimedia systems, information retrieval, and machine learning). Meanwhile, many commercial web systems (e.g., Flick, Youtube, and Last.fm) have successfully applied recommendation techniques to provide users personalized multimedia content and services in a convenient and flexible way. This tutorial focuses on exploring the state-of-the-art in multimedia recommendation. We also discuss the experience gained from developing existing systems and review key challenges associated with large-scale multimedia recommendation.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"216 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116191557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}