{"title":"Image tag re-ranking by coupled probability transition","authors":"Jie Xiao, Wen-gang Zhou, Xia Li, Meng Wang, Q. Tian","doi":"10.1145/2393347.2396328","DOIUrl":"https://doi.org/10.1145/2393347.2396328","url":null,"abstract":"The large amount of user-tagged images on social networks is helpful to facilitate image management and image search. However, many tags are weakly relevant or irrelevant to the visual content, resulting in unsatisfactory performance in tag related applications. In this paper, we propose a coupled probability transition algorithm to estimate the text-visual group relevance from the observed data and then leverage it to predict tag relevance for a new query image. The visual group for a given tag is a cluster of images that are visually similar and share the same tag. The tag-visual group relevance is uncovered by exploiting the mutual reinforcement in visual space and semantic space alternatively. Experiments on NUS-WIDE dataset show the validity and superiority of the proposed approach over existing methods.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121356699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Depth estimation for semi-automatic 2D to 3D conversion","authors":"Richard Rzeszutek, Raymond Phan, D. Androutsos","doi":"10.1145/2393347.2396320","DOIUrl":"https://doi.org/10.1145/2393347.2396320","url":null,"abstract":"The conversion of monoscopic footage into stereoscopic or multiview content is a difficult and time consuming task. A number of semi-automatic methods have been developed to speed up the process and provide some control to the user. However these methods require that the user provide detailed labels indicating the relative depth of objects in the scene. In this paper we present a method to automatically estimate depth in such a way that it is amenable to semi-automatic conversion. The method is designed to simplify the depth labelling task so that the user does not have to provide as many depth labels.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127438789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Full paper session 14: mobile systems","authors":"Tao Mei","doi":"10.1145/3246407","DOIUrl":"https://doi.org/10.1145/3246407","url":null,"abstract":"","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127492819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Local visual words coding for low bit rate mobile visual search","authors":"Yue Wu, Shiyang Lu, Tao Mei, Jian Zhang, Shipeng Li","doi":"10.1145/2393347.2396364","DOIUrl":"https://doi.org/10.1145/2393347.2396364","url":null,"abstract":"Mobile visual search has attracted extensive attention for its huge potential for numerous applications. Research on this topic has been focused on two schemes: sending query images, and sending compact descriptors extracted on mobile phones. The first scheme requires about 30-40KB data to transmit, while the second can reduce the bit rate by 10 times. In this paper, we propose a third scheme for extremely low bit rate mobile visual search, which sends compressed visual words consisting of vocabulary tree histogram and descriptor orientations rather than descriptors. This scheme can further reduce the bit rate with few extra computational costs on the client. Specifically, we store a vocabulary tree and extract visual descriptors on the mobile client. A light-weight pre-retrieval is performed to obtain the visited leaf nodes in the vocabulary tree. The orientation of each local descriptor and the tree histogram are then encoded to be transmitted to server. Our new scheme transmits less than 1KB data, which reduces the bit rate in the second scheme by 3 times, and obtains about 30% improvement in terms of search accuracy over the traditional Bag-of-Words baseline. The time cost is only 1.5 secs on the client and 240 msecs on the server.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123432524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Technical demo session 2","authors":"Q. Tian","doi":"10.1145/3246415","DOIUrl":"https://doi.org/10.1145/3246415","url":null,"abstract":"","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125289212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"International workshop on socially-aware multimedia (SAM'12)","authors":"Pablo César, David A. Shamma, Doug Williams, Cees G. M. Snoek","doi":"10.1145/2393347.2396538","DOIUrl":"https://doi.org/10.1145/2393347.2396538","url":null,"abstract":"Multimedia social communication is filtering into everyday use. Videoconferencing is appearing in the living room and beyond, television is becoming smart and social, and media sharing applications are transforming the way we converse and recall events. The confluence of computer-mediated interaction, social networking, and multimedia content are radically reshaping social communications, bringing new challenges and opportunities. This workshop provides an opportunity to explore socially-aware multimedia, in which the social dimension of mediated interactions between people are considered as important as the characteristics of the media content. Even though this social dimension is implicitly addressed in some current solutions, further research is needed to better understand what makes multimedia socially-aware. In other words, social interactivity needs to become a first class citizen of multimedia research.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125298807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Context-aware affective images classification based on bilayer sparse representation","authors":"Bing Li, Weihua Xiong, Weiming Hu, Xinmiao Ding","doi":"10.1145/2393347.2396296","DOIUrl":"https://doi.org/10.1145/2393347.2396296","url":null,"abstract":"In image understanding, the automatic recognition of emotion in an image is becoming important from an applicative viewpoint. Considering the fact that the emotion evoked by an image is not only from its global appearance but also interplays among local regions, we propose a novel context-aware classification model based on bilayer sparse representation (BSR) that simultaneously takes the local context and global-local context into account. The BSR model contains two layers: global sparse representation (GSR) and local sparse representation (LSR). The GSR is to define global similarities between a test image and all training images; while the LSR is to define similarities of local regions' appearances and their co-occurrence between a test image and all training images. The experiments on two data sets demonstrate that our method is effective on affective images classification.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116144151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimedia news digger on emerging topics from social streams","authors":"Bingkun Bao, Weiqing Min, J. Sang, Changsheng Xu","doi":"10.1145/2393347.2396483","DOIUrl":"https://doi.org/10.1145/2393347.2396483","url":null,"abstract":"With the overwhelming information from social media networks and news portals, it is crucial to provide users a complete package of visual and textual information with popular interests automatically. To this concern, we present a news detection and pushing system, called Me-Digger (Multimedia News Digger), which not only effectively detects emerging topics from social streams but also provides the corresponding information in multiple modalities. Me-digger is the first systematic effort to leverage three sources of data, that is, Twitter, Flickr and Google news, to output with vivid visual and textual contents on emerging topics. Enabled by a novel general-structured high-order co-clustering approach, it has a more accurate detection of emerging topics compared to the existing methods on micro-blog social streams.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122842643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Similar image search with a tiny bag-of-delegates representation","authors":"Weiwen Tu, Rong Pan, Jingdong Wang","doi":"10.1145/2393347.2396338","DOIUrl":"https://doi.org/10.1145/2393347.2396338","url":null,"abstract":"Similar image search over a large image database has been attracting a lot of attention recently. The widely-used solution is to use a set of codes, which we call bag-of-delegates, to represent each image, and to build inverted indices to organize the image database. The search can be conducted through the inverted indices, which is the same to the way of using texts to index images for search and has been shown to be efficient and effective. In this paper, we propose a tiny bag-of-delegates representation that uses a small amount of delegates with a high search performance guaranteed. The main advantage is that less storageis required to save the inverted indices while having a high search accuracy. We propose an adaptive forward selection scheme to sequentially learn more and more inverted indices that are constructed based on subspace partition, e.g. using spatial partition trees. Experimental results demonstrate that our approach can require a smaller number of delegates while achieving the same accuracy and taking similar time.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129622101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"One shot learning gesture recognition with Kinect sensor","authors":"Di Wu, Fan Zhu, Ling Shao, Hui Zhang","doi":"10.1145/2393347.2396454","DOIUrl":"https://doi.org/10.1145/2393347.2396454","url":null,"abstract":"Gestures are both natural and intuitive for Human-Computer-Interaction (HCI) and the one-shot learning scenario is one of the real world situations in terms of gesture recognition problems. In this demo, we present a hand gesture recognition system using the Kinect sensor, which addresses the problem of one-shot learning gesture recognition with a user-defined training and testing system. Such a system can behave like a remote control where the user can allocate a specific function using a prefered gesture by performing it only once. To adopt the gesture recognition framework, the system first automatically segments an action sequence into atomic tokens, and then adopts the Extended-Motion-History-Image (Extended-MHI) for motion feature representation. We evaluate the performance of our system quantitatively in Chalearn Gesture Challenge, and apply it to a virtual one shot learning gesture recognition system.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129005115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}