{"title":"Error recovered hierarchical classification","authors":"Shiai Zhu, Xiao-Yong Wei, C. Ngo","doi":"10.1145/2502081.2502182","DOIUrl":"https://doi.org/10.1145/2502081.2502182","url":null,"abstract":"Hierarchical classification (HC) is a popular and efficient way for detecting the semantic concepts from the images. However, the conventional HC, which always selects the branch with the highest classification response to go on, has the risk of propagating serious errors from higher levels of the hierarchy to the lower levels. We argue that the highest-response-first strategy is too arbitrary, because the candidate nodes are considered individually which ignores the semantic relationship among them. In this paper, we propose a novel method for HC, which is able to utilize the semantic relationship among candidate nodes and their children to recover the responses of unreliable classifiers of the candidate nodes, with the hope of providing the branch selection a more globally valid and semantically consistent view. The experimental results show that the proposed method outperforms the conventional HC methods and achieves a satisfactory balance between the accuracy and efficiency.","PeriodicalId":20448,"journal":{"name":"Proceedings of the 21st ACM international conference on Multimedia","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73755262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Activity-aware adaptive compression: a morphing-based frame synthesis application in 3DTI","authors":"Chien-Nan Chen, Pengye Xia, K. Nahrstedt","doi":"10.1145/2502081.2508116","DOIUrl":"https://doi.org/10.1145/2502081.2508116","url":null,"abstract":"In view of the different demands on quality of service of different user activities in the 3D Tele-immersive (3DTI) environment, we combine activity recognition and real-time morphing-based compression and present the Activity-Aware Adaptive Compression. We implement this scheme on our 3DTI platform: the TEEVE Endpoint, which is a runtime engine to handle the creation, transmission and rendering of 3DTI data. User study as well as objective evaluation of the scheme show that it can achieve 25% more bandwidth saving compared to conventional 3D data compression as zlib without perceptible degradation in the user experience.","PeriodicalId":20448,"journal":{"name":"Proceedings of the 21st ACM international conference on Multimedia","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73496609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cross-media topic mining on wikipedia","authors":"Xikui Wang, Yang Liu, Donghui Wang, Fei Wu","doi":"10.1145/2502081.2502180","DOIUrl":"https://doi.org/10.1145/2502081.2502180","url":null,"abstract":"As a collaborative wiki-based encyclopedia, Wikipedia provides a huge amount of articles of various categories. In addition to their text corpus, Wikipedia also contains plenty of images which makes the articles more intuitive for readers to understand. To better organize these visual and textual data, one promising area of research is to jointly model the embedding topics across multi-modal data (i.e, cross-media) from Wikipedia. In this work, we propose to learn the projection matrices that map the data from heterogeneous feature spaces into a unified latent topic space. Different from previous approaches, by imposing the l1 regularizers to the projection matrices, only a small number of relevant visual/textual words are associated with each topic, which makes our model more interpretable and robust. Furthermore, the correlations of Wikipedia data in different modalities are explicitly considered in our model. The effectiveness of the proposed topic extraction algorithm is verified by several experiments conducted on real Wikipedia datasets.","PeriodicalId":20448,"journal":{"name":"Proceedings of the 21st ACM international conference on Multimedia","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81763109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Speaking swiss: languages and venues in foursquare","authors":"Darshan Santani, D. Gática-Pérez","doi":"10.1145/2502081.2502133","DOIUrl":"https://doi.org/10.1145/2502081.2502133","url":null,"abstract":"Due to increasing globalization, urban societies are becoming more multicultural. The availability of large-scale digital mobility traces e.g. from tweets or checkins provides an opportunity to explore multiculturalism that until recently could only be addressed using survey-based methods. In this paper we examine a basic facet of multiculturalism through the lens of language use across multiple cities in Switzerland. Using data obtained from Foursquare over 330 days, we present a descriptive analysis of linguistic differences and similarities across five urban agglomerations in a multicultural, western European country.","PeriodicalId":20448,"journal":{"name":"Proceedings of the 21st ACM international conference on Multimedia","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84573202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A multigrid approach for bandwidth and display resolution aware streaming of 3D deformations","authors":"Yuan Tian, Y. Yang, X. Guo, B. Prabhakaran","doi":"10.1145/2502081.2502181","DOIUrl":"https://doi.org/10.1145/2502081.2502181","url":null,"abstract":"In this paper, we propose a novel multimedia system adaptively streaming the animation according to display resolution and/or network bandwidth. A Multigrid-like technique is used in this framework to accelerate the converging rate of the optimization of the nonlinear deformation energy. The computation is performed from coarsest mesh at the top level to the finest mesh at the bottom level and then goes back to the top again. Such V-shape calculation provides great flexibility for the networked environment. Clients are able to receive the data streaming corresponding to its display resolution and network bandwidth. A more compact form of deformation data packaging is also used in this system such that a cube element only needs six parameters instead of 24 variables as used in regular mesh representation, which significantly reduces the network overhead for the streaming.","PeriodicalId":20448,"journal":{"name":"Proceedings of the 21st ACM international conference on Multimedia","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85880814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Similarity search","authors":"Yong Rui","doi":"10.1145/3245288","DOIUrl":"https://doi.org/10.1145/3245288","url":null,"abstract":"","PeriodicalId":20448,"journal":{"name":"Proceedings of the 21st ACM international conference on Multimedia","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90898835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spot the differences: from a photograph burst to the single best picture","authors":"H. E. Tasli, J. V. Gemert, T. Gevers","doi":"10.1145/2502081.2502190","DOIUrl":"https://doi.org/10.1145/2502081.2502190","url":null,"abstract":"With the rise of the digital camera, people nowadays typically take several near-identical photos of the same scene to maximize the chances of a good shot. This paper proposes a user-friendly tool for exploring a personal photo gallery for selecting or even creating the best shot of a scene between its multiple alternatives. This functionality is realized through a graphical user interface where the best viewpoint can be selected from a generated panorama of the scene. Once the viewpoint is selected, the user is able to go explore possible alternatives coming from the other images. Using this tool, one can explore a photo gallery efficiently. Moreover, additional compositions from other images are also possible. With such additional compositions, one can go from a burst of photographs to the single best one. Even funny compositions of images, where you can duplicate a person in the same image, are possible with our proposed tool.","PeriodicalId":20448,"journal":{"name":"Proceedings of the 21st ACM international conference on Multimedia","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91330326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Junshi Huang, Hairong Liu, Jialie Shen, Shuicheng Yan
{"title":"Towards efficient sparse coding for scalable image annotation","authors":"Junshi Huang, Hairong Liu, Jialie Shen, Shuicheng Yan","doi":"10.1145/2502081.2502127","DOIUrl":"https://doi.org/10.1145/2502081.2502127","url":null,"abstract":"Nowadays, content-based retrieval methods are still the development trend of the traditional retrieval systems. Image labels, as one of the most popular approaches for the semantic representation of images, can fully capture the representative information of images. To achieve the high performance of retrieval systems, the precise annotation for images becomes inevitable. However, as the massive number of images in the Internet, one cannot annotate all the images without a scalable and flexible (i.e., training-free) annotation method. In this paper, we particularly investigate the problem of accelerating sparse coding based scalable image annotation, whose off-the-shelf solvers are generally inefficient on large-scale dataset. By leveraging the prior that most reconstruction coefficients should be zero, we develop a general and efficient framework to derive an accurate solution to the large-scale sparse coding problem through solving a series of much smaller-scale subproblems. In this framework, an active variable set, which expands and shrinks iteratively, is maintained, with each snapshot of the active variable set corresponding to a subproblem. Meanwhile, the convergence of our proposed framework to global optimum is theoretically provable. To further accelerate the proposed framework, a sub-linear time complexity hashing strategy, e.g. Locality-Sensitive Hashing, is seamlessly integrated into our framework. Extensive empirical experiments on NUS-WIDE and IMAGENET datasets demonstrate that the orders-of-magnitude acceleration is achieved by the proposed framework for large-scale image annotation, along with zero/negligible accuracy loss for the cases without/with hashing speed-up, compared to the expensive off-the-shelf solvers.","PeriodicalId":20448,"journal":{"name":"Proceedings of the 21st ACM international conference on Multimedia","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89860240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Motion matters: a novel framework for compressing surveillance videos","authors":"Xiaojie Guo, Siyuan Li, Xiaochun Cao","doi":"10.1145/2502081.2502145","DOIUrl":"https://doi.org/10.1145/2502081.2502145","url":null,"abstract":"Currently, video surveillance plays a very important role in the fields of public safety and security. For storing the videos that usually contain extremely long sequences, it requires huge space. Video compression techniques can be used to release the storage load to some extent, such as H.264/AVC. However, the existing codecs are not sufficiently effective and efficient for encoding surveillance videos as they do not specifically consider the characteristic of surveillance videos, i.e. the background of surveillance video has intensive redundancy. This paper introduces a novel framework for compressing such videos. We first train a background dictionary based on a small number of observed frames. With the trained background dictionary, we then separate every frame into the background and motion (foreground), and store the compressed motion together with the reconstruction coefficient of the background corresponding to the background dictionary. The decoding is carried out on the encoded frame in an inverse procedure. The experimental results on extensive surveillance videos demonstrate that our proposed method significantly reduces the size of videos while gains much higher PSNR compared to the state of the art codecs.","PeriodicalId":20448,"journal":{"name":"Proceedings of the 21st ACM international conference on Multimedia","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89139035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Social life networks: a multimedia problem?","authors":"Amarnath Gupta, R. Jain","doi":"10.1145/2502081.2502279","DOIUrl":"https://doi.org/10.1145/2502081.2502279","url":null,"abstract":"Connecting people to the resources they need is a fundamental task for any society. We present the idea of a technology that can be used by the middle tier of a society so that it uses people's mobile devices and social networks to connect the needy with providers. We conceive of a world observatory called the Social Life Network (SLN) that connects together people and things and monitors for people's needs as their life situations evolve. To enable such a system we need SLN to register and recognize situations by combining people's activities and data streaming from personal devices and environment sensors, and based on the situations make the connections when possible. But is this a multimedia problem? We show that many pattern recognition, machine learning, sensor fusion and information retrieval techniques used in multimedia-related research are deeply connected to the SLN problem. We sketch the functional architecture of such a system and show the place for these techniques.","PeriodicalId":20448,"journal":{"name":"Proceedings of the 21st ACM international conference on Multimedia","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90807974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}