{"title":"Visual query attributes suggestion","authors":"Jingwen Bian, Zhengjun Zha, Hanwang Zhang, Q. Tian, Tat-Seng Chua","doi":"10.1145/2393347.2396334","DOIUrl":"https://doi.org/10.1145/2393347.2396334","url":null,"abstract":"Query suggestion is an effective solution to help users deliver their search intent. While many query suggestion approaches have been proposed for test-based image retrieval with query-by-keywords, query suggestion for content-based image retrieval (CBIR) with query-by-example (QBE) has been seldom studied. QBE usually suffers from the \"intention gap\" problem, especially when the user fails to get an appropriate query image to express his search intention precisely. In this paper, we propose a novel query suggestion scheme named Visual Query Attributes Suggestion (VQAS) for image search with QBE. Given a query image, informative attributes are suggested to the user as complements to the query. These attributes reflect the visual properties and key components of the query. By selecting some suggested attributes, the user can provide more precise search intent which is not captured by the query image. The evaluation results on two real-world image datasets show the effectiveness of VQAS in terms of retrieval performance and the quality of query suggestions.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132754228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Full paper session 7: visual search","authors":"Q. Tian","doi":"10.1145/3246400","DOIUrl":"https://doi.org/10.1145/3246400","url":null,"abstract":"","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128217064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scenario-driven interactive panorama video delivery: promptly watch and share enjoyable parts of an event","authors":"D. Ochi, H. Kimata, H. Noto, Akira Kojima","doi":"10.1145/2393347.2396456","DOIUrl":"https://doi.org/10.1145/2393347.2396456","url":null,"abstract":"We propose a scenario-driven interactive panorama video delivery system that allows users to repeatedly watch the enjoyable parts of an event and share them with others. It provides functions for interactively watching panorama video parts, as well as for scenario making (with user-selected camera trajectory), and delivery that allows users to share their panorama watching experiences. In this technical demos, we demonstrate the system that can make a favorable and stable camera trajectories as a scenario from intuitive user manipulations of a tablet device in a format that can be easily shared with others.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"72 8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131627159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Markov-based image forensics for photographic copying from printed picture","authors":"Jing Yin, Yanmei Fang","doi":"10.1145/2393347.2396396","DOIUrl":"https://doi.org/10.1145/2393347.2396396","url":null,"abstract":"Nowadays, photographic-copying technique is very popular along with the rapid development of the image-capturing device, especially digital camera. As a result, the recaptured images, i.e., images taken from real-scene images displayed on various medium, e.g., LCD screen, are used in illegal cases now and then. In this paper, by comparing the recaptured images with their corresponding real-scene images, we find the recapturing procedure changes the statistics of the images. Then the Markov process based features extracted from the Discrete Cosine Transform(DCT) coefficients array are proposed to characterize this changes. During experimentation, a large and typical image dataset, which consisted of 3994 real-scene images and 3994 recaptured images that are taken from printed pictures with diversified image contents and camera models, is build and used for training and testing the classifier Support Vector Machine(SVM). Experimental results show that the proposed forensics scheme performs very well and outperforms the state-of-art methods.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"17 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131858920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Jetway: minimizing costs on inter-datacenter video traffic","authors":"Yuan Feng, Baochun Li, Bo Li","doi":"10.1145/2393347.2393388","DOIUrl":"https://doi.org/10.1145/2393347.2393388","url":null,"abstract":"It is typical for video streaming service providers (such as NetFlix) to rely on services from cloud providers (such as Amazon), in order to build a scalable video streaming platform with high availability. The trend is largely driven by the fact that cloud providers deploy a number of datacenters inter-connected by high-capacity links, spanning different geographical regions. Video traffic across datacenters, such as video replication and transit server-to-customer video serving, constitutes a large portion of a cloud provider's inter-datacenter traffic. Charged by ISPs, such inter-datacenter video traffic incurs substantial operational costs to a cloud provider. In this paper, we argue that costs incurred by such inter-datacenter video traffic can be reduced or even minimized by carefully choosing paths, and by assigning flow rates on each inter-datacenter link along every path. We present Jetway, a new set of algorithms designed to minimize cloud providers' operational costs on inter-datacenter video traffic, by optimally routing video flows in an online fashion. Algorithms in Jetway are designed by following a methodical approach based on an in-depth theoretical analysis. As a highlight of this paper, we have built a real-world system framework to implement and deploy Jetway in the Amazon EC2 datacenters. With both simulations and real-world experiments using our implementation, we show that Jetway effectively helps transmitting videos across datacenters with reduced costs to cloud providers and satisfactory real-world performance.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134099540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable mining of small visual objects","authors":"Pierre Letessier, Olivier Buisson, A. Joly","doi":"10.1145/2393347.2393431","DOIUrl":"https://doi.org/10.1145/2393347.2393431","url":null,"abstract":"This paper presents a scalable method for automatically discovering frequent visual objects in large multimedia collections even if their size is very small. It first formally revisits the problem of mining or discovering such objects, and then generalizes two kinds of existing methods for probing candidate object seeds: weighted adaptive sampling and hashing-based methods. The idea is that the collision frequencies obtained with hashing-based methods can actually be converted into a prior probability density function given as input to a weighted adaptive sampling algorithm. This allows for an evaluation of any hashing scheme effectiveness in a more generalized way, and a comparison with other priors, e.g. guided by visual saliency concerns. We then introduce a new hashing strategy, working first at the visual level, and then at the geometric level. This strategy allows us to integrate weak geometric constraints into the hashing phase itself and not only neighborhood constraints as in previous works. Experiments conducted on a new dataset introduced in this paper will show that using this new hashing-based prior allows a drastic reduction of the number of tentative probes required to discover small objects instantiated several times in a large dataset.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123886004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Human action recognition and retrieval using sole depth information","authors":"Yan-Ching Lin, Min-Chun Hu, Wen-Huang Cheng, Yung-Huan Hsieh, Hong-Ming Chen","doi":"10.1145/2393347.2396381","DOIUrl":"https://doi.org/10.1145/2393347.2396381","url":null,"abstract":"Observing the widespread use of Kinect-like depth cameras, in this work, we investigate into the problem of using sole depth data for human action recognition and retrieval in videos. We proposed the use of simple depth descriptors without learning optimization to achieve promising performances as compatible to those of the leading methods based on color images and videos, and can be effectively applied for real-time applications. Because of the infrared nature of depth cameras, the proposed approach will be especially useful under poor lighting conditions, e.g. the surveillance environments without sufficient lighting. Meanwhile, we proposed a large Depth-included Human Action video dataset, namely DHA, which contains 357 videos of performed human actions belonging to 17 categories. To the best of our knowledge, the DHA is one of the largest depth-included video datasets of human actions.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124052296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Memorable basis: towards human-centralized sparse representation","authors":"Xiaoshuai Sun, H. Yao","doi":"10.1145/2393347.2396306","DOIUrl":"https://doi.org/10.1145/2393347.2396306","url":null,"abstract":"Previous studies of sparse representation in multimedia research focus on developing reliable and efficient dictionary learning algorithms. Despite the sparse prior, how to integrate other related perceptual factors of human being into dictionary learning process was seldom studied. In this paper, we investigate the influence of image memorability for human-centralized sparse representation. Based on the results of a photo memory game, we are able to quantitatively characterize an image's memorability which allows us to train sparse bases from the most memorable images instead of randomly selected natural images. We believed that such kind of basis is more consistent with neural networks in human brain and hence can better predict where human looks. To test our hypothesis, we choose human eye-fixation prediction problem for quantitative evaluation. The experimental results demonstrate the superior performance of our Memorable Basis compared to traditional sparse basis trained from unselected images.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127651693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Propagation-based social-aware replication for social video contents","authors":"Zhi Wang, Lifeng Sun, Xiangwen Chen, Wenwu Zhu, Jiangchuan Liu, Minghua Chen, Shiqiang Yang","doi":"10.1145/2393347.2393359","DOIUrl":"https://doi.org/10.1145/2393347.2393359","url":null,"abstract":"Online social network has reshaped the way how video contents are generated, distributed and consumed on today's Internet. Given the massive number of videos generated and shared in online social networks, it has been popular for users to directly access video contents in their preferred social network services. It is intriguing to study the service provision of social video contents for global users with satisfactory quality-of-experience. In this paper, we conduct large-scale measurement of a real-world online social network system to study the propagation of the social video contents. We have summarized important characteristics from the video propagation patterns, including social locality, geographical locality and temporal locality. Motivated by the measurement insights, we propose a propagation-based social-aware replication framework using a hybrid edge-cloud and peer-assisted architecture, namely PSAR, to serve the social video contents. Our replication strategies in PSAR are based on the design of three propagation-based replication indices, including a geographic influence index and a content propagation index to guide how the edge-cloud servers backup the videos, and a social influence index to guide how peers cache the videos for their friends. By incorporating these replication indices into our system design, PSAR has significantly improved the replication performance and the video service quality. Our trace-driven experiments further demonstrate the effectiveness and superiority of PSAR, which improves the local download ratio in the edge-cloud replication by 30%, and the local cache hit ratio in the peer-assisted replication by 40%, against traditional approaches.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"83 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127971252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parsing collective behaviors by hierarchical model with varying structure","authors":"Cong Zhang, Xiaokang Yang, Jun Zhu, Weiyao Lin","doi":"10.1145/2393347.2396389","DOIUrl":"https://doi.org/10.1145/2393347.2396389","url":null,"abstract":"Collective behaviors are usually composed of several groups. Considering the interactions among groups, this paper presents a novel framework to parse collective behaviors for video surveillance applications. We first propose a latent hierarchical model (LHM) with varying structure to represent the behavior with multiple groups. Furthermore, we also propose a multi-layer-based (MLB) inference method, where a sample-based heuristic search (SHS) is introduced to infer the group affiliation. And latent SVM is adopted to learn our model. With the proposed LHM, not only are the collective behaviors detected effectively, but also the group affiliation in the collective behaviors is figured out. Experiment results demonstrate the effectiveness of the proposed framework.","PeriodicalId":212654,"journal":{"name":"Proceedings of the 20th ACM international conference on Multimedia","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128654727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}