{"title":"Macro-Block-Level Selective Background Difference Coding for Surveillance Video","authors":"Xianguo Zhang, Yonghong Tian, Luhong Liang, Tiejun Huang, Wen Gao","doi":"10.1109/ICME.2012.136","DOIUrl":"https://doi.org/10.1109/ICME.2012.136","url":null,"abstract":"Utilizing the special properties to improve the surveillance video coding efficiency still has much room, although there have been three typical paradigms of methods: object-oriented, background-prediction-based and background-difference-based methods. However, due to the inaccurate foreground segmentation, the low-quality or unclear background frame, and the potential \"foreground pollution\" phenomenon, there is still much room for improvement. To address this problem, this paper proposes a macro-block-level selective background difference coding method (MSBDC). MSBDC selects the following two ways to encode each macro-block (MB): coding the original MB, and directly coding the difference data between the MB and its corresponding background. MSBDC also features at employs the classification of MBs to facilitate the selection, through which, prediction and motion compensation turns more accurate, both on foreground and background. Results show that, MSBDC significantly decreases the total bitrate and obtains a remarkable performance gain on foreground compared with several state-of-the-art methods.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127876489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Human Detection Using Wavelet-Based CS-LBP and a Cascade of Random Forests","authors":"Deok-Yeon Kim, Joon-Young Kwak, ByoungChul Ko, J. Nam","doi":"10.1109/ICME.2012.124","DOIUrl":"https://doi.org/10.1109/ICME.2012.124","url":null,"abstract":"In this paper, we propose a novel human detection approach combining wavelet-based center symmetric LBP (WCS-LBP) with a cascade of random forests. To detect human regions, we first extract three types of WCS-LBP features from a scanning window of wavelet transformed sub-images to reduce the feature dimension. Then, the extracted WCS-LBP descriptors are applied to a cascade of random forests, which are ensembles of random decision trees. Using a cascade of random forests with WCS-LBP, human detection is performed in near real-time, and the detection accuracy is also increased, as compared to combinations of other features and classifiers. The proposed algorithm is successfully applied to various human and non-human images from the INRIA dataset, and it performs better than other related algorithms.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"138 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127336325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lumipen: Projection-Based Mixed Reality for Dynamic Objects","authors":"Kohei Okumura, H. Oku, M. Ishikawa","doi":"10.1109/ICME.2012.34","DOIUrl":"https://doi.org/10.1109/ICME.2012.34","url":null,"abstract":"Recently, mixed reality (MR) involving the use of a projector, which is often called projection-based MR, has attracted much attention and is being widely studied. However, most MR research is mainly focused on static objects or situations. In contrast, we propose \"Lumipen,\" which is a projection-based MR system for high-speed or high-frequency objects, using a high-speed vision sensor and a projector with a high-speed optical gaze controller. The high-speed vision sensor detects the dynamics of the object, and the high-speed optical gaze controller controls the projection direction in a few milliseconds. Lumipen is expected to enable addition of sophisticated and diverse visual information even in dynamic live content (e.g., a play, a sports game, a music concert, etc.), like computer graphics added to video content.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116826809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LIPS: A Lightweight Inter-layer Protection Scheme for Scalable Video Coding","authors":"Shih-Ying Chang, Hsin-Ta Chiao","doi":"10.1109/ICME.2012.33","DOIUrl":"https://doi.org/10.1109/ICME.2012.33","url":null,"abstract":"Scalable Video Coding (SVC) enables the partition of video data into layers of different priorities for adapting to both the diversities of terminal capabilities and the variation of network transmission mediums. It is more desirable that more important video data can be recovered with higher probabilities. In this paper, we propose a lightweight inter-layer protection scheme for SVC transmission, which enables the data in a higher layer (e.g., an enhancement layer) to enhance the data reconstruction in a lower layer (e.g., a base layer) in the receiver side. In contrast to non-inter-layer protection schemes, the simulation result in a DVB-H broadcast channel shows that the scheme exhibits 1% to 45% enhancement on the reconstruction of the SVC video data in the base layer.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116928693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Group Recommendation Using External Followee for Social TV","authors":"Xiaoyan Wang, Lifeng Sun, Zhi Wang, Da Meng","doi":"10.1109/ICME.2012.122","DOIUrl":"https://doi.org/10.1109/ICME.2012.122","url":null,"abstract":"Group recommendation plays a significant role in Social TV systems, where online friends form into temporary groups to enjoy watching video together and interact with each other. Online microblogging systems introduce the \"following\" relationship that reflects the common interests between users in a group and external representative followees outside the group. Traditional group recommendation only considers internal group members' preferences and their relationship. In our study, we measure the external followees' impact on group interest and establish group preference model based on external experts' guidance for group recommendation. In addition, we take advantage of the current watching video to improve context-aware recommendations. Experimental results show that our solution works much better in situations of high group dynamic and inactive group members than traditional approaches.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131177877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Grouplet-Based Distance Metric Learning for Video Concept Detection","authors":"Wei Jiang, A. Loui","doi":"10.1109/ICME.2012.123","DOIUrl":"https://doi.org/10.1109/ICME.2012.123","url":null,"abstract":"We investigate general concept detection in unconstrained videos. A distance metric learning algorithm is developed to use the information of the group let structure for improved detection. A group let is defined as a set of audio and/or visual code words that are grouped together according to their strong correlations in videos. By using the entire group lets as building elements, concepts can be more robustly detected than using discrete audio or visual code words. Compared with the traditional method of generating aggregated group let-based features for classification, our group let-based distance metric learning approach directly learns distances between data points, which better preserves the group let structure. Specifically, our algorithm uses an iterative quadratic programming formulation where the optimal distance metric can be effectively learned based on the large-margin nearest-neighbor setting. The framework is quite flexible, where various types of distances can be computed using individual group lets, and through the same distance metric learning algorithm the distances computed over individual group lets can be combined for final classification. We extensively evaluate our method over the large-scale Columbia Consumer Video set. Experiments demonstrate that our approach can achieve consistent and significant performance improvements.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132390661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Adaptive Dynamic Scheduling Scheme for H.264/AVC Decoding on Multicore Architecture","authors":"Dung Vu, Jilong Kuang, L. Bhuyan","doi":"10.1109/ICME.2012.9","DOIUrl":"https://doi.org/10.1109/ICME.2012.9","url":null,"abstract":"Parallelizing H.264/AVC decoding on multicore architectures is challenged by its inherent structural and functional dependencies at both frame and macro-block levels, as macro-blocks and certain frame types must be decoded in a sequential order. So far, dynamic scheduling scheme with recursive tail submit, as one of the best existing algorithms, provides a good throughput performance by exploiting macro-block level parallelism and mitigating global queue contention. Nevertheless, it fails to achieve an optimal performance due to 1) the use of global queue, which incurs substantial synchronization overhead when the number of cores increases and 2) the unawareness of cache locality with respect to the underlying hierarchical core/cache topology that results in unnecessary latency, communication cost and load imbalance. In this paper, we propose an adaptive dynamic scheduling scheme that employs multiple local queues to reduce lock contention, and assigns tasks in a cache locality aware and load-balancing fashion so that neighboring macro-blocks are preferably dispatched to nearby cores. We design, implement and evaluate our scheme on a 32-core cc-NUMA SGI server. Compared to existing alternatives by running real benchmark applications, we observe that our scheme produces higher throughput and lower latency with more balanced workload and less communication cost.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134286065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparison of Curvelet and Wavelet Texture Features for Content Based Image Retrieval","authors":"I. Sumana, Guojun Lu, Dengsheng Zhang","doi":"10.1109/ICME.2012.90","DOIUrl":"https://doi.org/10.1109/ICME.2012.90","url":null,"abstract":"Texture feature plays a vital role in content based Image retrieval (CBIR). Wavelet texture feature modeled by generalized Gaussian density (GGD) [1] performs better than discrete wavelet texture feature. Curve let texture feature was proposed in [2]. In this paper, we compute a new texture feature by applying the generalized Gaussian density to the distribution of curve let coefficients which we call curve let GGD texture feature. The purpose of this paper is to investigate curve let GGD texture feature and compare its retrieval performance with that of curve let, wavelet and wavelet GGD texture features. Experimental results show that both curve let and curve let GGD features perform significantly better than wavelet and wavelet GGD texture features. Among the two types of curve let based features, curve let feature shows better performance in CBIR than curve let GGD texture feature. The findings are discussed in the paper.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134371735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint Example-Based Depth Map Super-Resolution","authors":"Yanjie Li, Tianfan Xue, Lifeng Sun, Jianzhuang Liu","doi":"10.1109/ICME.2012.30","DOIUrl":"https://doi.org/10.1109/ICME.2012.30","url":null,"abstract":"The fast development of time-of-flight (ToF) cameras in recent years enables capture of high frame-rate 3D depth maps of moving objects. However, the resolution of depth map captured by ToF is rather limited, and thus it cannot be directly used to build a high quality 3D model. In order to handle this problem, we propose a novel joint example-based depth map super-resolution method, which converts a low resolution depth map to a high resolution depth map, using a registered high resolution color image as a reference. Different from previous depth map SR methods without training stage, we learn a mapping function from a set of training samples and enhance the resolution of the depth map via sparse coding algorithm. We further use a reconstruction constraint to make object edges sharper. Experimental results show that our method outperforms state-of-the-art methods for depth map super-resolution.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"113 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133020389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"View Independent Computer Lip-Reading","authors":"Yuxuan Lan, B. Theobald, R. Harvey","doi":"10.1109/ICME.2012.192","DOIUrl":"https://doi.org/10.1109/ICME.2012.192","url":null,"abstract":"Computer lip-reading systems are usually designed to work using a full-frontal view of the face. However, many human experts tend to prefer to lip-read using an angled view. In this paper we consider issues related to the best viewing angle for an automated lip-reading system. In particular, we seek answers to the following questions: (1) Do computers lip-read better using a frontal or a non-frontal view of the face? (2) What is the best viewing angle for a computer lip-reading system? (3) How can a computer lip-reading system be made to work independently of viewing angle? We investigate these issues using a purpose built audio-visual dataset that contains simultaneous recordings of a speaker reciting continuous speech at five angles. We find that the system performs best on a non-frontal view, perhaps because lip gestures, such as lip-protrusion and lip-rounding, are more pronounced when viewing from an angle. We also describe a simple linear mapping that allows us to map any view of the face to the view that we find to be optimal. Hence we present a view-independent lip-reading system.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"29 9","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113977302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}