{"title":"Leaf Shape Descriptor for Tree Species Identification","authors":"Itheri Yahiaoui, O. Mzoughi, N. Boujemaa","doi":"10.1109/ICME.2012.130","DOIUrl":"https://doi.org/10.1109/ICME.2012.130","url":null,"abstract":"The problem of automatic leaf identification is particularly challenging because, in addition to constraints derived from image processing such as geometric deformations (rotation, scale, translation) and illumination variations, it involves difficulties arising from foliar properties. These include two main aspects: the first is the enormous number and diversity of leaf species, and the second, which is relevant to some special species, is the high inter-species similarity and the low intra-species similarity. In this paper, we present a novel boundary-based approach that attempts to overcome most of these constraints. This method has been compared with results obtained in the ImageCLEF 2011 plant identification task. The main advantage of this first benchmark edition is that different image retrieval techniques were tested and a crowd-sourced leaf dataset was used. Our method provides the best classification rate for scan and scan-like pictures. Besides its high accuracy, our method satisfies real-time requirements with a low computational cost.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130938016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Low-Complexity HEVC Intra Prediction Algorithm Based on Level and Mode Filtering","authors":"Heming Sun, Dajiang Zhou, S. Goto","doi":"10.1109/ICME.2012.4","DOIUrl":"https://doi.org/10.1109/ICME.2012.4","url":null,"abstract":"HEVC achieves better coding efficiency relative to prior standards, but also involves increased complexity. For intra prediction, complexity is especially intensive due to a highly flexible coding unit structure and a large number of prediction modes. This paper presents a low-complexity intra prediction algorithm for HEVC. A fast preprocessing stage based on a simplified cost model is proposed. Based on its results, a level filtering scheme reduces the number of prediction unit levels that require fine processing from 5 to 2. To supply the level filtering decision with appropriate thresholds, a fast training method is also designed. A mode filtering scheme further reduces the maximum number of angular modes to be evaluated from 34 to 9. Complexity reduction from HM 3.0 is over 50% and stable for various sequences, which makes the proposed algorithm suitable for real-time applications. The corresponding bit rate increase is lower than 2.5%.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128469672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video Event Detection Using Temporal Pyramids of Visual Semantics with Kernel Optimization and Model Subspace Boosting","authors":"N. Codella, A. Natsev, G. Hua, Matthew L. Hill, Liangliang Cao, L. Gong, John R. Smith","doi":"10.1109/ICME.2012.190","DOIUrl":"https://doi.org/10.1109/ICME.2012.190","url":null,"abstract":"In this study, we present a system for video event classification that generates a temporal pyramid of static visual semantics using minimum-value, maximum-value, and average-value aggregation techniques. Kernel optimization and model subspace boosting are then applied to customize the pyramid for each event. SVM models are independently trained for each level in the pyramid using kernel selection according to 3-fold cross-validation. Kernels that both enforce static temporal order and permit temporal alignment are evaluated. Model subspace boosting is used to select the best combination of pyramid levels and aggregation techniques for each event. The NIST TRECVID Multimedia Event Detection (MED) 2011 dataset was used for evaluation. Results demonstrate that kernel optimizations using both temporally static and dynamic kernels together achieve better performance than any one method alone. In addition, model subspace boosting reduces the size of the model by 80%, while maintaining 96% of the performance gain.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131321166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Frame Rate Up-Conversion for Depth-Based 3D Video","authors":"Qingchun Lu, Xiangzhong Fang, Chong Xu, Yongzhe Wang","doi":"10.1109/ICME.2012.117","DOIUrl":"https://doi.org/10.1109/ICME.2012.117","url":null,"abstract":"A novel frame rate up-conversion (FRUC) scheme for depth-based 3D video is proposed. Differing from existing conventional FRUC methods, which are designed for two-dimensional (2D) video only, the proposed method is designed for depth-based 3D video; it increases the frame rate of both color sequences and their associated depth maps by jointly considering image intensity, depth information, and spatial-temporal correlations. The proposed method comprises motion estimation, block irregular segmentation (BIS), depth-constrained motion vector post-processing, forward and backward motion compensation, and edge-preserved combination. Experimental results show that the intermediate color and depth images interpolated by our proposed method provide good image quality both objectively and subjectively. Moreover, compared with conventional FRUC methods, the color and depth map sequences up-converted by the proposed method are more suitable for further virtual view synthesis.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126529964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Motion Based Perceptual Distortion and Rate Optimization for Video Coding","authors":"Xi Wang, Li Su, Qingming Huang, Chunxi Liu, Ling-yu Duan","doi":"10.1109/ICME.2012.140","DOIUrl":"https://doi.org/10.1109/ICME.2012.140","url":null,"abstract":"Most conventional distortion metrics regard a video frame as a static image, and seldom exploit the motion information of successive video frames. Moreover, these methods usually calculate the visual distortion based on independent spatial pixels. Recently, many studies have shown that the way people perceive video signals is similar to the way filters process signals in the frequency domain. Therefore, in order to achieve better visual quality, we introduce a novel distortion measurement into the video coding system, which is consistent with human visual perception, and establish a perception-based rate-distortion optimization model. In this paper, we adopt a Gabor filter family to decompose the video signals into the frequency domain, and combine the video motion information to measure the perceptual distortion. We call it the Motion tuned Distortion metric For Video coding (MDFV). We then set up an MDFV-based rate-distortion optimization model to select the best encoding mode. The experimental results show that the proposed approach is effective.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126705213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SSIM-Inspired Perceptual Video Coding for HEVC","authors":"A. Rehman, Zhou Wang","doi":"10.1109/ICME.2012.175","DOIUrl":"https://doi.org/10.1109/ICME.2012.175","url":null,"abstract":"Recent advances in video capturing and display technologies, along with the exponentially increasing demand for video services, challenge the video coding research community to design new algorithms able to significantly improve the compression performance of the current H.264/AVC standard. This target is currently being pursued through the standardization activities of the High Efficiency Video Coding (HEVC) project. The distortion models used in HEVC are mean squared error (MSE) and sum of absolute difference (SAD). However, they are widely criticized for not correlating well with perceptual image quality. The structural similarity (SSIM) index has been found to be a good indicator of perceived image quality. Meanwhile, it is computationally simple compared with other state-of-the-art perceptual quality measures and has a number of desirable mathematical properties for optimization tasks. We propose a perceptual video coding method to improve upon the current HEVC based on an SSIM-inspired divisive normalization scheme as an attempt to transform the DCT domain frame prediction residuals to a perceptually uniform space before encoding. Based on the residual divisive normalization process, we define a distortion model for mode selection and show that such a divisive normalization strategy largely simplifies the subsequent perceptual rate-distortion optimization procedure. We further adjust the divisive normalization factors based on the local content of the video frame. Experiments show that the proposed scheme can achieve significant gain in terms of rate-SSIM performance when compared with HEVC.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121642523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Area and Memory Efficient Architectures for 3D Blu-ray-compliant Multimedia Processors","authors":"Chi-Cheng Ju, Tsu-Ming Liu, Y. Chu, Chuang-Chi Chiou, Bin-Jung Tsai, T. Hsiao, Ginny Chen, Pin-Huan Hsu, Chih-Ming Wang, Chun-Chia Chen, Hue-Min Lin, Chia-Yun Cheng, Min-Hao Chiu, Sheng-Jen Wang, Jiun-Yuan Wu, Yuan-Chun Lin, Yung-Chang Chang, Chung-Hung Tsai","doi":"10.1109/ICME.2012.81","DOIUrl":"https://doi.org/10.1109/ICME.2012.81","url":null,"abstract":"A 3D Blu-ray-compliant multimedia processor integrating video decoder, display, and graphic engines is presented. To cope with the bandwidth/cost-starved Blu-ray system, this design exploits time-sharing techniques, leading to 31.3% and 29.1% area reductions in the display and decoder parts, respectively. Moreover, a graphic and on-screen-display hardwired handshake effectively reduces the DRAM space by 40%. A smart graphic command removal eliminates redundant memory accesses by 14%. For 3D Blu-ray playback requirements, stereo full-HD video decoding, 24Hz display, and stereoscopic graphic UI are realized at frequencies of 333MHz, 148MHz, and 333MHz, respectively. This test chip is fabricated in a 40nm CMOS process with a core area of 3.92mm2 and power dissipation of 124.1mW.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114248119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel View-Level Target Bit Rate Distribution Estimation Technique for Real-Time Multi-view Video Plus Depth","authors":"M. Cordina, C. J. Debono","doi":"10.1109/ICME.2012.5","DOIUrl":"https://doi.org/10.1109/ICME.2012.5","url":null,"abstract":"This paper presents a novel view-level target bit rate distribution estimation technique for real-time multi-view video plus depth using a statistical model that is based on the prediction mode distribution. Experiments using various standard test sequences show the efficacy of the technique, as the model manages to estimate the view-level target bit rate distribution online with an absolute mean estimation error of 2% and a standard deviation of 0.9%. Moreover, this technique adapts the view-level bit rate distribution, providing scene-change handling capability.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123606180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pause Intensity: A No-Reference Quality Assessment Metric for Video Streaming in TCP Networks","authors":"Colin Bailey, Mirghiasaldin Seyedebrahimi, Xiaohong Peng","doi":"10.1109/ICME.2012.148","DOIUrl":"https://doi.org/10.1109/ICME.2012.148","url":null,"abstract":"In this paper a full analytic model for pause intensity (PI), a no-reference metric for video quality assessment, is presented. The model is built upon the video playout buffer behavior at the client side and also encompasses the characteristics of a TCP network. Video streaming via TCP produces impairments in play continuity, which are not typically reflected in current objective metrics such as PSNR and SSIM. Recently the buffer underrun frequency/probability has been used to characterize the buffer behavior and as a measurement for performance optimization. But we show, using subjective testing, that underrun frequency cannot reflect the viewers' quality of experience for TCP-based streaming. We also demonstrate that PI is a comprehensive metric made up of a combination of phenomena observed in the playout buffer. The analytical model in this work is verified with simulations carried out on ns-2, showing that the two results are closely matched. The effectiveness of the PI metric has also been proved by subjective testing on a range of video clips, where PI values exhibit a good correlation with the viewers' opinion scores.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129734560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CO-LDA: A Semi-supervised Approach to Audio-Visual Person Recognition","authors":"Xuran Zhao, N. Evans, J. Dugelay","doi":"10.1109/ICME.2012.14","DOIUrl":"https://doi.org/10.1109/ICME.2012.14","url":null,"abstract":"Client models used in Automatic Speaker Recognition (ASR) and Automatic Face Recognition (AFR) are usually trained with labelled data acquired in a small number of enrolment sessions. The amount of training data is rarely sufficient to reliably represent the variation which occurs later during testing. Larger quantities of client-specific training data can always be obtained, but manual collection and labelling is often cost-prohibitive. Co-training, a paradigm of semi-supervised machine learning, can exploit unlabelled data to enhance weakly learned client models. In this paper, we propose a co-LDA algorithm which uses both labelled and unlabelled data to capture greater intersession variation and to learn discriminative subspaces in which test examples can be more accurately classified. The proposed algorithm is naturally suited to audio-visual person recognition because vocal and visual biometric features intrinsically satisfy the assumptions of feature sufficiency and independence which guarantee the effectiveness of co-training. When tested on the MOBIO database, the proposed co-training system raises a baseline identification rate from 71% to 99%, while in a verification task the Equal Error Rate (EER) is reduced from 18% to about 1%. To our knowledge, this is the first successful application of co-training in audio-visual biometric systems.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125313126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}