{"title":"Semantics-Based Video Indexing using a Stochastic Modeling Approach","authors":"Yong Wei, S. Bhandarkar, Kang Li","doi":"10.1109/ICIP.2007.4380017","DOIUrl":"https://doi.org/10.1109/ICIP.2007.4380017","url":null,"abstract":"Semantic video indexing is the first step towards automatic video retrieval and personalization. We propose a data-driven stochastic modeling approach to perform both video segmentation and video indexing in a single pass. Compared with the existing hidden Markov model (HMM)-based video segmentation and indexing techniques, the advantages of the proposed approach are as follows: (1) the probabilistic grammar defining the video program is generated entirely from the training data allowing the proposed approach to handle various kinds of videos without having to manually redefine the program model; (2) the proposed use of the Tamura features improves the accuracy of temporal segmentation and indexing; (3) the need to use an HMM to model the video edit effects is obviated thus simplifying the processing and collection of training data and ensuring that all video segments in the database are labeled with concepts that have clear semantic meanings in order to facilitate semantics-based video retrieval. Experimental results on broadcast news video are presented.","PeriodicalId":131177,"journal":{"name":"2007 IEEE International Conference on Image Processing","volume":"356 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124498671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Common Spatial Pattern Discovery by Efficient Candidate Pruning","authors":"Junsong Yuan, Zhu Li, Yun Fu, Ying Wu, Thomas S. Huang","doi":"10.1109/ICIP.2007.4378917","DOIUrl":"https://doi.org/10.1109/ICIP.2007.4378917","url":null,"abstract":"Automatically discovering common visual patterns in images is very challenging due to the uncertainties in the visual appearances of such spatial patterns and the enormous computational cost involved in exploring the huge solution space. Instead of performing exhaustive search on all possible candidates of such spatial patterns at various locations and scales, this paper presents a novel and very efficient algorithm for discovering common visual patterns by designing a provably correct and computationally efficient pruning procedure that has a quadratic complexity. This new approach is able to efficiently search a set of images for unknown visual patterns that exhibit large appearance variations because of rotation, scale changes, slight view changes, color variations and partial occlusions.","PeriodicalId":131177,"journal":{"name":"2007 IEEE International Conference on Image Processing","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126243001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Wavelet-Based Noise-Aware Method for Fusing Noisy Imagery","authors":"Xiaohui Yuan, B. Buckles","doi":"10.1109/ICIP.2007.4379602","DOIUrl":"https://doi.org/10.1109/ICIP.2007.4379602","url":null,"abstract":"Fusion of images in the presence of noise is a challenging problem. Conventional fusion methods focus on aggregating prominent image features, which usually result in noise enhancement. To address this problem, we developed a wavelet-based, noise-aware fusion method that distinguishes signal and noise coefficients on-the-fly and fuses them with weighted averaging and majority voting respectively. Our method retains coefficients that reconstruct salient features, whereas noise components are discarded. The performance is evaluated in terms of noise removal and feature retention. The comparisons with five state-of-the-art fusion methods and a combination with denoising method demonstrated that our method significantly outperformed the existing techniques with noisy inputs.","PeriodicalId":131177,"journal":{"name":"2007 IEEE International Conference on Image Processing","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126267990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimised Compression Strategy in Wavelet-Based Video Coding using Improved Context Models","authors":"Toni Zgaljic, M. Mrak, E. Izquierdo","doi":"10.1109/ICIP.2007.4379331","DOIUrl":"https://doi.org/10.1109/ICIP.2007.4379331","url":null,"abstract":"Accurate probability estimation is a key to efficient compression in the entropy coding phase of state-of-the-art video coding systems. Probability estimation can be enhanced if contexts in which symbols occur are used during the probability estimation phase. However, these contexts have to be carefully designed in order to avoid negative effects. Methods that use tree structures to model contexts of various syntax elements have been proven efficient in image and video coding. In this paper we use such a structure to build optimised contexts for application in scalable wavelet-based video coding. With the proposed approach, contexts are designed separately for intra-coded frames and motion-compensated frames, considering varying statistics across different spatio-temporal subbands. Moreover, contexts are separately designed for different bit-planes. A comparison with compression using fixed contexts from embedded ZeroBlock coding (EZBC) has been performed, showing improvements when context modelling on tree structures is applied.","PeriodicalId":131177,"journal":{"name":"2007 IEEE International Conference on Image Processing","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126298439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Subspace Extension to Phase Correlation Approach for Fast Image Registration","authors":"Jinchang Ren, T. Vlachos, Jianmin Jiang","doi":"10.1109/ICIP.2007.4378996","DOIUrl":"https://doi.org/10.1109/ICIP.2007.4378996","url":null,"abstract":"A novel extension of phase correlation to subspace correlation is proposed, in which 2-D translation is decomposed into two 1-D motions thus only 1-D Fourier transform is used to estimate the corresponding motion. In each subspace, the first two highest peaks from 1-D correlation are linearly interpolated for subpixel accuracy. Experimental results have shown both the robustness and accuracy of our method.","PeriodicalId":131177,"journal":{"name":"2007 IEEE International Conference on Image Processing","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126459225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detection and Recovery of Film Dirt for Archive Restoration Applications","authors":"Jinchang Ren, T. Vlachos","doi":"10.1109/ICIP.2007.4379943","DOIUrl":"https://doi.org/10.1109/ICIP.2007.4379943","url":null,"abstract":"A novel spatio-temporal method is proposed for film dirt detection and recovery. Firstly, a more reliable confidence measurement of dirt is extracted for color films. False alarms caused by motion are filtered using consistency checks among several measurements. Then, candidate dirt is detected by filtering and thresholding this confidence measurement. Finally, bi-directional local motion compensation and ML3Dex filtering are taken for the recovery of dirt pixels. Experiments on real data demonstrate the efficiency and effectiveness of our method in terms of both detection and recovery of dirt.","PeriodicalId":131177,"journal":{"name":"2007 IEEE International Conference on Image Processing","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128151155","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Temporally Consistent Gaussian Random Field for Video Semantic Analysis","authors":"Jinhui Tang, Xiansheng Hua, Tao Mei, Guo-Jun Qi, Shipeng Li, Xiuqing Wu","doi":"10.1109/ICIP.2007.4380070","DOIUrl":"https://doi.org/10.1109/ICIP.2007.4380070","url":null,"abstract":"As a major family of semi-supervised learning, graph based semi-supervised learning methods have recently attracted considerable interest in the machine learning community as well as in many application areas. However, for the application of video semantic annotation, these methods only consider the relations among samples in the feature space and neglect an intrinsic property of video data: temporally adjacent video segments (e.g., shots) usually have similar semantic concepts. In this paper, we adapt this temporal consistency property of video data into graph based semi-supervised learning and propose a novel method named temporally consistent Gaussian random field (TCGRF) to improve the annotation results. Experiments conducted on the TRECVID data set have demonstrated its effectiveness.","PeriodicalId":131177,"journal":{"name":"2007 IEEE International Conference on Image Processing","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125766516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image Resolution Enhancement using Wavelet Domain Hidden Markov Tree and Coefficient Sign Estimation","authors":"A. Temizel","doi":"10.1109/ICIP.2007.4379845","DOIUrl":"https://doi.org/10.1109/ICIP.2007.4379845","url":null,"abstract":"Image resolution enhancement using wavelets is a relatively new subject and many new algorithms have been proposed recently. These algorithms assume that the low resolution image is the approximation subband of a higher resolution image and attempt to estimate the unknown detail coefficients to reconstruct a high resolution image. A subset of these recent approaches utilized probabilistic models to estimate these unknown coefficients. Particularly, hidden Markov tree (HMT) based methods using Gaussian mixture models have been shown to produce promising results. However, one drawback of these methods is that, as the Gaussian is symmetrical around zero, signs of the coefficients generated using this distribution function are inherently random, adversely affecting the resulting image quality. In this paper, we demonstrate that sign information is an important element affecting the results and propose a method to estimate signs of these coefficients more accurately.","PeriodicalId":131177,"journal":{"name":"2007 IEEE International Conference on Image Processing","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121616622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design and Implementation of a Real-Time Global Tone Mapping Processor for High Dynamic Range Video","authors":"Tsun-Hsien Wang, Wei-Su Wong, F. Chen, C. Chiu","doi":"10.1109/ICIP.2007.4379558","DOIUrl":"https://doi.org/10.1109/ICIP.2007.4379558","url":null,"abstract":"With the development of high dynamic range (HDR) video capture technologies, bit-depth video encoding and decoding has become an interesting topic. In this paper, we show that real-time HDR video display is possible. A tone mapping based HDR video architecture pipelined with a video CODEC is presented. The HDR video is compressed by the tone mapping processor. The compressed HDR video can be encoded and decoded by video standards such as MPEG2, MPEG4 or H.264 for transmission and display. We propose and implement a modified photographic tone mapping algorithm for the tone mapping processor. The required luminance wordlength in the processor is analyzed and the quantization error is estimated. We also develop the digit-by-digit exponent and logarithm hardware architecture for the tone mapping processor. The synthesized results show that our real-time tone mapping processor can process an NTSC video with 720*480 resolution at 30 frames per second.","PeriodicalId":131177,"journal":{"name":"2007 IEEE International Conference on Image Processing","volume":"215 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121725521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Kernels on Bags of Fuzzy Regions for Fast Object retrieval","authors":"P. Gosselin, M. Cord, S. Philipp-Foliguet","doi":"10.1109/ICIP.2007.4378920","DOIUrl":"https://doi.org/10.1109/ICIP.2007.4378920","url":null,"abstract":"We propose in this paper a general kernel framework to deal with database object retrieval embedded in images with heterogeneous background. We use local features computed on fuzzy regions for image representation summarized in bags, and we propose original kernel functions to deal with sets of features and spatial constraints. Combined with SVMs classification and online learning scheme, the resulting algorithm satisfies the robustness requirements for representation and classification of objects. Experiments on a specific database having objects with heterogeneous backgrounds show the performance of our object retrieval technique.","PeriodicalId":131177,"journal":{"name":"2007 IEEE International Conference on Image Processing","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115917418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}