{"title":"Spatio-temporal video error concealment with perceptually optimized mode selection","authors":"S. Belfiore, Marco Grangetto, E. Magli, G. Olmo","doi":"10.1109/ICASSP.2003.1200079","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1200079","url":null,"abstract":"We propose a spatio-temporal error concealment algorithm for video transmission in an error-prone environment. The proposed technique employs motion vector estimation, edge-preserving interpolation, and texture analysis/synthesis. It has two main advantages with respect to existing methods, namely: (i) it aims at optimizing the visual quality of the restored video, and not only PSNR; and (ii) it employs an automatic mode selection algorithm in order to decide, on a macroblock basis, whether to use the spatial restoration, the temporal one, or a combination thereof. The algorithm has been applied to H.26L video, providing satisfactory performance over a large set of operating conditions.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121782568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A perceptually significant block-edge impairment metric for digital video coding","authors":"S. Suthaharan","doi":"10.1109/ICASSP.2003.1199566","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1199566","url":null,"abstract":"A new perceptually significant block-edge impairment metric (PS-BIM) is presented as a quantitative distortion measure to evaluate blocking artifacts in block-based video coding. This distortion measure does not require the original video sequence as a comparative reference and is found to be consistent with subjective evaluation.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114231025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Buffer-constrained R-D optimized rate control for video coding","authors":"Lifeng Zhao, C.-C. Jay Kuo","doi":"10.1109/ICASSP.2003.1199114","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1199114","url":null,"abstract":"Buffer-constrained R-D optimized rate control for video coding is investigated in this work. A frame level bit allocation is first presented based on a model of the relationship between the rate (R) and nonzero (NZ) coefficients. With the modelled R-NZ relationship, a quality feedback scheme is proposed to generate VBV (video buffer verifier) compliant bitstream with assured video quality. Then, a R-D optimized macroblock level rate control is described by jointly selecting the quantization parameter and the coding mode of macroblocks in I, B and P pictures for both progressive and interlaced video. To avoid the irregularly large MV or one single isolated coefficient, we extend the set of coding modes of MB by including zero MV and zero texture bits as two more candidates. Finally, fast heuristics are developed to reduce the computational complexity of R-D data generation and the Viterbi algorithm (VA) in R-D optimization, which achieves coding results close to the optimal one at a much lower computational cost.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124955659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A progressive to lossless embedded audio coder (PLEAC) with reversible modulated lapped transform","authors":"Jin Li","doi":"10.1109/ICASSP.2003.1199994","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1199994","url":null,"abstract":"A progressive to lossless embedded audio coder (PLEAC) has been proposed. PLEAC is based purely on a reversible transform, which is designed to mimic the non-reversible transform in a normal psychoacoustic audio coder as much as possible. Coupled with a high performance embedded entropy codec, this empowers PLEAC with both lossless capability and fine granular scalability. The PLEAC encoder generates a bitstream that if fully decoded, completely recovers the original audio waveform without loss. Moreover, it is possible to scale this bitstream in a very large bitrate range, with granularity down to a single byte. Extensive experimental results support the superior lossless performance and bitstream scalability of the PLEAC coder.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122427595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint audio-video processing for biometric speaker identification","authors":"A. Kanak, E. Erzin, Y. Yemez, A. Tekalp","doi":"10.1109/ICASSP.2003.1202376","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1202376","url":null,"abstract":"We present a bimodal audio-visual speaker identification system. The objective is to improve the recognition performance over conventional unimodal schemes. The proposed system exploits not only the temporal and spatial correlations existing in the speech and video signals of a speaker, but also the cross-correlation between these two modalities. Lip images extracted from each video frame are transformed onto an eigenspace. The obtained eigenlip coefficients are interpolated to match the rate of the speech signal and fused with Mel frequency cepstral coefficients (MFCC) of the corresponding speech signal. The resulting joint feature vectors are used to train and test a hidden Markov model (HMM) based identification system. Experimental results are included to demonstrate the system performance.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127837038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Nonlinear separation of signature trajectories for on-line personal authentication","authors":"M. Kondo, D. Muramatsu, M. Sasaki, T. Matsumoto","doi":"10.1109/ICASSP.2003.1202367","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1202367","url":null,"abstract":"Authentication of individuals is rapidly becoming an important issue. The paper proposes a new nonlinear algorithm for pen-input on-line signature verification incorporating pen-position, pen-pressure and pen-inclination trajectories. A preliminary experiment was performed on a database consisting of 1849 genuine signatures and 3174 skilled forgery signatures from fourteen individuals. False acceptance rates and false rejection rates of less than 2% were obtained. Since no fine tuning was done, this preliminary result looks very promising.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"130 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121526808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance evaluation of a perceptual ringing distortion metric for digital video","authors":"Zhizhong Zhe, H. Wu, Zhenghua Yu, T. Ferguson, D. Tan","doi":"10.1109/ICASSP.2003.1199549","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1199549","url":null,"abstract":"This paper evaluates a perceptual impairment measure for ringing artifacts, which are common in hybrid MC/DPCM/DCT coded video, as a predictor of the mean opinion score (MOS) obtained in the standard subjective assessment. The perceptual ringing artifacts measure is based on a vision model and a ringing distortion region segmentation algorithm, which is converted into a new perceptual ringing distortion metric (PRDM) on a scale of 0 to 5. This scale corresponds to a modified double-stimulus impairment scale variant II (DSIS-II) method. The Pearson correlation, the Spearman rank order correlation and the average absolute error are used to evaluate the performance of the PRDM compared with the subjective test data. The results show a strong correlation between the PRDM and the MOS with respect to ringing artifacts.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132263332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi moving people detection from binocular sequences","authors":"Yang Ran, Q. Zheng","doi":"10.1109/ICASSP.2003.1199101","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1199101","url":null,"abstract":"A novel approach for detection of multiple moving objects from binocular video sequences is reported. First an efficient motion estimation method is applied to sequences acquired from each camera. The motion estimation is then used to obtain cross camera correspondence between the stereo pair. Next, background subtraction is achieved by fusion of temporal difference and depth estimation. Finally moving foregrounds are further segmented into moving object according to a distance measure defined in a 2.5D feature space, which is done in a hierarchical strategy. The proposed approach has been tested on several indoor and outdoor sequences. Preliminary experiments have shown that the new approach can robustly detect multiple partially occluded moving persons in a noisy background. Representative human detection results are presented.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123720417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Invariant content-based image retrieval by wavelet energy signatures","authors":"Chi-Man Pun","doi":"10.1109/ICASSP.2003.1199537","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1199537","url":null,"abstract":"An effective rotation and scale invariant log-polar wavelet texture feature for image retrieval was proposed. The feature extraction process involves a log-polar transform followed by an adaptive row shift invariant wavelet packet transform. The log-polar transform converts a given image into a rotation and scale invariant but row-shifted image, which is then passed to the adaptive row shift invariant wavelet packet transform to generate adaptively some subbands of rotation and scale invariant wavelet coefficients with respect to an information cost function. An energy signature is computed for each subband of these wavelet coefficients. In order to reduce feature dimensionality, only the most dominant log-polar wavelet energy signatures are selected as feature vector for image retrieval. The whole feature extraction process is quite efficient and involves only O(n/spl middot/log n) complexity. Experimental results show that this rotation and scale invariant texture feature is effective and outperforms the traditional wavelet packet signatures.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121019094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Constraint satisfaction model for enhancement of evidence in recognition of consonant-vowel utterances","authors":"S. Gangashetty, C. Sekhar, B. Yegnanarayana","doi":"10.1109/ICASSP.2003.1202476","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1202476","url":null,"abstract":"We address the issues in recognition of a large number of subword units of speech with high confusability among several units. Evidence available from the classification models trained with a limited number of training examples may not be strong to correctly recognize the subword units. We present a constraint satisfaction neural network model that can be used to enhance the evidence for a particular unit with the supporting evidence available for a subset of units confusable with that unit. We demonstrate the enhancement of evidence by the proposed model in recognition of utterances of 145 consonant-vowel units.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115437559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}