2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03): Latest Publications

Spatio-temporal video error concealment with perceptually optimized mode selection
S. Belfiore, Marco Grangetto, E. Magli, G. Olmo
DOI: 10.1109/ICASSP.2003.1200079
Abstract: We propose a spatio-temporal error concealment algorithm for video transmission in an error-prone environment. The proposed technique employs motion vector estimation, edge-preserving interpolation, and texture analysis/synthesis. It has two main advantages with respect to existing methods, namely: (i) it aims at optimizing the visual quality of the restored video, and not only PSNR; and (ii) it employs an automatic mode selection algorithm in order to decide, on a macroblock basis, whether to use the spatial restoration, the temporal one, or a combination thereof. The algorithm has been applied to H.26L video, providing satisfactory performance over a large set of operating conditions.
Citations: 28
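The per-macroblock choice between spatial and temporal restoration can be illustrated with a minimal sketch. This is not the paper's algorithm (which adds motion vector estimation, edge-preserving interpolation, and texture synthesis); it only shows the generic idea of generating both candidate restorations for a lost block and keeping the one whose border best matches the surrounding intact pixels. Frame layout, block geometry, and the boundary-matching criterion are all illustrative assumptions.

```python
# Illustrative sketch, NOT the paper's method: conceal a lost block either by
# temporal copy or by spatial interpolation, picking the candidate whose
# border pixels best match the intact pixels surrounding the block.

def temporal_conceal(prev, top, left, size):
    """Copy the co-located block from the previous frame."""
    return [[prev[top + i][left + j] for j in range(size)] for i in range(size)]

def spatial_conceal(cur, top, left, size):
    """Fill each pixel from the intact pixels just outside the block
    (simple four-direction average)."""
    block = []
    for i in range(size):
        row = []
        for j in range(size):
            above = cur[top - 1][left + j]       # intact pixel above the block
            below = cur[top + size][left + j]    # intact pixel below the block
            lft = cur[top + i][left - 1]
            rgt = cur[top + i][left + size]
            row.append((above + below + lft + rgt) / 4.0)
        block.append(row)
    return block

def boundary_error(cur, block, top, left, size):
    """Sum of absolute differences between the candidate block's border
    pixels and the intact pixels just outside the block."""
    err = 0.0
    for j in range(size):
        err += abs(block[0][j] - cur[top - 1][left + j])
        err += abs(block[size - 1][j] - cur[top + size][left + j])
    for i in range(size):
        err += abs(block[i][0] - cur[top + i][left - 1])
        err += abs(block[i][size - 1] - cur[top + i][left + size])
    return err

def conceal(cur, prev, top, left, size):
    """Mode selection: keep the candidate with the lowest boundary error."""
    candidates = [temporal_conceal(prev, top, left, size),
                  spatial_conceal(cur, top, left, size)]
    return min(candidates, key=lambda b: boundary_error(cur, b, top, left, size))
```

When the previous frame differs strongly from the current scene (e.g. after a cut), the boundary error of the temporal copy grows and the spatial mode wins, mirroring the automatic per-macroblock decision described in the abstract.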
A perceptually significant block-edge impairment metric for digital video coding
S. Suthaharan
DOI: 10.1109/ICASSP.2003.1199566
Abstract: A new perceptually significant block-edge impairment metric (PS-BIM) is presented as a quantitative distortion measure to evaluate blocking artifacts in block-based video coding. This distortion measure does not require the original video sequence as a comparative reference and is found to be consistent with subjective evaluation.
Citations: 23
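A generic no-reference blockiness score conveys the flavour of such a metric. This sketch is not the PS-BIM of the paper; it simply compares the average luminance jump across 8x8 block boundaries to the average jump inside blocks, so a ratio well above 1.0 suggests visible blocking artifacts without needing the original sequence.

```python
# Hedged illustration, not PS-BIM: ratio of the mean luminance jump across
# 8x8 block boundaries to the mean jump inside blocks. A ratio near 1.0
# means no preferential discontinuity at block edges; larger values
# indicate blocking artifacts. No reference image is required.

def blockiness(img, block=8):
    h, w = len(img), len(img[0])
    boundary, interior = [], []
    for y in range(h):                      # horizontal neighbour differences
        for x in range(w - 1):
            d = abs(img[y][x + 1] - img[y][x])
            (boundary if (x + 1) % block == 0 else interior).append(d)
    for y in range(h - 1):                  # vertical neighbour differences
        for x in range(w):
            d = abs(img[y + 1][x] - img[y][x])
            (boundary if (y + 1) % block == 0 else interior).append(d)
    eps = 1e-9                              # guard against flat regions
    return (sum(boundary) / max(len(boundary), 1) + eps) / \
           (sum(interior) / max(len(interior), 1) + eps)
```

A perceptually weighted metric like PS-BIM would further modulate these differences by visibility (masking, luminance adaptation); the ratio above captures only the raw block-edge evidence.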
Buffer-constrained R-D optimized rate control for video coding
Lifeng Zhao, C.-C. Jay Kuo
DOI: 10.1109/ICASSP.2003.1199114
Abstract: Buffer-constrained R-D optimized rate control for video coding is investigated in this work. A frame-level bit allocation is first presented based on a model of the relationship between the rate (R) and the nonzero (NZ) coefficients. With the modelled R-NZ relationship, a quality feedback scheme is proposed to generate a VBV (video buffer verifier) compliant bitstream with assured video quality. Then, an R-D optimized macroblock-level rate control is described by jointly selecting the quantization parameter and the coding mode of macroblocks in I, B and P pictures for both progressive and interlaced video. To avoid an irregularly large MV or a single isolated coefficient, we extend the set of MB coding modes by including zero MV and zero texture bits as two more candidates. Finally, fast heuristics are developed to reduce the computational complexity of R-D data generation and of the Viterbi algorithm (VA) in R-D optimization, achieving coding results close to the optimum at a much lower computational cost.
Citations: 5
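The joint selection of quantization parameter and coding mode described above is conventionally formalized as Lagrangian rate-distortion optimization: each candidate is scored by J = D + lambda * R and the minimizer is kept. A minimal sketch, with an invented candidate list (the distortion and rate numbers are illustrative, not from the paper):

```python
# Minimal sketch of Lagrangian R-D mode selection, the standard
# formalization behind macroblock-level decisions of this kind.
# Candidates and their (distortion, rate) values are invented.

def best_mode(candidates, lam):
    """candidates: list of (name, distortion, rate_bits).
    Returns the candidate minimizing J = D + lam * R."""
    return min(candidates, key=lambda c: c[1] + lam * c[2])

candidates = [
    ("inter", 40.0, 120),   # good prediction, moderate rate
    ("intra", 25.0, 400),   # lowest distortion, expensive
    ("skip", 150.0,   1),   # zero MV / zero texture, nearly free
]

print(best_mode(candidates, lam=0.1)[0])   # prints "inter"
print(best_mode(candidates, lam=1.0)[0])   # prints "skip"
```

Raising lambda shifts the optimum toward cheap modes such as the zero-MV/zero-texture candidates the abstract adds, which is exactly how a buffer constraint is enforced in practice: a fuller buffer maps to a larger lambda.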
A progressive to lossless embedded audio coder (PLEAC) with reversible modulated lapped transform
Jin Li
DOI: 10.1109/ICASSP.2003.1199994
Abstract: A progressive to lossless embedded audio coder (PLEAC) has been proposed. PLEAC is based purely on a reversible transform, which is designed to mimic the non-reversible transform in a normal psychoacoustic audio coder as much as possible. Coupled with a high-performance embedded entropy codec, this empowers PLEAC with both lossless capability and fine granular scalability. The PLEAC encoder generates a bitstream that, if fully decoded, completely recovers the original audio waveform without loss. Moreover, it is possible to scale this bitstream over a very large bitrate range, with granularity down to a single byte. Extensive experimental results support the superior lossless performance and bitstream scalability of the PLEAC coder.
Citations: 14
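The key property of a reversible transform — integer input maps to integer coefficients with exact invertibility — is what makes lossless decoding possible. PLEAC's reversible modulated lapped transform is far more elaborate, but the principle can be illustrated with the simplest such transform, the integer Haar (S-)transform built from lifting steps:

```python
# Illustration of transform reversibility via lifting (integer Haar /
# S-transform). This is NOT PLEAC's reversible modulated lapped transform;
# it only demonstrates why lifting guarantees exact integer inversion:
# each step is undone by the same step with the sign flipped.

def int_haar_forward(x):
    """x: even-length list of ints -> (approx, detail), all ints."""
    approx, detail = [], []
    for a, b in zip(x[::2], x[1::2]):
        d = b - a                  # lifting step 1: difference
        s = a + (d >> 1)           # lifting step 2: low-pass with floor
        approx.append(s)
        detail.append(d)
    return approx, detail

def int_haar_inverse(approx, detail):
    x = []
    for s, d in zip(approx, detail):
        a = s - (d >> 1)           # undo step 2 exactly (same floor)
        b = a + d                  # undo step 1
        x.extend([a, b])
    return x
```

Because the floor in the forward pass is reproduced bit-exactly in the inverse, reconstruction is perfect for any integer input, which is the property the abstract relies on for the lossless endpoint of the embedded bitstream.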
Joint audio-video processing for biometric speaker identification
A. Kanak, E. Erzin, Y. Yemez, A. Tekalp
DOI: 10.1109/ICASSP.2003.1202376
Abstract: We present a bimodal audio-visual speaker identification system. The objective is to improve the recognition performance over conventional unimodal schemes. The proposed system exploits not only the temporal and spatial correlations existing in the speech and video signals of a speaker, but also the cross-correlation between these two modalities. Lip images extracted from each video frame are transformed onto an eigenspace. The obtained eigenlip coefficients are interpolated to match the rate of the speech signal and fused with Mel frequency cepstral coefficients (MFCC) of the corresponding speech signal. The resulting joint feature vectors are used to train and test a hidden Markov model (HMM) based identification system. Experimental results are included to demonstrate the system performance.
Citations: 36
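The fusion step in the abstract — interpolating per-frame eigenlip coefficients up to the audio analysis rate and concatenating them with each MFCC vector — can be sketched as follows. Feature dimensions, rates, and the choice of linear interpolation are illustrative assumptions, not details from the paper:

```python
# Hedged sketch of rate-matched feature-level fusion: video features
# (e.g. ~25 Hz) are linearly resampled to the audio frame rate
# (e.g. ~100 Hz) and concatenated with the MFCC vector of each frame.
# All dimensions and values below are invented for illustration.

def interpolate(frames, n_out):
    """Linearly resample a list of feature vectors to n_out vectors."""
    n_in, dim = len(frames), len(frames[0])
    if n_in == 1:                           # degenerate case: repeat
        return [frames[0][:] for _ in range(n_out)]
    out = []
    for k in range(n_out):
        t = k * (n_in - 1) / (n_out - 1) if n_out > 1 else 0.0
        i = min(int(t), n_in - 2)           # left sample index
        w = t - i                           # fractional position
        out.append([(1 - w) * frames[i][d] + w * frames[i + 1][d]
                    for d in range(dim)])
    return out

def fuse(mfcc, lip):
    """Concatenate each MFCC vector with the rate-matched lip vector."""
    lip_up = interpolate(lip, len(mfcc))
    return [m + l for m, l in zip(mfcc, lip_up)]
```

The resulting joint vectors (MFCC dimensions followed by eigenlip dimensions) are what an HMM would then be trained on, so both modalities contribute to every observation.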
Nonlinear separation of signature trajectories for on-line personal authentication
M. Kondo, D. Muramatsu, M. Sasaki, T. Matsumoto
DOI: 10.1109/ICASSP.2003.1202367
Abstract: Authentication of individuals is rapidly becoming an important issue. The paper proposes a new nonlinear algorithm for pen-input on-line signature verification incorporating pen-position, pen-pressure and pen-inclination trajectories. A preliminary experiment was performed on a database consisting of 1849 genuine signatures and 3174 skilled forgery signatures from fourteen individuals. False acceptance rates and false rejection rates of less than 2% were obtained. Since no fine tuning was done, this preliminary result looks very promising.
Citations: 1
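The paper's nonlinear separation algorithm is not reproduced in the abstract. As context, a common baseline for comparing on-line signature trajectories of unequal length (pen position, pressure, inclination per sample) is dynamic time warping, which aligns two sequences before scoring; a sketch:

```python
# Hedged baseline, not the paper's algorithm: dynamic time warping (DTW)
# aligns two pen trajectories of different lengths and returns an
# accumulated distance; verification then thresholds this distance.

def dtw(seq_a, seq_b, dist):
    """Classic O(n*m) DTW with a full cost table."""
    n, m = len(seq_a), len(seq_b)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stay on a sample of b
                                 cost[i][j - 1],      # stay on a sample of a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def euclid(p, q):
    """Per-sample distance over (x, y, pressure, ...) tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
```

A genuine signature warps onto the enrolled template with a small accumulated distance, while a forgery (or an unrelated trajectory) scores higher; the accept/reject threshold would be tuned to trade false acceptance against false rejection.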
Performance evaluation of a perceptual ringing distortion metric for digital video
Zhizhong Zhe, H. Wu, Zhenghua Yu, T. Ferguson, D. Tan
DOI: 10.1109/ICASSP.2003.1199549
Abstract: This paper evaluates a perceptual impairment measure for ringing artifacts, which are common in hybrid MC/DPCM/DCT coded video, as a predictor of the mean opinion score (MOS) obtained in the standard subjective assessment. The perceptual ringing artifacts measure is based on a vision model and a ringing distortion region segmentation algorithm, which is converted into a new perceptual ringing distortion metric (PRDM) on a scale of 0 to 5. This scale corresponds to a modified double-stimulus impairment scale variant II (DSIS-II) method. The Pearson correlation, the Spearman rank order correlation and the average absolute error are used to evaluate the performance of the PRDM compared with the subjective test data. The results show a strong correlation between the PRDM and the MOS with respect to ringing artifacts.
Citations: 3
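The three evaluation criteria named in the abstract are standard and easy to state concretely: Pearson correlation for prediction accuracy, Spearman rank-order correlation for monotonicity, and mean absolute error for magnitude. A self-contained sketch (the score lists in the test are invented; tie handling in the rank computation is omitted for brevity):

```python
# The evaluation criteria from the abstract, in minimal form: Pearson
# correlation, Spearman rank-order correlation (ties ignored for
# brevity), and mean absolute error between metric scores and MOS.

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) *
           sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def ranks(x):
    """Rank of each element (0 = smallest); ties broken by position."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    for rank, i in enumerate(order):
        r[i] = float(rank)
    return r

def spearman(x, y):
    """Pearson correlation of the rank sequences."""
    return pearson(ranks(x), ranks(y))

def mae(x, y):
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)
```

A metric that tracks MOS well, as the abstract claims for the PRDM, shows Pearson and Spearman values near 1 together with a small average absolute error on the 0-to-5 impairment scale.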
Multi moving people detection from binocular sequences
Yang Ran, Q. Zheng
DOI: 10.1109/ICASSP.2003.1199101
Abstract: A novel approach for detection of multiple moving objects from binocular video sequences is reported. First, an efficient motion estimation method is applied to sequences acquired from each camera. The motion estimation is then used to obtain cross-camera correspondence between the stereo pair. Next, background subtraction is achieved by fusion of temporal difference and depth estimation. Finally, moving foregrounds are further segmented into individual moving objects according to a distance measure defined in a 2.5D feature space, using a hierarchical strategy. The proposed approach has been tested on several indoor and outdoor sequences. Preliminary experiments have shown that the new approach can robustly detect multiple partially occluded moving persons in a noisy background. Representative human detection results are presented.
Citations: 6
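The fusion step in the abstract — combining temporal difference with depth estimation for background subtraction — can be caricatured with a minimal rule: label a pixel foreground only when it both moves and sits in front of the background. The thresholds and the use of raw disparity as the depth cue are illustrative assumptions:

```python
# Hedged sketch of temporal-difference + depth fusion: a pixel is
# foreground when the inter-frame difference signals motion AND the
# stereo disparity places it closer than the background plane.
# Thresholds are illustrative, not from the paper.

def foreground_mask(cur, prev, disparity, t_motion=15, t_depth=5):
    """cur/prev: grayscale frames; disparity: per-pixel stereo disparity
    (larger = closer). Returns a binary mask of the same size."""
    h, w = len(cur), len(cur[0])
    return [[1 if abs(cur[y][x] - prev[y][x]) > t_motion
                  and disparity[y][x] > t_depth else 0
             for x in range(w)] for y in range(h)]
```

Requiring both cues suppresses the two classic failure modes of either alone: moving shadows (motion but background depth) and static near objects (depth but no motion).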
Invariant content-based image retrieval by wavelet energy signatures
Chi-Man Pun
DOI: 10.1109/ICASSP.2003.1199537
Abstract: An effective rotation and scale invariant log-polar wavelet texture feature for image retrieval is proposed. The feature extraction process involves a log-polar transform followed by an adaptive row shift invariant wavelet packet transform. The log-polar transform converts a given image into a rotation and scale invariant but row-shifted image, which is then passed to the adaptive row shift invariant wavelet packet transform to adaptively generate subbands of rotation and scale invariant wavelet coefficients with respect to an information cost function. An energy signature is computed for each subband of these wavelet coefficients. In order to reduce feature dimensionality, only the most dominant log-polar wavelet energy signatures are selected as the feature vector for image retrieval. The whole feature extraction process is quite efficient and involves only O(n·log n) complexity. Experimental results show that this rotation and scale invariant texture feature is effective and outperforms the traditional wavelet packet signatures.
Citations: 2
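The first stage of the pipeline above, the log-polar transform, is the part that converts rotation and scaling into shifts: rotation about the image centre becomes a cyclic row shift of the log-polar grid, and uniform scaling becomes a column shift. A sketch with nearest-neighbour sampling (grid sizes and the sampling scheme are illustrative choices, not the paper's):

```python
# Sketch of log-polar resampling: rows index angle, columns index
# log-spaced radius. Nearest-neighbour sampling and the grid sizes are
# illustrative assumptions; the paper does not specify them here.
import math

def log_polar(img, n_theta=64, n_rho=32):
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0      # image centre
    rho_max = math.log(min(cx, cy))            # largest usable log-radius
    out = []
    for t in range(n_theta):
        theta = 2.0 * math.pi * t / n_theta
        row = []
        for r in range(n_rho):
            rad = math.exp(rho_max * (r + 1) / n_rho)   # log-spaced radius
            y = min(max(int(round(cy + rad * math.sin(theta))), 0), h - 1)
            x = min(max(int(round(cx + rad * math.cos(theta))), 0), w - 1)
            row.append(img[y][x])
        out.append(row)
    return out
```

A wavelet packet transform that is invariant to row shifts of this grid, as described in the abstract, then yields subband energies unaffected by the original rotation, which is what makes the signatures usable for invariant retrieval.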
Constraint satisfaction model for enhancement of evidence in recognition of consonant-vowel utterances
S. Gangashetty, C. Sekhar, B. Yegnanarayana
DOI: 10.1109/ICASSP.2003.1202476
Abstract: We address the issues in recognition of a large number of subword units of speech with high confusability among several units. Evidence available from classification models trained with a limited number of training examples may not be strong enough to correctly recognize the subword units. We present a constraint satisfaction neural network model that can be used to enhance the evidence for a particular unit with the supporting evidence available for a subset of units confusable with that unit. We demonstrate the enhancement of evidence by the proposed model in recognition of utterances of 145 consonant-vowel units.
Citations: 2
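The core idea — strengthening a unit's weak classifier evidence using the evidence of units confusable with it — can be caricatured, though this sketch is not the paper's constraint satisfaction neural network: each unit iteratively accumulates a fraction of its supporters' evidence and the scores are renormalized. The support matrix and update rule are invented for illustration:

```python
# Hedged caricature, NOT the paper's CSNN: iterative evidence
# enhancement in which each unit gains a fraction (alpha) of the
# evidence of units that support it, followed by renormalization so
# the scores remain comparable. Support matrix is invented.

def enhance(evidence, support, alpha=0.3, iters=5):
    """evidence: initial per-unit scores; support[i][j]: strength of
    unit j's support for unit i. Returns normalized enhanced scores."""
    e = list(evidence)
    n = len(e)
    for _ in range(iters):
        e = [e[i] + alpha * sum(support[i][j] * e[j] for j in range(n))
             for i in range(n)]
        total = sum(e)
        e = [v / total for v in e]          # renormalize each iteration
    return e
```

With mutually supporting confusable units, the correct unit's score rises relative to unsupported competitors even when its initial evidence is weak, which is the behaviour the abstract reports for the 145 consonant-vowel units.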