2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03): Latest articles

Color image segmentation using density-based clustering
Qixiang Ye, Wen Gao, Wei Zeng
{"title":"Color image segmentation using density-based clustering","authors":"Qixiang Ye, Wen Gao, Wei Zeng","doi":"10.1109/ICASSP.2003.1199480","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1199480","url":null,"abstract":"Color image segmentation is an important but still open problem in image processing. We propose a method for this problem by integrating the spatial connectivity and color features of the pixels. Considering that an image can be regarded as a dataset in which each pixel has a spatial location and a color value, color image segmentation can be obtained by clustering these pixels into different groups of coherent spatial connectivity and color. To discover the spatial connectivity of the pixels, density-based clustering is employed, which is an effective clustering method used in data mining for discovering spatial databases. The color similarity of the pixels is measured in Munsell (HVC) color space whose perceptual uniformity ensures the color change in the segmented regions is smooth in terms of human perception. Experimental results using the proposed method demonstrate encouraging performance.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131984795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 30
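To illustrate the density-based clustering idea this abstract relies on, here is a minimal DBSCAN-style sketch over pixel feature vectors (row, column, intensity). It is a toy illustration only, not the paper's algorithm: it uses a single grayscale value in place of Munsell (HVC) coordinates, and the `eps`/`min_pts` settings are arbitrary.

```python
import numpy as np

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns per-point cluster ids (0..k-1), -1 for noise."""
    n = len(points)
    labels = [-1] * n
    visited = [False] * n
    cluster = 0

    def neighbors(i):
        # Euclidean ball in the joint (spatial, color) feature space.
        return [j for j in range(n) if np.linalg.norm(points[i] - points[j]) <= eps]

    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            continue  # not a core point; may still join a cluster later
        labels[i] = cluster
        seeds = list(nbrs)
        while seeds:  # expand the cluster through density-reachable points
            j = seeds.pop()
            if not visited[j]:
                visited[j] = True
                jn = neighbors(j)
                if len(jn) >= min_pts:
                    seeds.extend(jn)
            if labels[j] == -1:
                labels[j] = cluster
        cluster += 1
    return labels

# Each "pixel": (row, col, gray value) -- two spatially separated flat regions.
pixels = np.array([
    [0, 0, 10], [0, 1, 11], [1, 0, 10], [1, 1, 12],
    [8, 8, 200], [8, 9, 201], [9, 8, 199], [9, 9, 200],
], dtype=float)
labels = dbscan(pixels, eps=3.0, min_pts=2)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1]: the dark patch and the bright patch
```

Because distance mixes spatial and color coordinates, each cluster is a spatially connected, color-coherent region, which is exactly the grouping the segmentation needs.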
3D motion vector coding with block base adaptive interpolation filter on H.264
H. Kimata, Masaki Kitahara, Y. Yashima
{"title":"3D motion vector coding with block base adaptive interpolation filter on H.264","authors":"H. Kimata, Masaki Kitahara, Y. Yashima","doi":"10.1109/ICASSP.2003.1199554","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1199554","url":null,"abstract":"Fractional pel motion compensation generally improves coding efficiency due to more precise motion accuracy and low path filtering effect in generating an image at fractional pel positions. In H.264, quarter pel motion compensation is applied, where the image at half pel position is generated by a 6 tap Wiener filter. And the adaptive interpolation filter technique, which adaptively changes filter characteristics for half pel positions has been proposed. That technique also changes the image at quarter pel positions, so it can be exploited to extend motion accuracy to be more precise. In this paper, a 3D motion vector coding (3DMVC) technique with block base adaptive interpolation filter (BAIF) is proposed. This paper also demonstrates the proposed method ensures filter data is successfully integrated into motion vector coding and outperforms the normal H.264.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"23 1 Suppl 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128019607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Frame-dependent multi-stream reliability indicators for audio-visual speech recognition
A. Garg, G. Potamianos, C. Neti, Thomas S. Huang
{"title":"Frame-dependent multi-stream reliability indicators for audio-visual speech recognition","authors":"A. Garg, G. Potamianos, C. Neti, Thomas S. Huang","doi":"10.1109/ICASSP.2003.1198707","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1198707","url":null,"abstract":"We investigate the use of local, frame-dependent reliability indicators of the audio and visual modalities, as a means of estimating stream exponents of multi-stream hidden Markov models for audio-visual automatic speech recognition. We consider two such indicators at each modality, defined as functions of the speech-class conditional observation probabilities of appropriate audio-or visual-only classifiers. We subsequently map the four reliability indicators into the stream exponents of a state-synchronous, two-stream hidden Markov model, as a sigmoid function of their linear combination. We propose two algorithms to estimate the sigmoid weights, based on the maximum conditional likelihood and minimum classification error criteria. We demonstrate the superiority of the proposed approach on a connected-digit audio-visual speech recognition task, under varying audio channel noise conditions. Indeed, the use of the estimated, frame-dependent stream exponents results in a significantly smaller word error rate than using global stream exponents. In addition, it outperforms utterance-level exponents, even though the latter utilize a-priori knowledge of the utterance noise level.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. 
(ICASSP '03).","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116750239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 28
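The mapping the abstract describes, from frame-level reliability indicators to stream exponents via a sigmoid of their linear combination, can be sketched as follows. The weights and bias here are illustrative placeholders, not the trained maximum-conditional-likelihood or minimum-classification-error values from the paper.

```python
import math

def audio_exponent(reliabilities, weights, bias=0.0):
    """Map four frame-level reliability indicators (two per modality) to the
    audio stream exponent lambda_t in (0, 1) via a sigmoid of their linear
    combination; the visual exponent of the two-stream HMM is 1 - lambda_t."""
    z = sum(w * r for w, r in zip(weights, reliabilities)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Two audio indicators (first pair) and two visual indicators (second pair),
# with hypothetical weights: high audio reliability pushes lambda_t toward 1.
lam = audio_exponent([0.9, 0.7, 0.2, 0.4], weights=[2.0, 1.5, -1.0, -0.5])
print(lam, 1.0 - lam)  # audio exponent ~0.92, visual exponent ~0.08
```

Each frame thus gets its own exponent pair, which is what lets the model shift weight toward the video stream exactly when the audio channel is noisy.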
A fusion scheme of visual and auditory modalities for event detection in sports video
Mingliang Xu, Ling-yu Duan, Changsheng Xu, Q. Tian
{"title":"A fusion scheme of visual and auditory modalities for event detection in sports video","authors":"Mingliang Xu, Ling-yu Duan, Changsheng Xu, Q. Tian","doi":"10.1109/ICASSP.2003.1199139","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1199139","url":null,"abstract":"We propose an effective fusion scheme of visual and auditory modalities to detect events in sports video. The proposed scheme is built upon semantic shot classification, where we classify video shots into several major or interesting classes, each of which has clear semantic meanings. Among major shot classes we perform classification of the different auditory signal segments (i.e. silence, hitting ball, applause, commentator speech) with the goal of detecting events with strong semantic meaning. For instance, for tennis video, we have identified five interesting events: serve, reserve, ace, return, and score. Since we have developed a unified framework for semantic shot classification in sports videos and a set of audio mid-level representation with supervised learning methods, the proposed fusion scheme can be easily adapted to a new sports game. We are extending this fusion scheme to three additional typical sports videos: basketball, volleyball and soccer. Correctly detected sports video events will greatly facilitate further structural and temporal analysis, such as sports video skimming, table of content, etc.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. 
(ICASSP '03).","volume":"120 1-2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133192791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 71
Tone feature extraction through parametric modeling and analysis-by-synthesis-based pattern matching
Jinfu Ni, H. Kawai
{"title":"Tone feature extraction through parametric modeling and analysis-by-synthesis-based pattern matching","authors":"Jinfu Ni, H. Kawai","doi":"10.1109/ICASSP.2003.1198719","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1198719","url":null,"abstract":"A functional fundamental frequency (F/sub 0/) model is applied to extract tone peak and gliding features from Mandarin F/sub 0/ contours aiming at automatic prosodic labeling of a large scale speech corpus. Modeling four lexical tones and representing them in a parametric form based on the F/sub 0/ model, we first cluster baseline tone patterns using the LBG (Linde-Buzo-Gray) algorithm, then perform analysis-by-synthesis-based pattern matching to estimate underlying tone peaks and tone pattern types from observed F/sub 0/ contours and phonetic labels with lexical tones. Tone gliding features are re-estimated after the determination of tone peaks. 94% of the automatically estimated labels were consistent with the manual labels in an open test of 968 utterances from eight native speakers. Also, experimental results indicate that the proposed method is applicable for F/sub 0/ contour smoothing and tone verification.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"433 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120897430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Successive bit-plane rate allocation technique for JPEG2000 image coding
Y. M. Yeung, O. Au, A. Chang
{"title":"Successive bit-plane rate allocation technique for JPEG2000 image coding","authors":"Y. M. Yeung, O. Au, A. Chang","doi":"10.1109/ICASSP.2003.1199157","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1199157","url":null,"abstract":"A novel rate control scheme using successive bit-plane rate allocation (SBRA) is proposed for JPEG2000 image coding. By using the current rate-distortion information only, the proposed method can achieve a quality close to the post-compression rate-distortion (PCRD) optimization scheme adopted in JPEG2000. The proposed scheme can efficiently reduce both the computational cost and working memory size of the entropy coding process up to about 90%, in the case of 0.25bpp (1/32) compression. Without using the future rate-distortion information, the sequential property of the proposed method is highly suitable for real-time (or low delay) applications and implementation.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122271506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
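The key property claimed here, allocating rate sequentially without future rate-distortion information, can be illustrated with a toy sketch: successive coding passes of an embedded bit-stream are kept, most significant first, until the byte budget is reached. This shows only the sequential stopping rule under assumed pass sizes, not SBRA's actual per-bit-plane rate-distortion bookkeeping.

```python
def truncate_bitstream(pass_sizes, budget_bytes):
    """Greedily keep successive coding passes (most significant bit-planes
    first, as in an embedded bit-stream) until the byte budget would be
    exceeded. Decisions use only bytes spent so far -- no lookahead."""
    used, kept = 0, []
    for size in pass_sizes:
        if used + size > budget_bytes:
            break  # budget reached: later, less significant passes are dropped
        used += size
        kept.append(size)
    return kept, used

# Hypothetical pass sizes in bytes, ordered from most to least significant.
kept, used = truncate_bitstream([100, 80, 60, 40, 20], budget_bytes=250)
print(kept, used)  # [100, 80, 60] 240
```

Because the decision for each pass depends only on bytes already spent, encoding can stop early, which is where the claimed savings in computation and working memory come from.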
DBN based multi-stream models for speech
Yimin Zhang, Q. Diao, Shan Huang, Wei Hu, C. Bartels, J. Bilmes
{"title":"DBN based multi-stream models for speech","authors":"Yimin Zhang, Q. Diao, Shan Huang, Wei Hu, C. Bartels, J. Bilmes","doi":"10.1109/ICASSP.2003.1198911","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1198911","url":null,"abstract":"We propose dynamic Bayesian network (DBN) based synchronous and asynchronous multi-stream models for noise-robust automatic speech recognition. In these models, multiple noise-robust features are combined into a single DBN to obtain better performance than any single feature system alone. Results on the Aurora 2.0 noisy speech task show significant improvements of our synchronous model over both single stream models and over a ROVER based fusion method.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115387776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 40
A small sample model selection criterion based on Kullback's symmetric divergence
A. Seghouane, M. Bekara, G. Fleury
{"title":"A small sample model selection criterion based on Kullback's symmetric divergence","authors":"A. Seghouane, M. Bekara, G. Fleury","doi":"10.1109/ICASSP.2003.1201639","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1201639","url":null,"abstract":"The Kullback information criterion (KIC) is a recently developed tool for statistical model selection (Cavanaugh, J.E., Statistics and Probability Letters, vol.42, p.333-43, 1999). KIC serves as an asymptotically unbiased estimator of a variant of the Kullback symmetric divergence, known also as J-divergence. A bias correction of the Kullback symmetric information criterion is derived for linear models. The correction is of particular use when the sample size is small or when the number of fitted parameters is of a moderate to large fraction of the sample size. For linear regression models, the corrected method, called KICc, is an exactly unbiased estimator of a variant of the Kullback symmetric divergence between the true unknown model and the candidate fitted model. Furthermore, KICc is found to provide better model order choice than any other asymptotically efficient methods when applied to autoregressive time series models.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115463566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Co-channel speaker identification using usable speech extraction based on multi-pitch tracking
Yang Shao, Deliang Wang
{"title":"Co-channel speaker identification using usable speech extraction based on multi-pitch tracking","authors":"Yang Shao, Deliang Wang","doi":"10.1109/ICASSP.2003.1202330","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1202330","url":null,"abstract":"Recently, usable speech criteria have been proposed to extract minimally corrupted speech for speaker identification (SID) in co-channel speech. In this paper, we propose a new usable speech extraction method to improve the SID performance under the co-channel situation based on the pitch information obtained from a robust multi-pitch tracking algorithm [2]. The idea is to retain the speech segments that have only one pitch detected and remove the others. The system is evaluated on co-channel speech and results show a significant improvement across various target to interferer ratios (TIR) for speaker identification.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124417392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 52
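The selection rule described here, keep only segments where a single pitch is detected, is simple enough to sketch directly. This shows only the frame-selection step over hypothetical pitch-track output; the multi-pitch tracker itself and the SID back end are outside the sketch.

```python
def usable_frames(pitch_tracks):
    """pitch_tracks: per-frame list of detected pitch candidates in Hz.
    Keep indices of frames where exactly one pitch was detected (likely
    dominated by a single talker); drop overlapped frames (two pitches)
    and unvoiced frames (no pitch)."""
    return [i for i, pitches in enumerate(pitch_tracks) if len(pitches) == 1]

# Hypothetical tracker output: unvoiced, single-talker, overlapped, single-talker.
frames = [[], [120.0], [118.0], [118.0, 210.0], [205.0], []]
print(usable_frames(frames))  # [1, 2, 4]
```

Frames 1, 2, and 4 would be passed on as "usable speech" for speaker identification, while the overlapped frame 3 and the unvoiced frames are discarded.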
Blind (training-like) decoder assisted beamforming for DS-CDMA systems
R. Pacheco, D. Hatzinakos
{"title":"Blind (training-like) decoder assisted beamforming for DS-CDMA systems","authors":"R. Pacheco, D. Hatzinakos","doi":"10.1109/ICASSP.2003.1202672","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1202672","url":null,"abstract":"We propose an iterative blind beamforming strategy for short-burst high-rate DS-CDMA systems. The blind strategy works by creating a set of \"training sequences\" in the receiver that is used as input to a semi-blind beamforming algorithm, thus producing a corresponding set of beamformers. The objective then becomes to find which beamformer gives the best performance (smallest bit error). Two challenges we face are: (1) to find a semi-blind algorithm that requires very few training symbols (to minimize the search time); (2) to find an appropriate criterion for picking the beamformer that offers the best performance. Different semi-blind algorithms and criteria are tested. The recently proposed SBCMACI (semi-blind CMA with channel identification) (Casella, I.R.S. et al., PIMRC, p.1972-6, 2002) is demonstrated to be ideal because of how few training symbols it needs for convergence. Of the tested criteria, one based on feedback from the decoder (essentially using trellis information) is shown to achieve nearly optimal performance.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123080087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1