{"title":"Color image segmentation using density-based clustering","authors":"Qixiang Ye, Wen Gao, Wei Zeng","doi":"10.1109/ICASSP.2003.1199480","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1199480","url":null,"abstract":"Color image segmentation is an important but still open problem in image processing. We propose a method for this problem by integrating the spatial connectivity and color features of the pixels. Considering that an image can be regarded as a dataset in which each pixel has a spatial location and a color value, color image segmentation can be obtained by clustering these pixels into different groups of coherent spatial connectivity and color. To discover the spatial connectivity of the pixels, density-based clustering is employed, which is an effective clustering method used in data mining for discovering spatial databases. The color similarity of the pixels is measured in Munsell (HVC) color space whose perceptual uniformity ensures the color change in the segmented regions is smooth in terms of human perception. Experimental results using the proposed method demonstrate encouraging performance.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131984795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3D motion vector coding with block base adaptive interpolation filter on H.264","authors":"H. Kimata, Masaki Kitahara, Y. Yashima","doi":"10.1109/ICASSP.2003.1199554","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1199554","url":null,"abstract":"Fractional pel motion compensation generally improves coding efficiency due to more precise motion accuracy and low path filtering effect in generating an image at fractional pel positions. In H.264, quarter pel motion compensation is applied, where the image at half pel position is generated by a 6 tap Wiener filter. And the adaptive interpolation filter technique, which adaptively changes filter characteristics for half pel positions has been proposed. That technique also changes the image at quarter pel positions, so it can be exploited to extend motion accuracy to be more precise. In this paper, a 3D motion vector coding (3DMVC) technique with block base adaptive interpolation filter (BAIF) is proposed. This paper also demonstrates the proposed method ensures filter data is successfully integrated into motion vector coding and outperforms the normal H.264.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"23 1 Suppl 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128019607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Frame-dependent multi-stream reliability indicators for audio-visual speech recognition","authors":"A. Garg, G. Potamianos, C. Neti, Thomas S. Huang","doi":"10.1109/ICASSP.2003.1198707","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1198707","url":null,"abstract":"We investigate the use of local, frame-dependent reliability indicators of the audio and visual modalities, as a means of estimating stream exponents of multi-stream hidden Markov models for audio-visual automatic speech recognition. We consider two such indicators at each modality, defined as functions of the speech-class conditional observation probabilities of appropriate audio-or visual-only classifiers. We subsequently map the four reliability indicators into the stream exponents of a state-synchronous, two-stream hidden Markov model, as a sigmoid function of their linear combination. We propose two algorithms to estimate the sigmoid weights, based on the maximum conditional likelihood and minimum classification error criteria. We demonstrate the superiority of the proposed approach on a connected-digit audio-visual speech recognition task, under varying audio channel noise conditions. Indeed, the use of the estimated, frame-dependent stream exponents results in a significantly smaller word error rate than using global stream exponents. In addition, it outperforms utterance-level exponents, even though the latter utilize a-priori knowledge of the utterance noise level.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. 
(ICASSP '03).","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116750239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A fusion scheme of visual and auditory modalities for event detection in sports video","authors":"Mingliang Xu, Ling-yu Duan, Changsheng Xu, Q. Tian","doi":"10.1109/ICASSP.2003.1199139","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1199139","url":null,"abstract":"We propose an effective fusion scheme of visual and auditory modalities to detect events in sports video. The proposed scheme is built upon semantic shot classification, where we classify video shots into several major or interesting classes, each of which has clear semantic meanings. Among major shot classes we perform classification of the different auditory signal segments (i.e. silence, hitting ball, applause, commentator speech) with the goal of detecting events with strong semantic meaning. For instance, for tennis video, we have identified five interesting events: serve, reserve, ace, return, and score. Since we have developed a unified framework for semantic shot classification in sports videos and a set of audio mid-level representation with supervised learning methods, the proposed fusion scheme can be easily adapted to a new sports game. We are extending this fusion scheme to three additional typical sports videos: basketball, volleyball and soccer. Correctly detected sports video events will greatly facilitate further structural and temporal analysis, such as sports video skimming, table of content, etc.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. 
(ICASSP '03).","volume":"120 1-2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133192791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tone feature extraction through parametric modeling and analysis-by-synthesis-based pattern matching","authors":"Jinfu Ni, H. Kawai","doi":"10.1109/ICASSP.2003.1198719","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1198719","url":null,"abstract":"A functional fundamental frequency (F/sub 0/) model is applied to extract tone peak and gliding features from Mandarin F/sub 0/ contours aiming at automatic prosodic labeling of a large scale speech corpus. Modeling four lexical tones and representing them in a parametric form based on the F/sub 0/ model, we first cluster baseline tone patterns using the LBG (Linde-Buzo-Gray) algorithm, then perform analysis-by-synthesis-based pattern matching to estimate underlying tone peaks and tone pattern types from observed F/sub 0/ contours and phonetic labels with lexical tones. Tone gliding features are re-estimated after the determination of tone peaks. 94% of the automatically estimated labels were consistent with the manual labels in an open test of 968 utterances from eight native speakers. Also, experimental results indicate that the proposed method is applicable for F/sub 0/ contour smoothing and tone verification.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"433 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120897430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Successive bit-plane rate allocation technique for JPEG2000 image coding","authors":"Y. M. Yeung, O. Au, A. Chang","doi":"10.1109/ICASSP.2003.1199157","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1199157","url":null,"abstract":"A novel rate control scheme using successive bit-plane rate allocation (SBRA) is proposed for JPEG2000 image coding. By using the current rate-distortion information only, the proposed method can achieve a quality close to the post-compression rate-distortion (PCRD) optimization scheme adopted in JPEG2000. The proposed scheme can efficiently reduce both the computational cost and working memory size of the entropy coding process up to about 90%, in the case of 0.25bpp (1/32) compression. Without using the future rate-distortion information, the sequential property of the proposed method is highly suitable for real-time (or low delay) applications and implementation.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122271506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DBN based multi-stream models for speech","authors":"Yimin Zhang, Q. Diao, Shan Huang, Wei Hu, C. Bartels, J. Bilmes","doi":"10.1109/ICASSP.2003.1198911","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1198911","url":null,"abstract":"We propose dynamic Bayesian network (DBN) based synchronous and asynchronous multi-stream models for noise-robust automatic speech recognition. In these models, multiple noise-robust features are combined into a single DBN to obtain better performance than any single feature system alone. Results on the Aurora 2.0 noisy speech task show significant improvements of our synchronous model over both single stream models and over a ROVER based fusion method.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115387776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A small sample model selection criterion based on Kullback's symmetric divergence","authors":"A. Seghouane, M. Bekara, G. Fleury","doi":"10.1109/ICASSP.2003.1201639","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1201639","url":null,"abstract":"The Kullback information criterion (KIC) is a recently developed tool for statistical model selection (Cavanaugh, J.E., Statistics and Probability Letters, vol.42, p.333-43, 1999). KIC serves as an asymptotically unbiased estimator of a variant of the Kullback symmetric divergence, known also as J-divergence. A bias correction of the Kullback symmetric information criterion is derived for linear models. The correction is of particular use when the sample size is small or when the number of fitted parameters is of a moderate to large fraction of the sample size. For linear regression models, the corrected method, called KICc, is an exactly unbiased estimator of a variant of the Kullback symmetric divergence between the true unknown model and the candidate fitted model. Furthermore, KICc is found to provide better model order choice than any other asymptotically efficient methods when applied to autoregressive time series models.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115463566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Co-channel speaker identification using usable speech extraction based on multi-pitch tracking","authors":"Yang Shao, Deliang Wang","doi":"10.1109/ICASSP.2003.1202330","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1202330","url":null,"abstract":"Recently, usable speech criteria have been proposed to extract minimally corrupted speech for speaker identification (SID) in co-channel speech. In this paper, we propose a new usable speech extraction method to improve the SID performance under the co-channel situation based on the pitch information obtained from a robust multi-pitch tracking algorithm [2]. The idea is to retain the speech segments that have only one pitch detected and remove the others. The system is evaluated on co-channel speech and results show a significant improvement across various target to interferer ratios (TIR) for speaker identification.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124417392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Blind (training-like) decoder assisted beamforming for DS-CDMA systems","authors":"R. Pacheco, D. Hatzinakos","doi":"10.1109/ICASSP.2003.1202672","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1202672","url":null,"abstract":"We propose an iterative blind beamforming strategy for short-burst high-rate DS-CDMA systems. The blind strategy works by creating a set of \"training sequences\" in the receiver that is used as input to a semi-blind beamforming algorithm, thus producing a corresponding set of beamformers. The objective then becomes to find which beamformer gives the best performance (smallest bit error). Two challenges we face are: (1) to find a semi-blind algorithm that requires very few training symbols (to minimize the search time); (2) to find an appropriate criterion for picking the beamformer that offers the best performance. Different semi-blind algorithms and criteria are tested. The recently proposed SBCMACI (semi-blind CMA with channel identification) (Casella, I.R.S. et al., PIMRC, p.1972-6, 2002) is demonstrated to be ideal because of how few training symbols it needs for convergence. Of the tested criteria, one based on feedback from the decoder (essentially using trellis information) is shown to achieve nearly optimal performance.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123080087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}