{"title":"A method of generating uniformly distributed sequences over [0,K], where K+1 is not a power of two","authors":"R. Kuehnel, Yuke Wang","doi":"10.1109/ICASSP.2003.1202488","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1202488","url":null,"abstract":"A new methodology has been recently proposed for the efficient generation of multiple pseudo-random bit sequences that are statistically uncorrelated [1]. Random sequences that are uniformly distributed over a range [0,K], where K+1 is a power of 2, can be constructed by forming a vector of M independent bit sequences, where M=log_2(K+1). We demonstrate that this method of construction represents a special case of a more generalized approach in which K can be any positive integer. The procedures described here can be used to efficiently generate multiple independent random sequences that are uniformly distributed over any range.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123669697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Time-domain method for tracking dispersive channels in MIMO OFDM systems","authors":"T. Roman, M. Enescu, V. Koivunen","doi":"10.1109/ICASSP.2003.1202662","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1202662","url":null,"abstract":"In this paper we address the problem of channel estimation for multiple-input multiple-output OFDM systems for mobile users. A channel tracking and equalization method stemming from Kalman filtering is proposed for time-frequency selective channels. Tracking of the MIMO channel matrix is performed in the time domain and equalization in the frequency domain. The computational complexity is significantly reduced by applying the matrix inversion lemma. Simulation results are presented using a realistic channel model in typical urban scenarios.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125756902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Schemes for error resilient streaming of perceptually coded audio","authors":"J. Korhonen, Ye-Kui Wang","doi":"10.1109/ICASSP.2003.1200077","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1200077","url":null,"abstract":"This paper presents novel extensions to our earlier system for streaming perceptually coded audio over error prone channels such as Mobile IP. To improve error robustness while maintaining bandwidth efficiency, the new extensions combine the strength of an error resilient coding scheme in the sender, a prioritized packet transport scheme in the network and a compressed domain error concealment strategy in the terminal. Different concealment methods are used for each part of the coded audio data according to their perceptual importance and statistical characteristics. In our current implementation, we employed MPEG-2 Advanced Audio Coding (AAC) encoded bitstreams and an RTP/UDP-based test system for performance evaluation. Simulation results have shown that our improved streaming system is more robust against packet losses in comparison with conventional methods.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127967914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A comparison of subspace analysis for face recognition","authors":"Jian Li, S. Zhou, C. Shekhar","doi":"10.1109/ICASSP.2003.1199122","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1199122","url":null,"abstract":"We report the results of a comparative study on subspace analysis methods for face recognition. In particular, we have studied four different subspace representations and their 'kernelized' versions if available. They include both unsupervised methods such as principal component analysis (PCA) and independent component analysis (ICA), and supervised methods such as Fisher discriminant analysis (FDA) and probabilistic PCA (PPCA) used in a discriminative manner. The 'kernelized' versions of these methods provide subspaces of high-dimensional feature spaces induced by non-linear mappings. To test the effectiveness of these subspace representations, we experiment on two databases with three typical variations of face images, i.e., pose, illumination and facial expression changes. The comparison of these methods applied to different variations in face images offers a comprehensive view of all the subspace methods currently used in face recognition.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116087203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HMM-neural network monophone models for computer-based articulation training for the hearing impaired","authors":"M. Devarajan, Fansheng Meng, P. Hix, S. Zahorian","doi":"10.1109/ICASSP.2003.1202373","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1202373","url":null,"abstract":"A visual speech training aid for persons with hearing impairments has been developed using a Windows-based multimedia computer. Previous papers (Zahorian, S. et al., Int. Conf. on Spoken Language Processing, 2002; Zahorian and Nossair, Z.B., IEEE Trans. on Speech and Audio Processing, vol.7, no.4, p.414-25, 1999; Zimmer, A. et al., ICASSP, vol.6, p.3625-8, 1998; Zahorian and Jagharghi, A., J. Acoust. Soc. Amer., vol.94, no.4, p.1966-82, 1993) have described the signal processing steps and display options for giving real-time feedback about the quality of pronunciation for 10 steady-state American English monophthong vowels (/aa/, /iy/, /uw/, /ae/, /er/, /ih/, /eh/, /ao/, /ah/, and /uh/). This vowel training aid is thus referred to as a vowel articulation training aid (VATA). We now describe methods to develop a monophone-based hidden Markov model/neural network recognizer such that real-time visual feedback can be given about the quality of pronunciation of short words and phrases. Experimental results are reported which indicate a high degree of accuracy for labeling and segmenting the CVC (consonant-vowel-consonant) database developed for \"training\" the display.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128067532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Watermarking of 3D models using principal component analysis","authors":"Andreas Kalivas, A. Tefas, I. Pitas","doi":"10.1109/ICASSP.2003.1200061","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1200061","url":null,"abstract":"A novel method for 3D model watermarking, robust to geometric distortions such as rotation, translation and scaling, is proposed. A ternary watermark is embedded in the vertex topology of a 3D model. A transformation of the model to an invariant space is proposed prior to watermark embedding. Simulation results indicate the ability of the proposed method to deal with the aforementioned attacks giving very good results.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131899850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using phone and diphone based acoustic models for voice conversion: a step towards creating voice fonts","authors":"Arun Kumar, Ashish Verma","doi":"10.1109/ICASSP.2003.1198882","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1198882","url":null,"abstract":"Voice conversion techniques attempt to modify the speech signal so that it is perceived as if spoken by another speaker, different from the original speaker. In this paper, we present a novel approach to perform voice conversion. Our approach uses acoustic models based on units of speech, like phones and diphones, for voice conversion. These models can be computed and used independently for a given speaker without being concerned about the source or target speaker. It avoids the use of a parallel speech corpus in the voices of source and target speakers. It is shown that by using the proposed approach, voice fonts can be created and stored which represent individual characteristics of a particular speaker, to be used for customization of synthetic speech. We also show through objective and subjective tests, that voice conversion quality is comparable to other approaches that require a parallel speech corpus.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124944972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A probabilistic approach for blind source separation of underdetermined convolutive mixtures","authors":"J. M. Peterson, S. Kadambe","doi":"10.1109/ICASSP.2003.1201748","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1201748","url":null,"abstract":"There are very few techniques that can separate signals from the convolutive mixture in the underdetermined case. We have developed a method that uses overcomplete expansion of the signal created with a time-frequency transform and that also uses the property of sparseness and a Laplacian source density model to obtain the source signals from the instantaneously mixed signals in the underdetermined case. This technique has been extended here to separate signals (a) in the case of underdetermined convolutive mixtures, and (b) in the general case of more than 2 mixtures. Here, we also propose a geometric-constraint-based search approach to significantly reduce the computational time of our original \"dual update\" algorithm. Several examples are provided. The results of signal separation from the convolutive mixtures indicate that an average signal to noise ratio improvement of 5.3 dB can be obtained.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122169947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Oscillatory gestures and discourse","authors":"Francis K. H. Quek, Yingen Xiong","doi":"10.1109/ICASSP.2003.1200090","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1200090","url":null,"abstract":"Gesture and speech are part of a single human language system. They are co-expressive and complementary channels in the act of speaking. While speech carries the major load of symbolic presentation, gesture provides the imagistic content. Proceeding from the established contemporality of gesture and speech, we discuss our work on oscillatory gestures and speech. We present our wavelet-based approach in gestural oscillation extraction as geodesic ridges in frequency-time space. We motivate the potential of such computational cross-modal language analysis by performing a micro analysis of a video dataset in which a subject describes her living space. We demonstrate the ability of our algorithm to extract gestural oscillations and show how oscillatory gestures reveal portions of the discourse structure.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123924764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unconstrained motion compensated temporal filtering (UMCTF) framework for wavelet video coding","authors":"M. Schaar, D. Turaga","doi":"10.1109/ICASSP.2003.1199112","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1199112","url":null,"abstract":"The paper presents a new framework for adaptive temporal filtering in wavelet interframe codecs, called unconstrained motion compensated temporal filtering (UMCTF). This framework allows flexible and efficient temporal filtering by combining the best features of motion compensation, used in predictive coding, with the advantages of interframe scalable wavelet video coding schemes. UMCTF provides higher coding efficiency, improved visual quality, flexibility of temporal and spatial scalability, and lower decoding delay than conventional MCTF schemes. Furthermore, UMCTF can also be employed in alternative open-loop scalable coding frameworks using DCT for the texture coding.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"257 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123965254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}