{"title":"Isotropic noise modelling for nearfield array processing","authors":"T. Abhayapala, R. Kennedy, R. Williamson","doi":"10.1109/ASPAA.1999.810837","DOIUrl":"https://doi.org/10.1109/ASPAA.1999.810837","url":null,"abstract":"An exact series representation for a nearfield spherically isotropic noise model is introduced. The methodology uses the spherical harmonics expansion of the wavefield at a sensor to obtain the correlation between two sensors due to the nearfield isotropic noise field. The result is useful in nearfield applications of sensor arrays. The proposed noise model can be utilized effectively to apply well-established farfield array processing algorithms to nearfield applications. Specifically, any signal processing criterion based on farfield isotropic noise correlation can be reformulated for nearfield noise using this representation. A simple array gain optimization is used to demonstrate the new noise model.","PeriodicalId":229733,"journal":{"name":"Proceedings of the 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. WASPAA'99 (Cat. No.99TH8452)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123678279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"New phase-vocoder techniques for pitch-shifting, harmonizing and other exotic effects","authors":"Jean Laroche, M. Dolson","doi":"10.1109/ASPAA.1999.810857","DOIUrl":"https://doi.org/10.1109/ASPAA.1999.810857","url":null,"abstract":"The phase-vocoder is usually presented as a high-quality solution for time-scale modification of signals, pitch-scale modifications usually being implemented as a combination of time-scaling and sampling rate conversion. We present two new phase-vocoder-based techniques which allow direct manipulation of the signal in the frequency domain, enabling such applications as pitch-shifting, chorusing, harmonizing, partial stretching and other exotic modifications which cannot be achieved by the standard time-scale sampling-rate conversion scheme. The new techniques are based on a very simple peak-detection stage, followed by a peak-shifting stage. The simplest one allows for 50% overlap but restricts the precision of the modifications, while the most flexible technique requires a more expensive 75% overlap.","PeriodicalId":229733,"journal":{"name":"Proceedings of the 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. WASPAA'99 (Cat. No.99TH8452)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131839212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Studies of a wideband stereophonic acoustic echo canceler","authors":"P. Eneroth, T. Gansler, S. Gay, J. Benesty","doi":"10.1109/ASPAA.1999.810886","DOIUrl":"https://doi.org/10.1109/ASPAA.1999.810886","url":null,"abstract":"In this paper a wideband stereophonic acoustic echo canceler is presented. The fundamental difficulty of stereophonic acoustic echo cancellation (SAEC) is described and an echo canceler based on a fast recursive least squares algorithm in a subband structure is proposed. This structure has been used in a real-time implementation, on which experiments have been performed. In the paper, simulation results of this implementation on real-life recordings, with 8 kHz bandwidth, are studied. The results clearly verify that the theoretical fundamental problem of SAEC also applies in real-life situations. They also show that more sophisticated adaptive algorithms are needed in the lower frequency regions than in the higher regions.","PeriodicalId":229733,"journal":{"name":"Proceedings of the 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. WASPAA'99 (Cat. No.99TH8452)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130094008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application of the phase vocoder to pitch-preserving synchronization of an audio stream to an external clock","authors":"R. Sussman, J. Laroche","doi":"10.1109/ASPAA.1999.810853","DOIUrl":"https://doi.org/10.1109/ASPAA.1999.810853","url":null,"abstract":"The phase vocoder is usually presented as a high-quality solution for time-scale modification of signals. Its main advantages versus the cheaper time-domain techniques include the high quality of the output for a wide range of types of input signals (speech, music, noise), and the possibility to perform very large factor modifications (e.g., four-fold time-stretching or more). In this paper, we present two applications that require such extreme modification factors: we call the first one pitch-preserving audio scrubbing, in which a user can move a pointer along an audio track and hear the sound at the corresponding location without any pitch alteration. Because the user controls the playback location (and therefore the playback speed), and can very well stop at a given location, the required time-scale modification can involve a very large factor. The second application consists of synchronizing an audio stream to a video stream, while avoiding pitch alteration. For extreme slow-motion playback, the time-scaling operation required to preserve the pitch can also involve a very large factor. We address theoretical and practical issues related to pitch-preserving synchronization of an audio track. Techniques are discussed to allow freezing time in the phase-vocoder and avoid problems associated with very large factor modifications.","PeriodicalId":229733,"journal":{"name":"Proceedings of the 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. WASPAA'99 (Cat. No.99TH8452)","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113960115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On some derivations of Gibson's approach for speech enhancement","authors":"É. Grivel, M. Gabrea, M. Najim","doi":"10.1109/ASPAA.1999.810868","DOIUrl":"https://doi.org/10.1109/ASPAA.1999.810868","url":null,"abstract":"This paper deals with Kalman filter-based enhancement of a speech signal embedded in colored noise when using a single-microphone system. Several approaches using Kalman filtering have been developed. More particularly, Gibson et al. (1991) reported an iterative method based on the so-called \"noise-free\" state space model, which may imply the introduction of a coordinate transformation to perform Kalman filtering. The authors do not address the identification issue. We propose some derivations of this method through an identification step using subspace methods for identification, previously developed in the field of control by Van Overschee (1993). The methods proposed here are then compared with other Kalman-based approaches.","PeriodicalId":229733,"journal":{"name":"Proceedings of the 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. WASPAA'99 (Cat. No.99TH8452)","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132358238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A robustness analysis of 3D audio using loudspeakers","authors":"D. Ward, G. Elko","doi":"10.1109/ASPAA.1999.810882","DOIUrl":"https://doi.org/10.1109/ASPAA.1999.810882","url":null,"abstract":"It is well known that the effectiveness of 3D audio systems is critically dependent on the listener's head being in a known location. In this paper we analyze the fundamental role played by the loudspeaker positions in determining the robustness of the crosstalk canceler. Based on an extremely simple head model, we derive straightforward expressions for the loudspeaker positions that optimize the system robustness, which is measured by matrix condition numbers. These derived optimum positions are then compared with empirically-derived optimum positions obtained from actual HRTF (head related transfer function) measurements. The results indicate that our analytical expressions accurately predict the optimum loudspeaker positions.","PeriodicalId":229733,"journal":{"name":"Proceedings of the 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. WASPAA'99 (Cat. No.99TH8452)","volume":"180 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129340436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multifeature audio segmentation for browsing and annotation","authors":"G. Tzanetakis, P. Cook","doi":"10.1109/ASPAA.1999.810860","DOIUrl":"https://doi.org/10.1109/ASPAA.1999.810860","url":null,"abstract":"Indexing and content-based retrieval are necessary to handle the large amounts of audio and multimedia data that are becoming available on the Web and elsewhere. Since manual indexing using existing audio editors is extremely time consuming, a number of automatic content analysis systems have been proposed. Most of these systems rely on speech recognition techniques to create text indices. On the other hand, very few systems have been proposed for automatic indexing of music and general audio. Typically these systems rely on classification and similarity-retrieval techniques and work in restricted audio domains. A somewhat different, more general approach for fast indexing of arbitrary audio data is the use of segmentation based on multiple temporal features combined with automatic or semi-automatic annotation. In this paper, a general methodology for audio segmentation is proposed. A number of experiments were performed to evaluate the proposed methodology and compare different segmentation schemes. Finally, a prototype audio browsing and annotation tool based on segmentation combined with existing classification techniques was implemented.","PeriodicalId":229733,"journal":{"name":"Proceedings of the 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. WASPAA'99 (Cat. No.99TH8452)","volume":"315 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116532026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A systematic hybrid analog/digital audio coder","authors":"R. Barron, A. Oppenheim","doi":"10.1109/ASPAA.1999.810843","DOIUrl":"https://doi.org/10.1109/ASPAA.1999.810843","url":null,"abstract":"This paper describes a signal coding solution for a hybrid channel that is the composition of two channels: a noisy analog channel through which a signal source is sent unprocessed and a secondary rate-constrained digital channel. The source is processed prior to transmission through the digital channel. Signal coding solutions for this hybrid channel are clearly applicable to the in-band on-channel (IBOC) digital audio broadcast (DAB) problem. We present the design of a perceptually-based subband audio coder, with complexity comparable to conventional coders, that exploits a signal at the receiver of the form y[n]=g[n]*x[n]+u[n], where x[n], g[n], and u[n] denote respectively the source, the impulse response of convolutional distortion, and additive Gaussian noise. Concepts from conventional subband coding, e.g. subband decomposition, quantization, bit allocation, and lossless signal coding, are tailored to exploit the analog signal at the receiver such that frequency-weighted mean-squared error is minimized.","PeriodicalId":229733,"journal":{"name":"Proceedings of the 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. WASPAA'99 (Cat. No.99TH8452)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121938635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A speech feature based on Bark frequency warping-the non-uniform linear prediction (NLP) cepstrum","authors":"Yoon Kim, J.O. Smith","doi":"10.1109/ASPAA.1999.810867","DOIUrl":"https://doi.org/10.1109/ASPAA.1999.810867","url":null,"abstract":"We propose a new method of obtaining features from speech signals for robust analysis and recognition-the non-uniform linear prediction (NLP) cepstrum. The objective is to derive a representation that suppresses speaker-dependent characteristics while preserving the linguistic quality of speech segments. The analysis is based on two principles. First, Bark frequency warping is performed on the LP spectrum to emulate the auditory spectrum. While widely used methods such as mel-frequency and PLP analysis use the FFT spectrum as their basis for warping, the NLP analysis uses the LP-based vocal-tract spectrum with glottal effects removed. Second, all-pole modeling (LP) is used before and after the warping. The pre-warp LP is used to first obtain the vocal-tract spectrum, while the post-warp LP is performed to obtain a smoothed, two-peak model of the warped spectrum. Experiments were conducted to test the effectiveness of the proposed feature in the case of identification/discrimination of vowels uttered by multiple speakers using linear discriminant analysis (LDA), and frame-based vowel recognition with a statistical model. In both cases, the NLP analysis was shown to be an effective tool for speaker-independent speech analysis/recognition applications.","PeriodicalId":229733,"journal":{"name":"Proceedings of the 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. WASPAA'99 (Cat. No.99TH8452)","volume":"57 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126000893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improvements to the switched parametric and transform audio coder","authors":"S. Levine, J.O. Smith","doi":"10.1109/ASPAA.1999.810845","DOIUrl":"https://doi.org/10.1109/ASPAA.1999.810845","url":null,"abstract":"We introduce improvements to previous sines+transients+noise audio modeling systems, including new sinusoidal trajectory selection and quantization procedures. In a previous work by Levine and Smith (see Proc. Int. Conf. Acoustics, Speech, and Signal Processing, Phoenix, 1999), the audio is first segmented into transient and non-transient regions. The transient regions are modeled using traditional transform coding techniques, while the non-transient regions are modeled using parametric sines plus noise modeling. Because such a system contains a mix of parametric and non-parametric techniques, compressed-domain processing such as time-scale modification is possible.","PeriodicalId":229733,"journal":{"name":"Proceedings of the 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. WASPAA'99 (Cat. No.99TH8452)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129187925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}