{"title":"A dynamic cepstrum incorporating time-frequency masking and its application to continuous speech recognition","authors":"K. Aikawa, H. Singer, Hideki Kawahara, Y. Tohkura","doi":"10.1109/ICASSP.1993.319399","DOIUrl":"https://doi.org/10.1109/ICASSP.1993.319399","url":null,"abstract":"A dynamic cepstrum parameter that incorporates the time-frequency characteristics of auditory forward masking is proposed. A masking model is derived from psychological experimental results. A novel operational method using a lifter array is derived to perform the time-frequency masking. The parameter simulates the effective input spectrum at the front-end of the auditory system and can enhance the spectral dynamics. The parameter represents both the instantaneous and transitional aspects of a spectral time series. Phoneme and continuous speech recognition experiments demonstrated that the dynamic cepstrum outperforms the conventional cepstrum individually and in various combinations with other spectral parameters. The phoneme recognition results were improved for ten male and ten female speakers. The masking lifter with a Gaussian window provided a better performance than that with a square window.","PeriodicalId":428449,"journal":{"name":"1993 IEEE International Conference on Acoustics, Speech, and Signal Processing","volume":"19 10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1993-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126097765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bispectral techniques for spherical functions","authors":"R. Kakarala, B. M. Bennett, G. Iverson, M. D'Zmura","doi":"10.1109/ICASSP.1993.319633","DOIUrl":"https://doi.org/10.1109/ICASSP.1993.319633","url":null,"abstract":"The authors address two problems involving spherical functions: determining when two spherical functions are 3-D rotated copies of each other; and averaging several noisy observations of a rotating spherical function. The solution to both problems uses the spherical bispectrum, which is the generalization of the well-known Euclidean bispectrum. The spherical bispectrum is formulated and it is shown that it is invariant under 3-D rotation of the underlying Gaussian noise. An algorithm for recovering spherical functions from their bispectra is demonstrated.","PeriodicalId":428449,"journal":{"name":"1993 IEEE International Conference on Acoustics, Speech, and Signal Processing","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1993-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124657717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image filtering by gradient inverse inhomogeneous diffusion","authors":"A. El-Fallah, G. Ford","doi":"10.1109/ICASSP.1993.319750","DOIUrl":"https://doi.org/10.1109/ICASSP.1993.319750","url":null,"abstract":"A method is developed for the synthesis of a nonlinear adaptive filter based on solutions to the inhomogeneous diffusion equation. The approach is based on the specification of the first derivative of the signal in time (scale). A general solution is derived and is then specialized to the scale invariance case, in which the diffusion coefficient is shown to be the gradient inverse. A novel discrete realization of the inhomogeneous diffusion equation is developed for the noise removal problem, and experimental results are shown. The proposed algorithm not only removes noise but simultaneously enhances and localizes edges. It is extremely simple and parallel, and does not require the detection of any of the many possible line and edge configurations. Since the algorithm is sensitive to the local context, it satisfies human vision requirements more than conventional methods which rely on minimizing the mean square error.","PeriodicalId":428449,"journal":{"name":"1993 IEEE International Conference on Acoustics, Speech, and Signal Processing","volume":"333 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1993-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124687112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Polyspectra-based, blind, MMSE, fractionally-spaced equalization of a cyclostationary signal","authors":"M. Webster","doi":"10.1109/ICASSP.1993.319648","DOIUrl":"https://doi.org/10.1109/ICASSP.1993.319648","url":null,"abstract":"Two weaknesses of polyspectra-based blind equalization are addressed. The first weakness involves a zero-forcing (ZF) characteristic, where intersymbol interference is eliminated with disregard for associated mean squared error (MSE). It is shown that a minimum-MSE (MMSE) constraint can be added for signals operating in colored Gaussian noise, with only a small increase in computational complexity and with performance enhancement under certain scenarios. The second weakness involves the constraint that the signal be stationary, which fails to exploit the cyclostationary features of most communication signals (W. A. Gardner). In particular, it is shown that polyspectra-based techniques can be used with a fractionally-spaced equalizer, with attendant performance boosts in timing insensitivity (S. H. Qureshi, 1985) and damaged-spectrum restoration (W. A. Gardner, 1991). The technique decomposes the signal into stationary streams, computing the individual and joint statistics of the partitions, and then solves the optimal Wiener equation.","PeriodicalId":428449,"journal":{"name":"1993 IEEE International Conference on Acoustics, Speech, and Signal Processing","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1993-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124817650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward efficient morphological shape representation","authors":"J. Reinhardt, W. Higgins","doi":"10.1109/ICASSP.1993.319763","DOIUrl":"https://doi.org/10.1109/ICASSP.1993.319763","url":null,"abstract":"A shape representation scheme that is typically more computationally efficient than the morphological skeleton and MSD (morphological shape decomposition) is proposed. This method greatly augments the MSD. The authors introduce new constituent component types and incorporate a cost-based search strategy for finding an efficient representation. If representation error is permissible, even more efficient representations are possible. However, search time is an issue for the method.","PeriodicalId":428449,"journal":{"name":"1993 IEEE International Conference on Acoustics, Speech, and Signal Processing","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1993-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129364134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cancellation of ISI in non-linear voice-band data channels","authors":"Z. Fejzo, H. Lev-Ari","doi":"10.1109/ICASSP.1993.319516","DOIUrl":"https://doi.org/10.1109/ICASSP.1993.319516","url":null,"abstract":"A technique for cancellation of intersymbol interference (ISI) in nonlinear voice band data channels that contain second and third order nonlinear elements is presented. A data-aided nonlinear canceller structure based on a configuration proposed by E. Biglieri et al. (IEEE J. Sel. Areas Commun. vol.SAC-2, no.5, p.765-77, Sept. 1984) is used to remove nonlinear ISI without excessive noise enhancement. While the scheme proposed by Biglieri et al. uses a least mean square (LMS) adaptation algorithm, the authors choose to use a recursive least squares (RLS) adaptation algorithm in order to provide a faster convergence of the algorithm and improved robustness when applied to a variety of channels. A fast version of the RLS algorithm with computational complexity of order O(9N) (where N is the order of the nonlinear filter) is derived. The performance of the nonlinear canceller with both fast RLS and LMS adaptation is evaluated by simulation for 16-QAM (quadrature amplitude modulation) data transmission over nonlinear channels with a rate of 9600 bit/s. The simulation results show a performance improvement when the fast RLS version is used as compared with the LMS.","PeriodicalId":428449,"journal":{"name":"1993 IEEE International Conference on Acoustics, Speech, and Signal Processing","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1993-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129467043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The estimation of powerful language models from small and large corpora","authors":"P. Placeway, R. Schwartz, Pascale Fung, L. Nguyen","doi":"10.1109/ICASSP.1993.319222","DOIUrl":"https://doi.org/10.1109/ICASSP.1993.319222","url":null,"abstract":"The authors consider the estimation of powerful statistical language models using a technique that scales from very small to very large amounts of domain-dependent data. They begin with improved modeling of the grammar statistics, based on a combination of the backing-off technique and zero-frequency techniques. These are extended to be more amenable to the particular system considered here. The resulting technique is greatly simplified, more robust, and gives improved recognition performance over either of the previous techniques. The authors also consider the problem of robustness of a model based on a small training corpus by grouping words into obvious semantic classes. This significantly improves the robustness of the resulting statistical grammar. A technique that allows the estimation of a high-order model on modest computation resources is also presented. This makes it possible to run a 4-gram statistical model of a 50-million word corpus on a workstation of only modest capability and cost. Finally, the authors discuss results from applying a 2-gram statistical language model integrated in the HMM (hidden Markov model) search, obtaining a list of the N-best recognition results, and rescoring this list with a higher-order statistical model.","PeriodicalId":428449,"journal":{"name":"1993 IEEE International Conference on Acoustics, Speech, and Signal Processing","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1993-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128282818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic bit allocation in CELP excitation coding","authors":"T. Eriksson, Johan Sjöberg","doi":"10.1109/ICASSP.1993.319261","DOIUrl":"https://doi.org/10.1109/ICASSP.1993.319261","url":null,"abstract":"The excitation coding, i.e., the LTP (long term predictor) and the innovation coding, requires a large part of the overall bit rate in a CELP (code-excited linear prediction) coder. A method to reduce the excitation coding bit rate is proposed. The fact that the pitch varies only slowly during voiced segments of speech can be exploited to design powerful dynamic bit allocation schemes for the excitation sequence. The bit allocation is determined by two methods. In one method, the LTP index is Huffman-coded. This makes the LTP code book require only a small number of bits during speech segments with stable pitch frequency, i.e., voiced segments. In the other method, a high rate approximation for assigning various numbers of innovation code words for each LTP index is derived. As a complement to dynamic bit allocation, a search method for the LTP index is developed that takes into account the number of innovation code words assigned to each LTP index, in the search for an optimal LTP sequence. Simulations are included that show that with these methods the bit rate can be reduced by 400 bit/s with no changes in speech quality.","PeriodicalId":428449,"journal":{"name":"1993 IEEE International Conference on Acoustics, Speech, and Signal Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1993-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128334096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Subspace tracking based on the projection approach and the recursive least squares method","authors":"Bin Yang","doi":"10.1109/ICASSP.1993.319615","DOIUrl":"https://doi.org/10.1109/ICASSP.1993.319615","url":null,"abstract":"The author presents a new algorithm for tracking the signal subspace recursively. It is based on a novel interpretation of the signal subspace as the solution of a projection-like unconstrained minimization task. It is shown that the recursive least squares technique can be applied to solve this problem by approximating the projections appropriately. The resulting algorithm has a computational complexity of O(nr), where n is the dimension of the problem and r is the number of desired eigencomponents. Simulation results show that the frequency tracking capability of this algorithm is virtually identical to and in some cases more robust than the more computationally expensive batch eigendecomposition.","PeriodicalId":428449,"journal":{"name":"1993 IEEE International Conference on Acoustics, Speech, and Signal Processing","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1993-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128485392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Time-recursive architectures and wavelet transform","authors":"Emmanuel N. Frantzeskakis, J. Baras, K. J. Liu","doi":"10.1109/ICASSP.1993.319151","DOIUrl":"https://doi.org/10.1109/ICASSP.1993.319151","url":null,"abstract":"Time-recursive computation has proved to be a particularly useful tool in real-time data compression and in transform domain adaptive filtering, with applications in the areas of audio, radio, sonar, and video. An architectural framework for parallel time-recursive computation is proposed. The authors consider a class of linear operators that consists of the discrete time, time invariant, compactly supported, but otherwise arbitrary kernel functions. They define a shift property of the linear operators and reveal its relation with the time-recursive implementation. The potential of the proposed framework is demonstrated by designing a time-recursive architecture for the discrete wavelet transform.","PeriodicalId":428449,"journal":{"name":"1993 IEEE International Conference on Acoustics, Speech, and Signal Processing","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1993-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129325914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}