{"title":"A speech understanding module for a multimodal mathematical formula editor","authors":"J. Hunsinger, M. Lang","doi":"10.1109/ICASSP.2000.859328","DOIUrl":"https://doi.org/10.1109/ICASSP.2000.859328","url":null,"abstract":"As part of a framework for a multimodal mathematical formula editor which will support natural speech and handwriting interaction, a single-stage speech understanding module is presented. It is based on a multilevel statistical, expectation-driven approach. Completely spoken realistic formulas containing basic arithmetic operations, roots, indexed sums, integrals, trigonometric functions, logarithms, convolutions, Fourier transforms, exponentiations, and indexing (among others) were examined. The speaker-specific or formula-specific structural recognition accuracies reach up to 90% or 100%, respectively. For visualization and postprocessing purposes, a transformation into Adobe(R) FrameMaker(R) documents is performed. An advanced variant of this architecture will further be utilized as the basis for a multimodal semantic decoder incorporating combined script and speech analysis. It will enclose a so-called multimodal probabilistic grammar which will be trained via multimodal usability tests.","PeriodicalId":164817,"journal":{"name":"2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)","volume":"2014 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128021004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Blind separation of disjoint orthogonal signals: demixing N sources from 2 mixtures","authors":"A. Jourjine, S. Rickard, Ö. Yilmaz","doi":"10.1109/ICASSP.2000.861162","DOIUrl":"https://doi.org/10.1109/ICASSP.2000.861162","url":null,"abstract":"We present a novel method for blind separation of any number of sources using only two mixtures. The method applies when sources are (W-)disjoint orthogonal, that is, when the supports of the (windowed) Fourier transform of any two signals in the mixture are disjoint sets. We show that, for anechoic mixtures of attenuated and delayed sources, the method allows one to estimate the mixing parameters by clustering ratios of the time-frequency representations of the mixtures. The estimates of the mixing parameters are then used to partition the time-frequency representation of one mixture to recover the original sources. The technique is valid even in the case when the number of sources is larger than the number of mixtures. The general results are verified on both speech and wireless signals.","PeriodicalId":164817,"journal":{"name":"2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128024653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An architecture for wavelet-packet based speech enhancement for hearing aids","authors":"M. A. Trenas, Juan López, E. Zapata, Francisco Argüello","doi":"10.1109/ICASSP.2000.859093","DOIUrl":"https://doi.org/10.1109/ICASSP.2000.859093","url":null,"abstract":"Wavelet packets have been applied in order to compensate the speech signal to improve the intelligibility for a common hearing impairment known as recruitment of loudness, a sensorineural hearing loss of cochlear origin. We present an architecture that allows selection of the best decomposition tree for each patient, in order to apply this wavelet-packet based parametric compression algorithm.","PeriodicalId":164817,"journal":{"name":"2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125599863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An iterative method for designing orthogonal two-channel FIR filter banks with regularities","authors":"R. Bregović, T. Saramäki","doi":"10.1109/ICASSP.2000.862021","DOIUrl":"https://doi.org/10.1109/ICASSP.2000.862021","url":null,"abstract":"An efficient iterative method is described for designing orthogonal two-channel perfect-reconstruction FIR filter banks in such a way that the low-pass analysis filter has the given number of fixed zeros at z=-1 and its energy in the given stopband region is minimized. When using the resulting two-channel filter bank for generating discrete-time wavelet banks, the number of vanishing moments is equal to the number of zeros being located at z=-1. The proposed design scheme is fast and the convergence to the optimum solution is independent of the starting-point filter bank. Compared to the two-channel filter bank equivalents designed in the minimax sense as proposed by Rioul and Duhamel (1994), the regularities of the resulting wavelets are increased and the stopband energies of the subfilters are decreased. If there are no constraints on the number of zeros at z=-1, then the resulting banks are useful building blocks in generating frequency-selective multi-channel filter banks and octave filter banks.","PeriodicalId":164817,"journal":{"name":"2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121860499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Linear regression under maximum a posteriori criterion with Markov random field prior","authors":"Xintian Wu, Yonghong Yan","doi":"10.1109/ICASSP.2000.859130","DOIUrl":"https://doi.org/10.1109/ICASSP.2000.859130","url":null,"abstract":"Speaker adaptation using linear transformations under the maximum a posteriori (MAP) criterion is studied in this paper. The purpose is to improve the matrix estimation in the widely used maximum likelihood linear regression (MLLR) adaptation, which might generate poorly structured transform matrices when adaptation data are sparse. Unlike traditional MAP-based adaptations, many known prior distributions of HMM parameters, such as normal-Wishart priors, do not have a closed-form solution in the transform estimation. In Markov random field linear regression (MRFLR), the prior distribution of HMM parameters is modeled by a Markov random field, which leads to a closed-form solution for estimating the linear transforms. Experimental results show that MRFLR outperforms MLLR when adaptation data are sparse, and converges to the MLLR performance when more adaptation data are available.","PeriodicalId":164817,"journal":{"name":"2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121940316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Construction of the acoustic inventory for a Greek text-to-speech concatenative synthesis system","authors":"C. Christogiannis, T. Varvarigou, Agatha Zappa, Yiannis Vamvakoulas, Chilin Shih, A. Arvaniti","doi":"10.1109/ICASSP.2000.859113","DOIUrl":"https://doi.org/10.1109/ICASSP.2000.859113","url":null,"abstract":"The development of the Greek text-to-speech (TTS) system by NTUA is based on the method of concatenative synthesis and follows the Bell Labs approach to this technique. Concatenative synthesis is one of the simplest methods for speech synthesis and at the same time bypasses most of the problems encountered by articulatory and formant synthesis techniques. The method relies on designing and creating the acoustic inventory of the language by taking real recorded speech, cutting it into segments and concatenating these segments back together during synthesis. The design and implementation of the acoustic database is a key factor for the performance of the synthesizer, since all the possible phone-to-phone transitions must be considered in order to minimize abrupt discontinuities and thus maximize the naturalness of the synthesized utterances.","PeriodicalId":164817,"journal":{"name":"2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121974476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"2D mesh-based detection and representation of an occluding object for object-based video","authors":"Mete H. Gökçetekin, Isil Celasun, A. Tekalp","doi":"10.1109/ICASSP.2000.859203","DOIUrl":"https://doi.org/10.1109/ICASSP.2000.859203","url":null,"abstract":"In this study, an algorithm is proposed for mesh-based detection of occlusion caused by an object newly entering the scene, which covers information present in the current frame, together with a mesh-based representation of that object. A 2D Delaunay-triangulated dynamic mesh is initially designed on the first frame of the sequence. The motion of each node is then compared to its average motion. Frames with nodes of high activity, moving in different directions and forming a region, are selected to be analyzed for detection of object(s) newly entering the scene. A region formed by detection of bad motion vectors is enlarged using a distance criterion. The detected frame and the preceding one are range filtered. The luminance components of these two frames are formed. The difference of the range-filtered frames and of their respective luminance components are taken into account. The differences are checked against a threshold value inside the formed region. Pixels exceeding this threshold form the newly entering object. Since these pixels may form separate regions, mesh-based merging of these regions is then accomplished. The detected newly entering object is then meshed and tracked as a new object in the scene in accordance with the occluded object. The proposed 2D mesh-based occlusion detection and representation method can be applied in object-based video coding, storage and manipulation.","PeriodicalId":164817,"journal":{"name":"2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127991034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bias-free adaptive IIR filtering","authors":"Woo‐Jin Song, Hyun-Chool Shin","doi":"10.1109/ICASSP.2000.861976","DOIUrl":"https://doi.org/10.1109/ICASSP.2000.861976","url":null,"abstract":"We present a new family of algorithms that solve the bias problem in the equation-error based adaptive infinite impulse response (IIR) filtering. A novel constraint, called the constant-norm constraint, unifies the quadratic constraint and the monic one. By imposing the monic constraint on the mean square error (MSE) optimization, the merits of both constraints are inherited and the shortcomings are overcome. A new cost function based on the constant-norm constraint and Lagrange multiplier is defined. Minimizing the cost function gives birth to a new family of bias-free adaptive IIR filtering algorithms. For example, three efficient algorithms belonging to the family are proposed. The analysis of the stationary points is presented to show that the proposed methods can indeed produce bias-free parameter estimates in the presence of noise. The simulation results demonstrate that the proposed methods perform better than existing algorithms, while being very simple both in computation and implementation.","PeriodicalId":164817,"journal":{"name":"2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121774473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Confidence measures for dialogue management in the CU Communicator system","authors":"Rubén San-Segundo-Hernández, B. Pellom, Wayne H. Ward, J. Pardo","doi":"10.1109/ICASSP.2000.859190","DOIUrl":"https://doi.org/10.1109/ICASSP.2000.859190","url":null,"abstract":"This paper provides improved confidence assessment for detection of word-level speech recognition errors and out-of-domain user requests using language model features. We consider a combined measure of confidence that utilizes the language model back-off sequence, language model score, and phonetic length of recognized words as indicators of speech recognition confidence. The paper investigates the ability of each feature to detect speech recognition errors and out-of-domain utterances as well as two methods for combining the features contextually: a multi-layer perceptron and a statistical decision tree. We illustrate the effectiveness of the algorithm by considering utterances from the ATIS airline information task as either in-domain or out-of-domain for the DARPA Communicator task. Using this hand-labeled data, it is shown that 27.9% of incorrectly recognized words and 36.4% of out-of-domain phrases are detected at a 2.5% false alarm rate.","PeriodicalId":164817,"journal":{"name":"2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121778866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Boosting Gaussian mixtures in an LVCSR system","authors":"G. Zweig, M. Padmanabhan","doi":"10.1109/ICASSP.2000.861945","DOIUrl":"https://doi.org/10.1109/ICASSP.2000.861945","url":null,"abstract":"In this paper, we apply boosting to the problem of frame-level phone classification, and use the resulting system to perform voicemail transcription. We develop parallel, hierarchical, and restricted versions of the classic AdaBoost algorithm, which enable the technique to be used in large-scale speech recognition tasks with hundreds of thousands of Gaussians and tens of millions of training frames. We report small but consistent improvements in both frame recognition accuracy and word error rate.","PeriodicalId":164817,"journal":{"name":"2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115857746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}