{"title":"Connectionist speaker normalization and its applications to speech recognition","authors":"X.D. Huang, K. Lee, A. Waibel","doi":"10.1109/NNSP.1991.239506","DOIUrl":"https://doi.org/10.1109/NNSP.1991.239506","url":null,"abstract":"Speaker normalization may have a significant impact on both speaker-adaptive and speaker-independent speech recognition. In this paper, a codeword-dependent neural network (CDNN) is presented for speaker normalization. The network is used as a nonlinear mapping function to transform speech data between two speakers. The mapping function is characterized by two important properties. First, the assembly of mapping functions enhances overall mapping quality. Second, multiple input vectors are used simultaneously in the transformation. This not only makes full use of dynamic information but also alleviates possible errors in the supervision data. Large-vocabulary continuous speech recognition is chosen to study the effect of speaker normalization. Using speaker-dependent semi-continuous hidden Markov models, performance evaluation over 360 testing sentences from new speakers showed that speaker normalization significantly reduced the error rate from 41.9% to 5.0% when only 40 speaker-dependent sentences were used to estimate CDNN parameters.<<ETX>>","PeriodicalId":354832,"journal":{"name":"Neural Networks for Signal Processing Proceedings of the 1991 IEEE Workshop","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129238630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Concept formation and statistical learning in nonhomogeneous neural nets","authors":"R. Tutwiler, L. Sibul","doi":"10.1109/NNSP.1991.239538","DOIUrl":"https://doi.org/10.1109/NNSP.1991.239538","url":null,"abstract":"The authors present an analysis of complex nonhomogeneous neural nets, an adaptive statistical learning algorithm, and the potential use of these types of systems to perform a general sensor fusion problem. The three main points are the following. First, an extension to the theory of statistical neurodynamics is introduced to include the analysis of complex nonhomogeneous neuron pools consisting of three subnets. Second, a statistical learning algorithm is developed based on the differential geometric theory of statistical inference for the adaptive updating of the synaptic interconnection weights. The statistical learning algorithm is merged with the subnets of nonhomogeneous nets and it is shown how these ensembles of nets can be applied to solve a general sensor fusion problem.<<ETX>>","PeriodicalId":354832,"journal":{"name":"Neural Networks for Signal Processing Proceedings of the 1991 IEEE Workshop","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130059377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An alternative proof of convergence for Kung-Diamantaras APEX algorithm","authors":"H. Chen, R. Liu","doi":"10.1109/NNSP.1991.239537","DOIUrl":"https://doi.org/10.1109/NNSP.1991.239537","url":null,"abstract":"The problem of adaptive principal components extraction (APEX) has gained much interest. In 1990, a new neuro-computation algorithm for this purpose was proposed by S. Y. Kung and K. I. Diamautaras. (see ICASSP 90, p.861-4, vol.2, 1990). An alternative proof is presented to illustrate that the K-D algorithm is in fact richer than has been proved before. The proof shows that the neural network will converge and the principal components can be extracted, without assuming that some of projections of synaptic weight vectors have diminished to zero. In addition, the authors show that the K-D algorithm converges exponentially.<<ETX>>","PeriodicalId":354832,"journal":{"name":"Neural Networks for Signal Processing Proceedings of the 1991 IEEE Workshop","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130082253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fuzzy tracking of multiple objects","authors":"L. Perlovsky","doi":"10.1117/12.138232","DOIUrl":"https://doi.org/10.1117/12.138232","url":null,"abstract":"The authors have applied a previously developed MLANS neural network to the problem of tracking multiple objects in heavy clutter. In their approach the MLANS performs a fuzzy classification of all objects in multiple frames in multiple classes of tracks and random clutter. This novel approach to tracking using an optimal classification algorithm results in a dramatic improvement of performance: the MILANS tracking combines advantages of both the JPD and the MHT, it is capable of track initiation by considering multiple frames, and it eliminates combinatorial search via fuzzy associations.<<ETX>>","PeriodicalId":354832,"journal":{"name":"Neural Networks for Signal Processing Proceedings of the 1991 IEEE Workshop","volume":"145 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131724827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Workstation-based phonetic typewriter","authors":"T. Kohonen","doi":"10.1109/NNSP.1991.239514","DOIUrl":"https://doi.org/10.1109/NNSP.1991.239514","url":null,"abstract":"The author presents a general description of his 'phonetic typewriter' system that transcribes unlimited speech into orthographically correct text. The purpose of this paper is to motivate certain choices made in the partitioning of the problem into tasks and describe their implementation. The combination of algorithms he has selected has proven effective for well-articulated dictation in a phonemic language such as Finnish and Japanese, whereas for English and many other languages that are organized differently in the phonological sense, an optimal solution may look completely different.<<ETX>>","PeriodicalId":354832,"journal":{"name":"Neural Networks for Signal Processing Proceedings of the 1991 IEEE Workshop","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131794641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On adaptive acquisition of spoken language","authors":"A. Gorin, S. Levinson, L. G. Miller, A. Gertner","doi":"10.1109/NNSP.1991.239499","DOIUrl":"https://doi.org/10.1109/NNSP.1991.239499","url":null,"abstract":"At present, automatic speech recognition technology is based upon constructing models of the various levels of linguistic structure assumed to compose spoken language. These models are either constructed manually or automatically trained by example. A major impediment is the cost, or even the feasibility, of producing models of sufficient fidelity to enable the desired level of performance. The proposed alternative is to build a device capable of acquiring the necessary linguistic skills during the course of performing its task. The authors provide a progress report on their work in this direction, describing some principles and mechanisms upon which such a device might be based, and recounting several rudimentary experiments evaluating their utility. The basic principles and mechanisms underlying this research program are briefly reviewed. The authors have been investigating the application of those ideas to devices with spoken input, and which are capable of larger and more complex sets of actions. The authors propose some corollaries to those basic principles, thereby motivating extensions of earlier experimental mechanisms to these more complex devices. They also briefly describe these experimental systems and observe how they demonstrate the utility of their ideas.<<ETX>>","PeriodicalId":354832,"journal":{"name":"Neural Networks for Signal Processing Proceedings of the 1991 IEEE Workshop","volume":"111 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124112635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaline with adaptive recursive memory","authors":"B. de Vries, J. Príncipe, P. Guedes de Oliveira","doi":"10.1109/NNSP.1991.239531","DOIUrl":"https://doi.org/10.1109/NNSP.1991.239531","url":null,"abstract":"The authors present a generalization of Widrow's adaptive linear combiner with an adaptive recursive memory. Expressions for memory depth and resolution are derived. The LMS procedure is extended to adapt the memory depth and resolution so as to match the signal characteristics. The particular memory structure, gamma memory, was originally developed as part of a neural net model for temporal processing.<<ETX>>","PeriodicalId":354832,"journal":{"name":"Neural Networks for Signal Processing Proceedings of the 1991 IEEE Workshop","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114250147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Word recognition with the feature finding neural network (FFNN)","authors":"T. Gramß","doi":"10.1109/NNSP.1991.239513","DOIUrl":"https://doi.org/10.1109/NNSP.1991.239513","url":null,"abstract":"An overview of the architecture and capabilities of the work recognizer FFNN ('feature finding neural network') is given. FFNN finds features in a self-organizing way which are relatively invariant in the presence of time distortions and changes in speaker characteristics. Fast and optimal feature selection rules have been developed to perform this task. With FFNN, essential problems of word recognition can be solved, among them a special case of the figure ground problem. FFNN is faster than the classical DTW and HMM recognizers and yields similar recognition rates.<<ETX>>","PeriodicalId":354832,"journal":{"name":"Neural Networks for Signal Processing Proceedings of the 1991 IEEE Workshop","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131554329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Note on generalization, regularization and architecture selection in nonlinear learning systems","authors":"J. Moody","doi":"10.1109/NNSP.1991.239541","DOIUrl":"https://doi.org/10.1109/NNSP.1991.239541","url":null,"abstract":"The author proposes a new estimate of generalization performance for nonlinear learning systems called the generalized prediction error (GPE) which is based upon the notion of the effective number of parameters p/sub eff/( lambda ). GPE does not require the use of a test set or computationally intensive cross validation and generalizes previously proposed model selection criteria (such as GCV, FPE, AIC, and PSE) in that it is formulated to include biased, nonlinear models (such as back propagation networks) which may incorporate weight decay or other regularizers. The effective number of parameters p/sub eff/( lambda ) depends upon the amount of bias and smoothness (as determined by the regularization parameter lambda ) in the model, but generally differs from the number of weights p. Construction of an optimal architecture thus requires not just finding the weights w/sub lambda /* which minimize the training function U( lambda , w) but also the lambda which minimizes GPE( lambda ).<<ETX>>","PeriodicalId":354832,"journal":{"name":"Neural Networks for Signal Processing Proceedings of the 1991 IEEE Workshop","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121750489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"New discriminative training algorithms based on the generalized probabilistic descent method","authors":"Shigeru Katagiri, C.-H. Lee, B. Juang","doi":"10.1109/NNSP.1991.239512","DOIUrl":"https://doi.org/10.1109/NNSP.1991.239512","url":null,"abstract":"The authors developed a generalized probabilistic descent (GPD) method by extending the classical theory on adaptive training by Amari (1967). Their generalization makes it possible to treat dynamic patterns (of a variable duration or dimension) such as speech as well as static patterns (of a fixed duration or dimension), for pattern classification problems. The key ideas of GPD formulations include the embedding of time normalization and the incorporation of smooth classification error functions into the gradient search optimization objectives. As a result, a family of new discriminative training algorithms can be rigorously formulated for various kinds of classifier frameworks, including the popular dynamic time warping (DTW) and hidden Markov model (HMM). Experimental results are also provided to show the superiority of this new family of GPD-based, adaptive training algorithms for speech recognition.<<ETX>>","PeriodicalId":354832,"journal":{"name":"Neural Networks for Signal Processing Proceedings of the 1991 IEEE Workshop","volume":"254 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134206763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}