TREN - Turkish recognition engine for distributed applications
Authors: H. Palaz, A. Kanak, Yücel Bicil, M. U. Dogan
Published in: Proceedings of the Eighth International Symposium on Signal Processing and Its Applications, 2005
Publication date: 2005-08-28
DOI: 10.1109/ISSPA.2005.1581007 (https://doi.org/10.1109/ISSPA.2005.1581007)
Abstract
Turkish Recognition ENgine (TREN) is a modular, speaker-independent speech recognition system based on Hidden Markov Models (HMM) and the Distributed Component Object Model (DCOM). TREN is a two-layered system whose specialized modules form a fully interoperable platform: a Turkish speech recognizer, a feature extractor, an end-point detector and a performance monitoring module. To increase recognition performance, a Turkish telephony speech database with a very large word corpus was collected, and the statistically widest span of triphones representing Turkish was examined. TREN has been used to support speech technologies that require a modular, multithreaded speech recognizer with dynamic load sharing facilities. For complex speech processing systems, a layered architecture, which is a natural outgrowth of the client-server model, can be an effective solution to problems such as lack of scalability and portability. Compared with the traditional client-server model, the layered architecture of TREN offers a natural way to separate the user interface from the heavy background work performed by the recognizer. TREN is composed of two layers. The central server (CS) constitutes the first layer; it applies speech processing routines (feature extraction and end-point detection) to the audio files received as input from third-party applications. The CS is also responsible for authorizing a remote server (RS): it selects the RS whose recognition process has the lowest CPU load ($L_{PCPU}$) among all RSs, which together constitute the second layer of TREN. Once this authorization is complete, the selected RS is ready to serve as a recognizer. This two-layered architecture allows the RSs to work in a parallel and distributed manner. It also gives the flexibility to install or uninstall any number of machines according to application requirements. TREN supports up to 64 simultaneous recognitions, resembling a 64-channel system.
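The load-sharing rule described in the abstract (the CS authorizes the RS whose recognition process reports the lowest CPU load) can be sketched roughly as follows. This is a minimal illustration, not the authors' DCOM implementation: the `RemoteServer` class, the `select_remote_server` and `authorize` functions, the percentage units for the load, and the way the 64-recognition limit is enforced are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional

# From the abstract: TREN supports up to 64 simultaneous recognitions.
MAX_SIMULTANEOUS_RECOGNITIONS = 64


@dataclass
class RemoteServer:
    """Hypothetical stand-in for a second-layer remote server (RS)."""
    name: str
    cpu_load: float  # CPU load of the recognition process; percent is an assumed unit


def select_remote_server(servers: List[RemoteServer]) -> Optional[RemoteServer]:
    """Return the RS with the lowest recognition-process CPU load, or None if there are none."""
    if not servers:
        return None
    return min(servers, key=lambda rs: rs.cpu_load)


def authorize(servers: List[RemoteServer], active_recognitions: int) -> Optional[RemoteServer]:
    """Pick an RS for a new request.

    Refusing requests at the limit is an assumed policy; the abstract only
    states that up to 64 simultaneous recognitions are supported.
    """
    if active_recognitions >= MAX_SIMULTANEOUS_RECOGNITIONS:
        return None
    return select_remote_server(servers)


if __name__ == "__main__":
    pool = [
        RemoteServer("rs-a", 72.0),
        RemoteServer("rs-b", 18.5),
        RemoteServer("rs-c", 40.0),
    ]
    chosen = authorize(pool, active_recognitions=3)
    print(chosen.name if chosen else "no server available")
```

Because the selection depends only on the reported loads, machines can be added to or removed from the pool without changing the central server's logic, which matches the install/uninstall flexibility the abstract mentions.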