TREN - Turkish recognition engine for distributed applications

Proceedings of the Eighth International Symposium on Signal Processing and Its Applications, 2005. Pub Date: 2005-08-28. DOI: 10.1109/ISSPA.2005.1581007

H. Palaz, A. Kanak, Yücel Bicil, M. U. Dogan



Abstract

Turkish Recognition ENgine (TREN) is a modular, speaker-independent speech recognition system based on Hidden Markov Models (HMMs) and the Distributed Component Object Model (DCOM). TREN is a two-layered system whose specialized modules form a fully interoperable platform comprising a Turkish speech recognizer, a feature extractor, an end-point detector and a performance monitoring module. To increase recognition performance, a Turkish telephony speech database with a very large word corpus was collected, and the statistically widest span of triphones representing Turkish was examined. TREN has been used to support speech technologies that require a modular, multithreaded speech recognizer with dynamic load-sharing facilities. For complex speech processing systems, a layered architecture, which is a natural outgrowth of the client-server model, can be an effective solution to problems such as lack of scalability and portability. Compared with the traditional client-server model, the layered architecture of TREN offers a natural way to separate the user interface from the heavy background work performed by the recognizer. TREN is composed of two layers. The central server (CS) constitutes the first layer; it applies speech processing routines (feature extraction and end-point detection) to the audio files received as input from third-party applications. The CS is also responsible for authorizing the remote server (RS) whose recognition process has the least CPU load (LP_CPU) among all RSs, which together constitute the second layer of TREN. Once this authorization is accomplished, the selected RS is ready to serve as a recognizer. This two-layered architecture allows the RSs to work in a parallel and distributed manner. The architecture also gives the flexibility to install or uninstall any number of machines according to application requirements. TREN supports up to 64 simultaneous recognitions, resembling a 64-channel system.
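The selection step described in the abstract, in which the CS authorizes the RS with the least CPU load, can be illustrated with a minimal sketch. This is not the authors' implementation: the RemoteServer record, the select_remote_server and dispatch helpers, the 0.0 to 1.0 load scale, and the reading of the 64-recognition limit as a system-wide cap are all assumptions made for illustration, and the DCOM transport is omitted entirely.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# From the abstract: up to 64 simultaneous recognitions.
# Treating this as a system-wide cap is an assumption.
MAX_CHANNELS = 64


@dataclass
class RemoteServer:
    """Hypothetical record for one second-layer recognizer (RS)."""
    name: str
    cpu_load: float  # CPU load of the recognition process (assumed 0.0 to 1.0)
    active: int = 0  # recognitions currently running on this RS


def select_remote_server(servers: Sequence[RemoteServer]) -> Optional[RemoteServer]:
    """Return the RS with the least CPU load, or None if no capacity remains."""
    if not servers or sum(s.active for s in servers) >= MAX_CHANNELS:
        return None
    return min(servers, key=lambda s: s.cpu_load)


def dispatch(servers: Sequence[RemoteServer]) -> Optional[RemoteServer]:
    """Authorize the least-loaded RS and count the new recognition against it."""
    rs = select_remote_server(servers)
    if rs is not None:
        rs.active += 1
    return rs


if __name__ == "__main__":
    pool = [
        RemoteServer("rs1", cpu_load=0.7),
        RemoteServer("rs2", cpu_load=0.2),
        RemoteServer("rs3", cpu_load=0.5),
    ]
    chosen = dispatch(pool)
    print(chosen.name if chosen else "no capacity")
```

Because the pool is an ordinary sequence, adding or removing an RS only changes the list passed to the selector, which mirrors the install-or-uninstall flexibility the abstract mentions.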


