Implementation of dictation system for Malayalam office document
P. Devi, J. Stephen, G. S. Kurambath, R. Kumar
International Conference on Advances in Computing, Communications and Informatics, 2012-08-03
DOI: 10.1145/2345396.2345521 (https://doi.org/10.1145/2345396.2345521)
Citations: 4
Abstract
This paper describes the implementation of a dictation system for Malayalam office documents in OpenOffice Writer. The dictation system is built on a state-of-the-art large-vocabulary continuous speech recognition (LVCSR) system for the Malayalam language. It supports a vocabulary of the 5000 most commonly used office-domain words and includes a vocabulary-updating facility to handle out-of-vocabulary words. The recognizer is based on Hidden Markov Models (HMMs) trained on 25 hours of speech data. The training data was collected in a room environment, with attention to speaker variability and phonetic richness. To build the pronunciation dictionary, a hybrid model that combines a rule-based method with a statistical method is used to handle pronunciation variations. The system is the first of its kind and simplifies the tedious task of typing in Malayalam. In addition to dictating office documents with 75 ± 5% accuracy, it provides a suggestion-generation facility that offers the user alternative words for misrecognized words. The system also supports basic voice commands for file operations such as open, save, and close. An option to adapt to the user's voice improves recognition accuracy by 2-5%. The system has been successfully implemented in OpenOffice Writer and tested.
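The vocabulary-updating and suggestion-generation facilities can be illustrated with a minimal sketch. The abstract does not say how either facility is implemented, so everything here is an assumption: the `Vocabulary` class, its method names, the romanized placeholder words, and the use of string similarity from Python's standard `difflib` module to rank alternatives. A real recognizer would more likely draw suggestions from its own scored hypotheses.

```python
from difflib import get_close_matches


class Vocabulary:
    """Hypothetical word list with an update facility for out-of-vocabulary (OOV) words."""

    def __init__(self, words):
        self.words = set(words)

    def is_oov(self, word):
        """Return True if the word is not in the current vocabulary."""
        return word not in self.words

    def add(self, word):
        """Add a word to the vocabulary; return True only if it was new."""
        if word in self.words:
            return False
        self.words.add(word)
        return True

    def suggest(self, word, limit=3):
        """Return up to `limit` in-vocabulary words that resemble `word`.

        Assumption: similarity is plain string similarity, not acoustic or
        language-model scoring as an actual recognizer would use.
        """
        return get_close_matches(word, sorted(self.words), n=limit, cutoff=0.6)


# Placeholder romanized entries, not taken from the paper's 5000-word vocabulary.
vocab = Vocabulary(["kathu", "kattu", "office", "file"])

if vocab.is_oov("report"):
    vocab.add("report")  # vocabulary update for an OOV word

alternatives = vocab.suggest("kathu")  # candidate replacements for a recognized word
```

In this sketch the vocabulary update is a set insertion and the suggestion list is capped by `limit`, so the user sees only a few alternatives per word.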