G. Deekshitha, K. Sreelakshmi, Ben P. Babu, L. Mary
Development of Spoken Story Database in Malayalam Language
2018 4th International Conference on Electrical Energy Systems (ICEES), February 2018
DOI: 10.1109/ICEES.2018.8442342 (https://doi.org/10.1109/ICEES.2018.8442342)
Citations: 7
Abstract
This paper discusses the development of a spoken story database in the Malayalam language. Malayalam is spoken predominantly by natives of the state of Kerala on the Indian subcontinent. Because little transcribed audio data is available, it can be considered a low-resource language. Such a database will therefore aid researchers working in areas such as Malayalam speech recognition, keyword spotting, and speaker recognition. The database contains 116 Malayalam short stories for children, along with their transcriptions in Malayalam. The paper describes the design considerations applied while collecting prosodically rich speech data. It also presents the details of a continuous, context-independent speech recognition system for Malayalam, implemented with CMU Sphinx and built on the collected spoken story database.