Corpus design and development of an annotated speech database for Punjabi
S. Bansal, S. Sharan, S. Agrawal
2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)
Publication date: 2015-10-01
DOI: 10.1109/ICSDA.2015.7357860 (https://doi.org/10.1109/ICSDA.2015.7357860)
Citations: 3
Abstract
Punjabi is an important Indo-Aryan language spoken in India and in some other countries, especially Pakistan. It is a tonal language, and its phonetic and phonological aspects have not been studied extensively. The present paper reports the development of a phonemically annotated speech database of the Malwai dialect of Punjabi. A phonetically rich text database of 1500 words and 300 sentences was created from a corpus of about 300,000 words. These were recorded by 25 male and 25 female speakers at a sampling rate of 16 kHz with 16-bit resolution. The recordings were made in the native places of speakers who use the original version of the Malwai dialect of Punjabi. The recorded data were segmented and labeled phonemically to obtain the phonemic and sub-phonemic elements of each phoneme and the tonemes of the Punjabi language. The annotated database can be useful for phonetic studies and for developing a Punjabi speech synthesis system.
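As a minimal sketch of how the stated recording format (16 kHz sampling rate, 16 bit) could be checked on a file, the following uses Python's standard `wave` module. It assumes the recordings are stored as WAV files, which the abstract does not state; the function names `matches_corpus_format` and `make_silent_wav` are hypothetical and are not taken from the paper.

```python
import io
import wave

EXPECTED_RATE_HZ = 16_000   # 16 kHz, as stated in the abstract
EXPECTED_SAMPLE_WIDTH = 2   # 16 bit = 2 bytes per sample


def matches_corpus_format(wav_source):
    """Return True if a WAV file has the sampling rate and bit depth given in the abstract.

    wav_source may be a file path or a binary file-like object.
    """
    with wave.open(wav_source, "rb") as wav:
        return (wav.getframerate() == EXPECTED_RATE_HZ
                and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH)


def make_silent_wav(rate_hz, sample_width, n_frames=160):
    """Build a small in-memory WAV of zero-valued samples, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)  # mono is an assumption; the abstract gives no channel count
        wav.setsampwidth(sample_width)
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00" * sample_width * n_frames)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(matches_corpus_format(make_silent_wav(16_000, 2)))
    print(matches_corpus_format(make_silent_wav(8_000, 2)))
```

The check deliberately ignores the channel count and file duration, since neither is specified in the abstract.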