B. Mumtaz, Sahar Rauf, Hafsa Qadir, J. Khalid, T. Habib, S. Hussain, Rukhsana Barkat, E. Haq
{"title":"巴基斯坦银行业乌尔都语语料库","authors":"B. Mumtaz, Sahar Rauf, Hafsa Qadir, J. Khalid, T. Habib, S. Hussain, Rukhsana Barkat, E. Haq","doi":"10.1109/ICSDA.2018.8693010","DOIUrl":null,"url":null,"abstract":"This research describes an effort to build Urdu speech corpora for the banking sector in Pakistan. We have designed speech corpora to develop debit card activation ASR and these corpora are comprised of eight types of corpora mainly debit card number corpus, expiry date corpus, last four digit corpus, months' name, date of birth corpus, account type and Urdu-counting corpus. These corpora contain telephone speech in read style obtained from more than 400 speakers specifically in Punjabi accent in both outdoor and indoor environments, including offices, homes, banks, and universities. The speech is automatically annotated and manually verified at sentence tier and reports 98% inter-annotator accuracy. In this paper, we report the design, recording and annotation process of speech corpora that serve as a data development step for ASR, and will be integrated in debit card activation service in banking sector of Pakistan.","PeriodicalId":303819,"journal":{"name":"2018 Oriental COCOSDA - International Conference on Speech Database and Assessments","volume":"29 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"URDU Speech Corpora for Banking Sector in Pakistan\",\"authors\":\"B. Mumtaz, Sahar Rauf, Hafsa Qadir, J. Khalid, T. Habib, S. Hussain, Rukhsana Barkat, E. Haq\",\"doi\":\"10.1109/ICSDA.2018.8693010\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This research describes an effort to build Urdu speech corpora for the banking sector in Pakistan. We have designed speech corpora to develop debit card activation ASR and these corpora are comprised of eight types of corpora mainly debit card number corpus, expiry date corpus, last four digit corpus, months' name, date of birth corpus, account type and Urdu-counting corpus. These corpora contain telephone speech in read style obtained from more than 400 speakers specifically in Punjabi accent in both outdoor and indoor environments, including offices, homes, banks, and universities. The speech is automatically annotated and manually verified at sentence tier and reports 98% inter-annotator accuracy. In this paper, we report the design, recording and annotation process of speech corpora that serve as a data development step for ASR, and will be integrated in debit card activation service in banking sector of Pakistan.\",\"PeriodicalId\":303819,\"journal\":{\"name\":\"2018 Oriental COCOSDA - International Conference on Speech Database and Assessments\",\"volume\":\"29 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 Oriental COCOSDA - International Conference on Speech Database and Assessments\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICSDA.2018.8693010\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 Oriental COCOSDA - International Conference on Speech Database and Assessments","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSDA.2018.8693010","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
URDU Speech Corpora for Banking Sector in Pakistan
This research describes an effort to build Urdu speech corpora for the banking sector in Pakistan. We have designed speech corpora to develop debit card activation ASR and these corpora are comprised of eight types of corpora mainly debit card number corpus, expiry date corpus, last four digit corpus, months' name, date of birth corpus, account type and Urdu-counting corpus. These corpora contain telephone speech in read style obtained from more than 400 speakers specifically in Punjabi accent in both outdoor and indoor environments, including offices, homes, banks, and universities. The speech is automatically annotated and manually verified at sentence tier and reports 98% inter-annotator accuracy. In this paper, we report the design, recording and annotation process of speech corpora that serve as a data development step for ASR, and will be integrated in debit card activation service in banking sector of Pakistan.