Workshop on Spoken Language Technologies for Under-resourced Languages: Latest Publications

Advances in Low Resource ASR: A Deep Learning Perspective
Workshop on Spoken Language Technologies for Under-resourced Languages Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-4
Hardik B. Sailor, Ankur T. Patil, H. Patil
{"title":"Advances in Low Resource ASR: A Deep Learning Perspective","authors":"Hardik B. Sailor, Ankur T. Patil, H. Patil","doi":"10.21437/SLTU.2018-4","DOIUrl":"https://doi.org/10.21437/SLTU.2018-4","url":null,"abstract":"Recently, developing Automatic Speech Recognition (ASR) systems for Low Resource (LR) languages is an active research area. The research in ASR is significantly advanced using deep learning approaches producing state-of-the-art results compared to the conventional approaches. However, it is still challenging to use such approaches for LR languages since it requires a huge amount of training data. Recently, data augmentation, multilingual and cross-lingual approaches, transfer learning, etc. enable training deep learning architectures. This paper presents an overview of deep learning-based approaches for building ASR for LR languages. Recent projects and events organized to support the development of ASR and related applications in this direction are also discussed. This paper could be a good motivation for the researchers interested to work towards low resource ASR using deep learning techniques. The approaches described here could be useful in other related applications, such as audio search.","PeriodicalId":190269,"journal":{"name":"Workshop on Spoken Language Technologies for Under-resourced Languages","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129740864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Mining Training Data for Language Modeling Across the World's Languages
Workshop on Spoken Language Technologies for Under-resourced Languages Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-13
Manasa Prasad, Theresa Breiner, D. Esch
{"title":"Mining Training Data for Language Modeling Across the World's Languages","authors":"Manasa Prasad, Theresa Breiner, D. Esch","doi":"10.21437/SLTU.2018-13","DOIUrl":"https://doi.org/10.21437/SLTU.2018-13","url":null,"abstract":"","PeriodicalId":190269,"journal":{"name":"Workshop on Spoken Language Technologies for Under-resourced Languages","volume":"28 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133487657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
Optimizing DPGMM Clustering in Zero Resource Setting Based on Functional Load
Workshop on Spoken Language Technologies for Under-resourced Languages Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-1
Bin Wu, S. Sakti, Jinsong Zhang, Satoshi Nakamura
{"title":"Optimizing DPGMM Clustering in Zero Resource Setting Based on Functional Load","authors":"Bin Wu, S. Sakti, Jinsong Zhang, Satoshi Nakamura","doi":"10.21437/SLTU.2018-1","DOIUrl":"https://doi.org/10.21437/SLTU.2018-1","url":null,"abstract":"Inspired by infant language acquisition, unsupervised subword discovery of zero-resource languages has gained attention recently. The Dirichlet Process Gaussian Mixture Model (DPGMM) achieves top results evaluated by the ABX discrimination test. However, the DPGMM model is too sensitive to acoustic variation and often produces too many types of subword units and a relatively high-dimensional posteriorgram, which implies high computational cost to perform learning and inference, as well as more tendency to be overfitting. This paper proposes applying functional load to reduce the number of sub-word units from DPGMM. We greedily merge pairs of units with the lowest functional load, causing the least information loss of the language. Results on the Xitsonga corpus with the official setting of Zerospeech 2015 show that we can reduce the number of sub-word units by more than two thirds without hurting the ABX error rate. The number of units is close to that of phonemes in human language.","PeriodicalId":190269,"journal":{"name":"Workshop on Spoken Language Technologies for Under-resourced Languages","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115221893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
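The greedy merging step described in the abstract above can be illustrated with a minimal sketch. It assumes functional load is measured as the relative drop in unigram entropy of the unit sequence when two units are merged, which is one common entropy-based definition and not necessarily the exact measure used in the paper. The function names (`entropy`, `functional_load`, `greedy_merge`) and the `target_units` parameter are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(seq):
    """Unigram entropy (in bits) of a sequence of unit labels."""
    counts = Counter(seq)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def functional_load(seq, a, b):
    """Relative entropy loss when unit b is merged into unit a (assumed definition)."""
    h = entropy(seq)
    if h == 0:
        return 0.0
    merged = [a if u == b else u for u in seq]
    return (h - entropy(merged)) / h


def greedy_merge(seq, target_units):
    """Repeatedly merge the pair of units with the lowest functional load."""
    seq = list(seq)
    while len(set(seq)) > target_units:
        units = sorted(set(seq))
        # Pick the pair whose merger loses the least information.
        a, b = min(combinations(units, 2),
                   key=lambda pair: functional_load(seq, *pair))
        seq = [a if u == b else u for u in seq]
    return seq


# Toy sequence of cluster labels: units 2 and 3 are rare, so merging them
# costs the least entropy.
labels = [0, 0, 0, 0, 1, 1, 1, 1, 2, 3]
reduced = greedy_merge(labels, target_units=3)
```

In this toy case the two rare units are merged first because their merger removes the least entropy. A real system would operate on DPGMM cluster labels per frame and would judge the result with the ABX discrimination test, which this sketch does not attempt.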
Development of Assamese Continuous Speech Recognition System
Workshop on Spoken Language Technologies for Under-resourced Languages Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-46
Tanmay Bhowmik, S. Mandal
{"title":"Development of Assamese Continuous Speech Recognition System","authors":"Tanmay Bhowmik, S. Mandal","doi":"10.21437/SLTU.2018-46","DOIUrl":"https://doi.org/10.21437/SLTU.2018-46","url":null,"abstract":"","PeriodicalId":190269,"journal":{"name":"Workshop on Spoken Language Technologies for Under-resourced Languages","volume":"569 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123322906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Analysis and Comparison of Features for Text-Independent Bengali Speaker Recognition
Workshop on Spoken Language Technologies for Under-resourced Languages Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-57
S. Das, P. Das
{"title":"Analysis and Comparison of Features for Text-Independent Bengali Speaker Recognition","authors":"S. Das, P. Das","doi":"10.21437/SLTU.2018-57","DOIUrl":"https://doi.org/10.21437/SLTU.2018-57","url":null,"abstract":"","PeriodicalId":190269,"journal":{"name":"Workshop on Spoken Language Technologies for Under-resourced Languages","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128985535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Improved Language Identification Using Stacked SDC Features and Residual Neural Network
Workshop on Spoken Language Technologies for Under-resourced Languages Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-44
R. Vuddagiri, Hari Krishna Vydana, A. Vuppala
{"title":"Improved Language Identification Using Stacked SDC Features and Residual Neural Network","authors":"R. Vuddagiri, Hari Krishna Vydana, A. Vuppala","doi":"10.21437/SLTU.2018-44","DOIUrl":"https://doi.org/10.21437/SLTU.2018-44","url":null,"abstract":"","PeriodicalId":190269,"journal":{"name":"Workshop on Spoken Language Technologies for Under-resourced Languages","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125326657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Signal Processing Cues to Improve Automatic Speech Recognition for Low Resource Indian Languages
Workshop on Spoken Language Technologies for Under-resourced Languages Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-6
Arun Baby, S. KarthikPandiaD., H. Murthy
{"title":"Signal Processing Cues to Improve Automatic Speech Recognition for Low Resource Indian Languages","authors":"Arun Baby, S. KarthikPandiaD., H. Murthy","doi":"10.21437/SLTU.2018-6","DOIUrl":"https://doi.org/10.21437/SLTU.2018-6","url":null,"abstract":"Building accurate acoustic models for low resource languages is the focus of this paper. Acoustic models are likely to be accurate provided the phone boundaries are determined accurately. Conventional flat-start based Viterbi phone alignment (where only utterance level transcriptions are available) results in poor phone boundaries as the boundaries are not explicitly modeled in any statistical machine learning system. The focus of the effort in this paper is to explicitly model phrase boundaries using acoustic cues obtained using signal processing. A phrase is made up of a sequence of words, where each word is made up of a sequence of syllables. Syllable boundaries are detected using signal processing. The waveform corresponding to an utterance is spliced at phrase boundaries when it matches a syllable boundary. Gaussian mixture model - hidden Markov model (GMM-HMM) training is performed phrase by phrase, rather than utterance by utterance. Training using these short phrases yields better acoustic models. This alignment is then fed to a DNN to enable better discrimination between phones. During the training process, the syllable boundaries (obtained using signal processing) are restored in every iteration. A relative improvement is observed in WER over the baseline Indian languages, namely, Gujarati, Tamil, and Telugu.","PeriodicalId":190269,"journal":{"name":"Workshop on Spoken Language Technologies for Under-resourced Languages","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128106054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
JAMLIT: A Corpus of Jamaican Standard English for Automatic Speech Recognition of Children's Speech
Workshop on Spoken Language Technologies for Under-resourced Languages Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-51
Stefan Watson, André Coy
{"title":"JAMLIT: A Corpus of Jamaican Standard English for Automatic Speech Recognition of Children's Speech","authors":"Stefan Watson, André Coy","doi":"10.21437/SLTU.2018-51","DOIUrl":"https://doi.org/10.21437/SLTU.2018-51","url":null,"abstract":"","PeriodicalId":190269,"journal":{"name":"Workshop on Spoken Language Technologies for Under-resourced Languages","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133130077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Empirical Study of Speech Synthesis Markup Language and Its Implementation for Punjabi Language
Workshop on Spoken Language Technologies for Under-resourced Languages Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-22
Atul Kumar, S. Agrawal
{"title":"Empirical Study of Speech Synthesis Markup Language and Its Implementation for Punjabi Language","authors":"Atul Kumar, S. Agrawal","doi":"10.21437/SLTU.2018-22","DOIUrl":"https://doi.org/10.21437/SLTU.2018-22","url":null,"abstract":"This paper builds a prioritized list of requirements for speech synthesis markup which any proposed markup language should address. This study presents requirements and essential tags for specification development of Punjabi Language. A speech synthesizer works like written text into correct sounds to be spoken. To do this it uses an SSML document and one or more lexicons and dictionaries. We have presented how the different type of modules in TTS System helps to convert a text input of SSML document to spoken form in Punjabi Language. Since, Punjabi is the morphological rich Language, it is written in \"Gurumukhi\" Script and this is the official Language of Govt. of India. So, hence accordingly in this language Homograph problem will not occur. Tones in Punjabi pose big problems. The words written in similar ways, have different tones and there by changes their meanings for which the tags have been designed separately. In Punjabi orthographically the written symbols exactly corresponds to the specific words. Therefore in Punjabi, we do not any word which may be called Homograph.","PeriodicalId":190269,"journal":{"name":"Workshop on Spoken Language Technologies for Under-resourced Languages","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122975167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Implementation of Concatenation Technique for Low Resource Text-To-Speech System Based on Marathi Talking Calculator
Workshop on Spoken Language Technologies for Under-resourced Languages Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-16
Monica R. Mundada, Sangramsing Kayte, P. Das
{"title":"Implementation of Concatenation Technique for Low Resource Text-To-Speech System Based on Marathi Talking Calculator","authors":"Monica R. Mundada, Sangramsing Kayte, P. Das","doi":"10.21437/SLTU.2018-16","DOIUrl":"https://doi.org/10.21437/SLTU.2018-16","url":null,"abstract":"The indulgent acquaintance of mathematical basic concepts creates the pavement for numerous opportunities in life for every individual, including visually impaired people. The use of assertive technology for the disabled section of the society makes them more independent and avoid barriers in the field of education and employment. This research is focused to design an Android-based application i.e. talking Calculator for low resource based Marathi native language. The novelty of this work is to develop both, the application and the Marathi number corpus. Marathi is an Indo-Aryan language spoken by approximately 6.99 million speakers in India, which is the third widely spoken language after Bengali and Telugu but as they lack in linguistic resources, e.g. grammars, POS taggers, corpora, it falls into the category of low resource languages. The front end part of the application depicts the screen of a basic calculator with numerals displayed in Marathi. During runtime, each number is spoken as the specific key is pressed. It also speaks out the operation which is intended to be performed. The concatenation synthesis technique is applied to speak out the value of decimal places in the output number. The result is spoken out with proper place value of a digit in Marathi. The performance of the system is measured to the accuracy rate of 95.5%. The average run time complexity of the application is also calculated which is noted down to 2.64 sec. The feedback and review of the application is also taken from real end-user i.e. blind people.","PeriodicalId":190269,"journal":{"name":"Workshop on Spoken Language Technologies for Under-resourced Languages","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121508920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0