Workshop on Spoken Language Technologies for Under-resourced Languages: Latest Publications

Visually Grounded Cross-Lingual Keyword Spotting in Speech
Workshop on Spoken Language Technologies for Under-resourced Languages. Pub Date: 2018-08-29. DOI: 10.21437/SLTU.2018-53
H. Kamper, Michael Roth
Abstract: Recent work considered how images paired with speech can be used as supervision for building speech systems when transcriptions are not available. We ask whether visual grounding can be used for cross-lingual keyword spotting: given a text keyword in one language, the task is to retrieve spoken utterances containing that keyword in another language. This could enable searching through speech in a low-resource language using text queries in a high-resource language. As a proof of concept, we use English speech with German queries: we use a German visual tagger to add keyword labels to each training image, and then train a neural network to map English speech to German keywords. Without seeing parallel speech transcriptions or translations, the model achieves a precision at ten of 58%. We show that most erroneous retrievals contain equivalent or semantically relevant keywords; excluding these would improve P@10 to 91%.
Citations: 3
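As an illustration of the retrieval setup described in the abstract, the sketch below is an assumed outline rather than the authors' implementation: a small PyTorch encoder maps log-mel features of an English utterance to sigmoid scores over a German keyword vocabulary (the labels the abstract obtains from a visual tagger), and a helper computes the precision-at-ten figure quoted above. All names and sizes (N_KEYWORDS, N_MEL, SpeechEncoder, precision_at_10) are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of the cross-lingual keyword-spotting
# idea: map speech features to scores over German keyword labels from a visual
# tagger, then rank utterances per query keyword.
import torch
import torch.nn as nn

N_KEYWORDS = 1000          # size of the German keyword vocabulary (assumed)
N_MEL = 40                 # log-mel filterbank dimension (assumed)

class SpeechEncoder(nn.Module):
    """Maps a batch of log-mel features to per-keyword detection scores."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(N_MEL, 64, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=9, padding=4), nn.ReLU(),
        )
        self.out = nn.Linear(128, N_KEYWORDS)

    def forward(self, feats):                      # feats: (batch, N_MEL, frames)
        h = self.conv(feats).max(dim=2).values     # max-pool over time
        return torch.sigmoid(self.out(h))          # (batch, N_KEYWORDS)

def precision_at_10(scores, relevant):
    """scores: (n_utts,) per-utterance score for one query keyword;
    relevant: boolean tensor marking utterances that truly contain it."""
    top = scores.topk(10).indices
    return relevant[top].float().mean().item()

# Training would minimise binary cross-entropy between the sigmoid outputs and
# the visual tagger's keyword labels, e.g.:
# loss = nn.functional.binary_cross_entropy(model(feats), visual_labels)
```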
Prosodic Analysis of Non-Native South Indian English Speech
Workshop on Spoken Language Technologies for Under-resourced Languages. Pub Date: 2018-08-29. DOI: 10.21437/SLTU.2018-15
Radha Krishna Guntur, R. Krishnan, V. K. Mittal
Abstract: Investigations of linguistic prosody in non-native English speech by South Indian speakers were carried out using a database collected specifically for this study. Prosodic differences between native and non-native speech samples from three regional language groups (Kannada, Tamil, and Telugu) were evaluated and compared; this information is useful in applications such as native language identification. The mean pitch and the overall variation of the pitch contour are higher in non-native English speech for all three speaker groups, indicating accommodation of speaking manner. The dynamic variation of pitch is lowest in English speech by native Kannada speakers: the increase in the standard deviation of the pitch contour for their non-native English is only about 3.7% on average, compared with 9.5% for Tamil and 27% for Telugu native speakers.
Citations: 4
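The pitch statistics reported above (mean F0, the standard deviation of the pitch contour, and its percentage increase from native to non-native speech) can be reproduced in outline as follows. This is a minimal sketch assuming librosa's pYIN pitch tracker and placeholder file names, not the study's own pipeline.

```python
# Minimal sketch of the reported pitch statistics: mean F0, pitch-contour
# standard deviation, and its native-to-non-native percentage change.
import librosa
import numpy as np

def pitch_stats(wav_path):
    y, sr = librosa.load(wav_path, sr=None)
    f0, voiced, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                                 fmax=librosa.note_to_hz("C7"), sr=sr)
    f0 = f0[~np.isnan(f0)]                 # keep voiced frames only
    return float(np.mean(f0)), float(np.std(f0))

# file names are placeholders for native-language and non-native English recordings
native_mean, native_std = pitch_stats("kannada_native.wav")
l2_mean, l2_std = pitch_stats("kannada_english.wav")

# e.g. the ~3.7% figure quoted for Kannada speakers corresponds to:
std_increase = 100.0 * (l2_std - native_std) / native_std
print(f"mean F0: {native_mean:.1f} -> {l2_mean:.1f} Hz, "
      f"pitch-contour std change: {std_increase:+.1f}%")
```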
Post-Processing Using Speech Enhancement Techniques for Unit Selection and Hidden Markov Model Based Low Resource Language Marathi Text-to-Speech System
Workshop on Spoken Language Technologies for Under-resourced Languages. Pub Date: 2018-08-29. DOI: 10.21437/SLTU.2018-20
Sangramsing Kayte, Monica R. Mundada
Citations: 1
IIITH-ILSC Speech Database for Indian Language Identification
Workshop on Spoken Language Technologies for Under-resourced Languages. Pub Date: 2018-08-29. DOI: 10.21437/SLTU.2018-12
R. Vuddagiri, K. Gurugubelli, P. Jain, Hari Krishna Vydana, A. Vuppala
Abstract: This work focuses on the development of a speech corpus comprising 23 Indian languages for building language identification (LID) systems. Since large amounts of data are a prerequisite for state-of-the-art LID systems, the task of developing a multilingual speech corpus for Indian languages was initiated. This paper describes the composition of the data and the performance of various LID systems developed on it. Mel-frequency cepstral coefficients are used as the feature representation, and state-of-the-art LID systems are built using i-vectors, deep neural networks (DNN), and deep neural networks with attention (DNN-WA). Measured as equal error rate, the performance of the i-vector, DNN, and DNN-WA systems is 17.77%, 17.95%, and 15.18% respectively, so the attention-based model outperforms both the i-vector and DNN models.
Citations: 17
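The equal error rate figures quoted above can be computed from per-trial verification scores as in the following sketch; the scoring setup and the toy data are assumptions, not the paper's evaluation code.

```python
# Minimal sketch (assumed) of equal error rate (EER) computation from
# per-trial LID scores: the operating point where false-positive and
# false-negative rates coincide.
import numpy as np
from sklearn.metrics import roc_curve

def equal_error_rate(labels, scores):
    """labels: 1 for target-language trials, 0 for non-target;
    scores: higher means more likely target."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1.0 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))   # where FPR and FNR are closest
    return float((fpr[idx] + fnr[idx]) / 2.0)

# toy usage with random scores
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1000)
scores = rng.normal(loc=labels.astype(float), scale=1.0)
print(f"EER = {100 * equal_error_rate(labels, scores):.2f}%")
```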
A Human Quality Text to Speech System for Sinhala
Workshop on Spoken Language Technologies for Under-resourced Languages. Pub Date: 2018-08-29. DOI: 10.21437/SLTU.2018-33
L. Nanayakkara, Chamila Liyanage, Pubudu Tharaka Viswakula, Thilini Nagungodage, Randil Pushpananda, R. Weerasinghe
Abstract: This paper proposes an approach to implementing a text-to-speech system for the Sinhala language using the MaryTTS framework. A set of rules for mapping text to sound was identified, and synthesis proceeded with a unit selection mechanism. The data used for this study were gathered from newspaper articles, and the corresponding sentences were recorded by a professional speaker. A user-level evaluation was conducted with 20 participants: the intelligibility and naturalness of the developed Sinhala TTS system each received a score of approximately 70%, and the overall speech quality scored approximately 60%.
Citations: 7
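The rule-based text-to-sound mapping mentioned above can be illustrated with a minimal greedy lookup; the rule table below is hypothetical and does not reproduce the paper's actual Sinhala letter-to-sound rules.

```python
# Minimal sketch of a rule-based grapheme-to-phone mapping mechanism.
# The rule table is illustrative only, not the paper's Sinhala rule set.
RULES = {            # grapheme -> phone (hypothetical examples)
    "ක": "k",
    "ම": "m",
    "ා": "aa",
}

def graphemes_to_phones(text):
    """Greedy longest-match lookup over the rule table."""
    phones, i = [], 0
    graphemes = sorted(RULES, key=len, reverse=True)
    while i < len(text):
        for g in graphemes:
            if text.startswith(g, i):
                phones.append(RULES[g])
                i += len(g)
                break
        else:
            i += 1               # skip characters without a rule
    return phones

print(graphemes_to_phones("කමා"))   # -> ['k', 'm', 'aa']
```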
Predicting the Features of World Atlas of Language Structures from Speech
Workshop on Spoken Language Technologies for Under-resourced Languages. Pub Date: 2018-08-29. DOI: 10.21437/SLTU.2018-52
Alexander Gutkin, Tatiana Merkulova, Martin Jansche
Citations: 0
Low-resource Tibetan Dialect Acoustic Modeling Based on Transfer Learning
Workshop on Spoken Language Technologies for Under-resourced Languages. Pub Date: 2018-08-29. DOI: 10.21437/SLTU.2018-2
Jinghao Yan, Zhiqiang Lv, Shen Huang, Hongzhi Yu
Citations: 2
Incorporating Speaker Normalizing Capabilities to an End-to-End Speech Recognition System
Workshop on Spoken Language Technologies for Under-resourced Languages. Pub Date: 2018-08-29. DOI: 10.21437/sltu.2018-36
Hari Krishna Vydana, Sivanand Achanta, A. Vuppala
Abstract: Speaker normalization is one of the crucial aspects of an automatic speech recognition (ASR) system; it is employed to reduce the performance drop caused by speaker variability. Traditional speaker normalization methods are mostly linear transforms over the input data estimated per speaker, and such transforms are effective only with sufficient data, whereas in practical scenarios only a single utterance from the test speaker is accessible. The present study explores speaker normalization methods for end-to-end speech recognition that remain effective even when a single utterance from an unseen speaker is available. It is hypothesized that by suitably providing information about the speaker's identity while training an end-to-end neural network, the capability to normalize speaker variability can be incorporated into the ASR system. The efficiency of these normalization methods depends on the representation used for unseen speakers. The identity of a training speaker is represented in two ways: (i) a one-hot speaker code, and (ii) a weighted combination of the identities of all training speakers; unseen test speakers are represented by a weighted combination of training-speaker representations. The two approaches reduce the word error rate (WER) by 0.6% and 1.3% respectively on the WSJ corpus.
Citations: 1
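A minimal sketch of the speaker-representation idea described above follows, assuming a PyTorch encoder whose input frames are concatenated with either a one-hot training-speaker code or a weighted combination of training-speaker codes for an unseen speaker; sizes and class names are placeholders, not the authors' architecture.

```python
# Minimal sketch (assumed) of feeding a speaker representation alongside
# acoustic features: one-hot codes for training speakers, and a weighted
# combination of training-speaker codes for unseen test speakers.
import torch
import torch.nn as nn

N_SPEAKERS, N_FEATS, N_HIDDEN = 283, 40, 256   # sizes are placeholders

class SpeakerAwareEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTM(N_FEATS + N_SPEAKERS, N_HIDDEN, batch_first=True)

    def forward(self, feats, speaker_code):
        # feats: (batch, frames, N_FEATS); speaker_code: (batch, N_SPEAKERS)
        code = speaker_code.unsqueeze(1).expand(-1, feats.size(1), -1)
        out, _ = self.rnn(torch.cat([feats, code], dim=-1))
        return out

# training speaker: one-hot code
train_code = nn.functional.one_hot(torch.tensor([7]), N_SPEAKERS).float()
# unseen test speaker: weights over training speakers (random here)
test_code = torch.softmax(torch.randn(1, N_SPEAKERS), dim=-1)

enc = SpeakerAwareEncoder()
feats = torch.randn(1, 120, N_FEATS)
print(enc(feats, train_code).shape, enc(feats, test_code).shape)
```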
A small Griko-Italian speech translation corpus
Workshop on Spoken Language Technologies for Under-resourced Languages. Pub Date: 2018-07-27. DOI: 10.21437/SLTU.2018-8
Marcely Zanon Boito, Antonios Anastasopoulos, M. Lekakou, A. Villavicencio, L. Besacier
Abstract: This paper presents an extension to a very low-resource parallel corpus collected in an endangered language, Griko, making it useful for computational research. The corpus consists of 330 utterances (about 2 hours of speech) which have been transcribed and translated into Italian, with annotations for word-level speech-to-transcription and speech-to-translation alignments. The corpus also includes morphosyntactic tags and word-level glosses, as well as pseudo-phones generated with an automatic unit discovery method. We detail how the corpus was collected, cleaned, and processed, and we illustrate its use on zero-resource tasks by presenting baseline results for speech-to-translation alignment and unsupervised word discovery. The dataset will be available online, aiming to encourage replicability and diversity in computational language documentation experiments.
Citations: 12
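To make the annotation layers concrete, the sketch below shows one way such an utterance record could be represented in code; the field names and the example words are illustrative assumptions and do not reflect the corpus's actual release format.

```python
# Minimal sketch of one utterance with the annotation layers listed in the
# abstract. The fields and the example record are illustrative assumptions.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class GrikoUtterance:
    audio_path: str
    transcription: List[str]                 # Griko words
    translation: List[str]                   # Italian words
    glosses: List[str]                       # word-level glosses
    # (start_sec, end_sec) per Griko word: speech-to-transcription alignment
    word_times: List[Tuple[float, float]]
    # (griko_index, italian_index) pairs: speech-to-translation alignment
    word_alignment: List[Tuple[int, int]] = field(default_factory=list)

utt = GrikoUtterance(
    audio_path="utt_0001.wav",
    transcription=["echo", "ena", "spiti"],
    translation=["ho", "una", "casa"],
    glosses=["have.1SG", "a", "house"],
    word_times=[(0.00, 0.31), (0.31, 0.52), (0.52, 1.04)],
    word_alignment=[(0, 0), (1, 1), (2, 2)],
)
print(len(utt.transcription), "Griko words aligned to", len(utt.translation))
```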
Automatic Speech Recognition for Humanitarian Applications in Somali
Workshop on Spoken Language Technologies for Under-resourced Languages. Pub Date: 2018-07-23. DOI: 10.21437/SLTU.2018-5
Raghav Menon, A. Biswas, A. Saeb, John Quinn, T. Niesler
Abstract: We present our first efforts in building an automatic speech recognition system for Somali, an under-resourced language, using 1.57 hours of annotated speech for acoustic model training. The system is part of an ongoing effort by the United Nations (UN) to implement keyword spotting systems supporting humanitarian relief programmes in parts of Africa where languages are severely under-resourced. We evaluate several types of acoustic model, including recent neural architectures. Language model data augmentation using a combination of recurrent neural networks (RNNs) and long short-term memory networks (LSTMs), as well as perturbation of the acoustic data, are also considered. We find that both types of data augmentation are beneficial to performance, with our best system using a combination of convolutional neural networks (CNNs), time-delay neural networks (TDNNs) and bi-directional long short-term memory networks (BLSTMs) to achieve a word error rate of 53.75%.
Citations: 4
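The acoustic-data perturbation mentioned above can be sketched as conventional three-way speed perturbation; the implementation below (resampling while keeping the nominal sample rate, with librosa and soundfile, and placeholder file names) is an assumption rather than the paper's exact recipe.

```python
# Minimal sketch (assumed) of speed perturbation of training audio: resampling
# while keeping the nominal sample rate changes both tempo and pitch, as in the
# common Kaldi-style three-way recipe.
import librosa
import soundfile as sf

FACTORS = (0.9, 1.0, 1.1)            # conventional perturbation factors

def speed_perturb(wav_in, wav_out_prefix):
    y, sr = librosa.load(wav_in, sr=None)
    for f in FACTORS:
        # resampling to sr/f and playing back at sr speeds the audio up by f
        y_f = librosa.resample(y, orig_sr=sr, target_sr=int(round(sr / f)))
        sf.write(f"{wav_out_prefix}_sp{f}.wav", y_f, sr)

speed_perturb("somali_utt.wav", "somali_utt")   # file names are placeholders
```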