2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA): Latest Papers

Indian Languages Corpus for Speech Recognition
Joyanta Basu, Soma Khan, Rajib Roy, Babita Saxena, Dipankar Ganguly, Sunita Arora, K. Arora, S. Bansal, S. Agrawal
Abstract: Robust speech recognition systems for various languages have moved beyond research labs into commercial products. This has been possible owing to major developments in machine learning, especially deep learning. However, advanced speech recognition systems can be developed only when specially curated speech data is available. Systems of usable quality are yet to be developed for most Indian languages. The present paper describes the design and development of a standard speech corpus that can be used for developing general-purpose ASR systems and benchmarking them. The database has been developed for Hindi, Bengali and Indian English. The corpus design incorporates important parameters such as phonetic coverage and distribution. The data was recorded by 1500 speakers in each language, male and female, of different age groups and in varying environments. The data was recorded on a server using an online recording system and transcribed using semi-automatic tools. The paper describes the corpus design methodology, the challenges faced and the approach adopted to overcome them. The whole process of designing the speech database is generic enough to be used for other languages as well.
DOI: https://doi.org/10.1109/O-COCOSDA46868.2019.9041171
Published: 2019-10-01; Semantic Scholar paper ID: 130712526
Citations: 5
Acquisition of English retroflex vowel [3] by EFL learners from Chinese dialectal regions - A case study of Beijing and Changsha
Bin Li, Yuan Jia
Abstract: This paper investigates, through an extensive acoustic analysis, the acquisition of the English retroflex vowel [3] by learners of English as a Foreign Language (EFL) from Beijing (BJ) and Changsha (CS), which are representative dialectal regions in northern and southern China respectively. Formant and duration were selected as parameters. The results demonstrate that all the EFL learners involved in the study produced the onset target and offset target of [3] with a more backward tendency. For formant patterns, both native speakers and EFL learners present a similar tendency, namely the decline of F3 and the rise of F2. For CS speakers, due to the effect of their mother tongue, F3 falls more slowly. Moreover, from the spectral perspective, the F3 changing rate of CS male learners is significantly smaller than that of native speakers. On the other hand, BJ learners, especially female learners, show more obvious changes in F3 than native speakers. In addition, we speculate that language background and gender can affect the acquisition of retroflex vowels.
DOI: https://doi.org/10.1109/O-COCOSDA46868.2019.9060843
Published: 2019-10-01; Semantic Scholar paper ID: 128961083
Citations: 0
Phoneme-level speaking rate variation on waveform generation using GAN-TTS
Mayuko Okamato, S. Sakti, Satoshi Nakamura
Abstract: The development of text-to-speech synthesis (TTS) systems continues to advance, and the naturalness of their generated speech has significantly improved. However, most TTS systems now learn from data using a deep learning framework and generate the output at a monotonous speaking rate. In contrast, humans vary their speaking rates and tend to slow down to emphasize words to distinguish elements of focus in an utterance.
DOI: https://doi.org/10.1109/O-COCOSDA46868.2019.9060845
Published: 2019-10-01; Semantic Scholar paper ID: 129311620
Citations: 2
A Great Reduction of WER by Syllable Toneme Prediction for Thai Grapheme to Phoneme Conversion
S. Saychum, A. Rugchatjaroen, C. Wutiwiwatchai
Abstract: Thai toneme prediction has been one of the greatest difficulties for Thai grapheme-to-phoneme conversion (G2P). This paper presents an improvement in the prediction of linguistic features in terms of tone rules. Among these, there will always be exceptions, for example, the tones used in loan words and transliterated words, which are usually adopted from the original language. This paper does not concern itself with the transliteration problem, but aims to show the success of a method which uses an automatic toneme predictor based on the tone rules of Thai pronunciation for the development of a machine learning model. The proposed method attaches a predictor to the final stage of converting a grapheme to a phoneme. Furthermore, this work also explores end-to-end prediction using Long Short-Term Memory (LSTM) that takes its input sequence from the National Electronics and Computer Technology Center's Pseudo-Syllable segmentation and alignment tool. An evaluation was conducted to show the success of the proposed system, and also to compare the results with our traditional end-to-end sequence-to-sequence G2P. A comparison of the results shows that sequence-to-sequence modeling obtains the lowest Word Error Rate at 1.6%, and the proposed system works well on a small 2018 device (Raspberry Pi).
DOI: https://doi.org/10.1109/O-COCOSDA46868.2019.9041212
Published: 2019-10-01; Semantic Scholar paper ID: 114726880
Citations: 0
Sequence-to-Sequence Models for Grapheme to Phoneme Conversion on Large Myanmar Pronunciation Dictionary
Aye Mya Hlaing, Win Pa Pa
Abstract: Grapheme-to-phoneme conversion is the production of a pronunciation for a given word. Neural sequence-to-sequence models have recently been applied to grapheme-to-phoneme conversion. This paper analyzes the effectiveness of neural sequence-to-sequence models in grapheme-to-phoneme conversion for the Myanmar language. The first large Myanmar pronunciation dictionary is introduced and used to build sequence-to-sequence models. The performance of four grapheme-to-phoneme conversion models (joint sequence model, Transformer, simple encoder-decoder, and attention-enabled encoder-decoder) is evaluated in terms of phoneme error rate (PER) and word error rate (WER). Analyses of three word classes and six phoneme error types are carried out and discussed in detail. According to the evaluations, the Transformer achieves results comparable to the traditional joint sequence model.
DOI: https://doi.org/10.1109/O-COCOSDA46868.2019.9041225
Published: 2019-10-01; Semantic Scholar paper ID: 125820323
Citations: 2
index
DOI: https://doi.org/10.1109/o-cocosda46868.2019.9041241
Published: 2019-10-01; Semantic Scholar paper ID: 115056438
Citations: 0
Annotation and preliminary analysis of utterance decontextualization in a multiactivity
Haruka Amatani, Yayoi Tanaka
Abstract: How are conversations decontextualized apart from the here-and-now situation in a daily joint activity? More specifically, how are those (de/)contextualized utterances associated with movements in the activity? Applying Cloran's [1] Rhetorical Units, we identified the degrees of decontextualization for utterances, regarding their time and space distances from the ongoing situation. For the annotation of hand and body movements, we employed Kendon's [2] gesture phases. The association of speech and movements was examined using the degrees of decontextualization and movement phases. The results from the preliminary analysis suggested that when participants were pausing their movements, they tended to produce utterances with higher degrees of decontextualization than when they were moving.
DOI: https://doi.org/10.1109/O-COCOSDA46868.2019.9041203
Published: 2019-10-01; Semantic Scholar paper ID: 120982111
Citations: 0
Three-year-old children's production of native Mandarin Chinese lexical tones
Ao Chen, Hintat Cheung, Yuchen Li, Liqun Gao
Abstract: The current study investigated native Mandarin Chinese children's production of native lexical tones, in particular the low-rising tone (T2) and low-dipping tone (T3), which are acoustically most similar among all the Mandarin lexical tones. Using a picture naming task, ten 3-year-old children produced fourteen monosyllabic and disyllabic familiar words. Ten female adult listeners performed the same task as a control group. Acoustical measurements of pitch values and pitch alignment were conducted to analyze whether children made use of acoustical cues to distinguish T2 and T3 in an adult-like way, and whether the presence of tonal context in the disyllabic words influenced the acoustical implementation of T2 and T3. The results showed that, overall, children exhibited adult-like pitch contours for T2 and T3, yet unlike adults, who maintained the low feature of T3 for both pitch minimum and pitch maximum, children tended to increase the pitch maximum and consequently the pitch range to allow for implementation of the complex pitch contour of T3. This increase is more evident for the disyllabic than for the monosyllabic words. These findings suggest that the presence of tonal context and the tonal carry-over effect make it more demanding for children to realize the complex pitch contour of T3, and that they increase the pitch range to achieve this goal.
DOI: https://doi.org/10.1109/O-COCOSDA46868.2019.9060851
Published: 2019-10-01; Semantic Scholar paper ID: 132738049
Citations: 0
Characteristics of everyday conversation derived from the analysis of dialog act annotation
Yuriko Iseki, Keisuke Kadota, Yasuharu Den
Abstract: This paper attempts to identify the characteristics of everyday conversation data through dialog act (DA) information. Although several earlier studies have discussed how to annotate DA information, few studies use the result of the annotation as a clue to derive the characteristics of conversation. We report on work to annotate dialog act information on utterances in Japanese everyday conversation, and on the possibility of extracting interactional characteristics using the annotation. The analysis found that the annotation reflects differences in behaviour depending on the type of conversation and the participants' age. Also, even in conversations with similar settings, differences were found in the distribution of tags about interactional management. It is suggested that the annotation may also reflect information that is difficult to capture objectively, such as the conversational atmosphere.
DOI: https://doi.org/10.1109/O-COCOSDA46868.2019.9041235
Published: 2019-10-01; Semantic Scholar paper ID: 116227042
Citations: 3
Recent Progress of Mandrain Spontaneous Speech Recognition on Mandrain Conversation Dialogue Corpus
Yu-Chih Deng, Yih-Ru Wang, Sin-Horng Chen, Chen-Yu Chiang
Abstract: This paper presents a progress report on a relatively difficult ASR task on a spontaneous speech corpus, the Mandarin Conversational Dialogue Corpus (MCDC). A DNN-based acoustic model is constructed based on the CLDNN structure with a large dataset that comprises two spontaneous-speech corpora and one read-speech corpus. The study uses a large text dataset formed by seven corpora to train an efficient general language model (LM). Two adapted LMs specifically for spontaneous speech recognition are also constructed. Experimental results showed that the best performances of 26.3% in character error rate (CER) and 32.5% in word error rate (WER) were reached on MCDC. These represent relative CER and WER reductions of 27.9% and 22.2%, respectively, compared with the previous best HMM-based method. This confirms that the proposed method is promising for tackling Mandarin spontaneous speech recognition.
DOI: https://doi.org/10.1109/O-COCOSDA46868.2019.9041223
Published: 2019-10-01; Semantic Scholar paper ID: 121098961
Citations: 1