2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA): Latest Papers

Indian Languages Corpus for Speech Recognition

2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA) Pub Date : 2019-10-01 DOI: 10.1109/O-COCOSDA46868.2019.9041171

Joyanta Basu, Soma Khan, Rajib Roy, Babita Saxena, Dipankar Ganguly, Sunita Arora, K. Arora, S. Bansal, S. Agrawal

Abstract: Robust speech recognition systems for various languages have moved beyond research labs into commercial products. This has been possible owing to major developments in machine learning, especially deep learning. However, advanced speech recognition systems can be developed only when specially curated speech data is available. Systems of usable quality are yet to be developed for most Indian languages. The present paper describes the design and development of a standard speech corpus that can be used for developing general-purpose ASR systems and benchmarking them. The database has been developed for Indian languages, namely Hindi, Bengali and Indian English. The corpus design incorporates important parameters such as phonetic coverage and distribution. The data was recorded by 1500 speakers in each language, male and female, of different age groups in varying environments. The data was recorded on a server using an online recording system and transcribed using semi-automatic tools. The paper describes the corpus design methodology, the challenges faced and the approach adopted to overcome them. The whole process of designing the speech database is generic enough to be used for other languages as well.

Citations: 5

Acquisition of english retroflex vowel [3] by EFL learners from Chinese dialectal regions- A case study of Beijing and Changsha

2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA) Pub Date : 2019-10-01 DOI: 10.1109/O-COCOSDA46868.2019.9060843

Bin Li, Yuan Jia

Abstract: This paper investigates, through an extensive acoustic analysis, the acquisition of the English retroflex vowel [3] by learners of English as a Foreign Language (EFL) from Beijing (BJ) and Changsha (CS), which are representative dialectal regions in northern and southern China respectively. In our analysis, formant and duration were selected as parameters. The results demonstrate that all the EFL learners involved in the study produced the onset target and offset target of [3] with a more backward tendency. For formant patterns, both native speakers and EFL learners present a similar tendency, namely the decline of F3 and the rise of F2. For CS speakers, due to the effect of their mother tongue, F3 falls more slowly. Moreover, from the spectral perspective, the F3 changing rate of CS male learners is significantly smaller than that of native speakers. On the other hand, BJ learners, especially female learners, show more obvious changes in F3 than native speakers. In addition, we speculate that language background and gender can affect the acquisition of retroflex vowels.

Citations: 0

Phoneme-level speaking rate variation on waveform generation using GAN-TTS

2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA) Pub Date : 2019-10-01 DOI: 10.1109/O-COCOSDA46868.2019.9060845

Mayuko Okamato, S. Sakti, Satoshi Nakamura

Citations: 2

A Great Reduction of WER by Syllable Toneme Prediction for Thai Grapheme to Phoneme Conversion

2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA) Pub Date : 2019-10-01 DOI: 10.1109/O-COCOSDA46868.2019.9041212

S. Saychum, A. Rugchatjaroen, C. Wutiwiwatchai

Abstract: Thai toneme prediction has been one of the greatest difficulties for Thai grapheme to phoneme conversion (G2P). This paper presents an improvement in the prediction of linguistic features in terms of tone rules. Among these, there will always be exceptions, for example, the tones used in loan words and transliterated words, which are usually adopted from the original language. This paper does not concern itself with the transliteration problem, but aims to show the success of a method which uses an automatic toneme predictor based on the tone rules of Thai pronunciation for the development of a machine learning model. The proposed method attaches a predictor to the final stage of converting a grapheme to a phoneme. Furthermore, this work also explores end-to-end prediction using Long Short-Term Memory (LSTM) networks that take their input sequence from the National Electronic and Computer Technology Center's Pseudo-Syllable segmentation and alignment tool. An evaluation was conducted to show the success of the proposed system, and also to compare the results with our traditional end-to-end sequence-to-sequence G2P. A comparison of the results shows that sequence-to-sequence modeling obtains the lowest Word Error Rate at 1.6%, and the proposed system works well on a small 2018 device (Raspberry Pi).

Citations: 0

Sequence-to-Sequence Models for Grapheme to Phoneme Conversion on Large Myanmar Pronunciation Dictionary

2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA) Pub Date : 2019-10-01 DOI: 10.1109/O-COCOSDA46868.2019.9041225

Aye Mya Hlaing, Win Pa Pa

Citations: 2

index

2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA) Pub Date : 2019-10-01 DOI: 10.1109/o-cocosda46868.2019.9041241

Citations: 0

Annotation and preliminary analysis of utterance decontextualization in a multiactivity

2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA) Pub Date : 2019-10-01 DOI: 10.1109/O-COCOSDA46868.2019.9041203

Haruka Amatani, Yayoi Tanaka

Citations: 0

Three-year-old children's production of native mandarin Chinese lexical tones

2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA) Pub Date : 2019-10-01 DOI: 10.1109/O-COCOSDA46868.2019.9060851

Ao Chen, Hintat Cheung, Yuchen Li, Liqun Gao

Abstract: The current study investigated native Mandarin Chinese children's production of native lexical tones, in particular the low-rising tone (T2) and low-dipping tone (T3), which are acoustically most similar among all the Mandarin lexical tones. Using a picture naming task, ten 3-year-old children produced fourteen monosyllabic and disyllabic familiar words. Ten female adult listeners performed the same task as a control group. Acoustical measurements on pitch values and pitch alignment were conducted to analyze whether children made use of acoustical cues to distinguish T2 and T3 in an adult-like way, and whether the presence of tonal context in the disyllabic words influenced the acoustical implementation of T2 and T3. The results showed that, overall, children exhibited adult-like pitch contours for T2 and T3, yet unlike adults, who maintained the low feature of T3 for both pitch minimum and pitch maximum, children tended to increase the pitch maximum and consequently the pitch range to allow for implementation of the complex pitch contour of T3. This increase is more evident for the disyllabic than for the monosyllabic words. These findings suggest that the presence of tonal context and the tonal carry-over effect make it more demanding for children to realize the complex pitch contour of T3, and that they increase the pitch range to achieve this goal.

Citations: 0

Characteristics of everyday conversation derived from the analysis of dialog act annotation

2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA) Pub Date : 2019-10-01 DOI: 10.1109/O-COCOSDA46868.2019.9041235

Yuriko Iseki, Keisuke Kadota, Yasuharu Den

Citations: 3

Recent Progress of Mandrain Spontaneous Speech Recognition on Mandrain Conversation Dialogue Corpus

2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA) Pub Date : 2019-10-01 DOI: 10.1109/O-COCOSDA46868.2019.9041223

Yu-Chih Deng, Yih-Ru Wang, Sin-Horng Chen, Chen-Yu Chiang

Citations: 1