2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA): Latest Publications

Oriental COCOSDA – Country Report 2020 Language Resources Developed in Taiwan
Sin-Horng Chen, Hsin-Min Wang
{"title":"Oriental COCOSDA – Country Report 2020 Language Resources Developed in Taiwan","authors":"Sin-Horng Chen, Hsin-Min Wang","doi":"10.1109/o-cocosda50338.2020.9311392","DOIUrl":"https://doi.org/10.1109/o-cocosda50338.2020.9311392","url":null,"abstract":"Academia Sinica Audio-Visual Speech Corpus","PeriodicalId":385266,"journal":{"name":"2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126445856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Using Multimodal Methods in L2 Intonation Teaching for Chinese EFL Learners
Chenyang Zhao, Ziyu Xiong, Ai-jun Li
{"title":"Using Multimodal Methods in L2 Intonation Teaching for Chinese EFL Learners","authors":"Chenyang Zhao, Ziyu Xiong, Ai-jun Li","doi":"10.1109/O-COCOSDA50338.2020.9295023","DOIUrl":"https://doi.org/10.1109/O-COCOSDA50338.2020.9295023","url":null,"abstract":"The present study aims to investigate how multimodal training method contribute to the improvement of the L2 intonation produced by Chinese EFL learners. Altogether 75 learners with an English major background from 3 different dialectal regions of China are recruited. They are divided into 5 groups which differ from each other in training methods, which specifically are the control group (G1), group with sound for training only (G2), group with sound and after-training feedback (G3), group with both audio and visual material for training (G4), and the audiovisual training group with feedback (G5). The results show that although no significant improvement between learners' pretest and posttest for each group, still we observe that some of the learners in experiment groups score significantly higher in posttest than those in the control group, and among them, G5 is the best as the most cases of intonation are improved through the training. This indicates that multimodal + supervised training method is the most effective way in L2 intonation teaching in this experiment. Unobvious improvement of in the rest cases might due to the limited training time, which will be further ameliorated by a supplementary intensive training in this method.","PeriodicalId":385266,"journal":{"name":"2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131098174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
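The group-versus-control comparison reported above can be framed as a standard two-sample test on posttest scores. Below is a minimal sketch assuming hypothetical score arrays (the abstract does not publish raw data or name its test procedure); Welch's t-test is one reasonable choice when group variances may differ.

```python
# Hypothetical posttest comparison between the audiovisual-with-feedback
# group (G5) and the control group (G1). Scores are illustrative only,
# not the paper's data.
from scipy import stats

g1_posttest = [62, 58, 65, 60, 63, 59, 61, 64, 57, 60]  # control (assumed)
g5_posttest = [68, 71, 66, 73, 69, 70, 65, 72, 67, 70]  # G5 (assumed)

# Welch's t-test does not assume equal variances across groups.
t_stat, p_value = stats.ttest_ind(g5_posttest, g1_posttest, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("G5 posttest scores are significantly higher than control")
```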
Towards Speech Entrainment: Considering ASR Information in Speaking Rate Variation of TTS Waveform Generation
Mayuko Okamato, S. Sakti, Satoshi Nakamura
{"title":"Towards Speech Entrainment: Considering ASR Information in Speaking Rate Variation of TTS Waveform Generation","authors":"Mayuko Okamato, S. Sakti, Satoshi Nakamura","doi":"10.1109/O-COCOSDA50338.2020.9295020","DOIUrl":"https://doi.org/10.1109/O-COCOSDA50338.2020.9295020","url":null,"abstract":"State-of-the-art text-to-speech (TTS) systems successfully produce speech with a high degree of intelligibility. But TTS systems still often generate monotonous synthesized speech, unlike natural utterances. Several existing studies have addressed the issue of modeling speaking style variations in TTSs. Unfortunately, scant research has discussed the dialog and entrainment context. In this paper, we address TTS waveform generation toward speech entrainment in human-machine communication and focus on the synchronization of speaking rates that may vary within an utterance, i.e., slowing down to emphasize specific words and distinguish elements to highlight. We assume a dialog system exists and concentrate on its speech processing part. To perform such a task, we develop (1) a multi-task automatic speech recognition (ASR) that listens to the conversation partner and recognizes the content and the speaking rate and (2) a generative adversarial network (GAN)-based TTS that produces the synthesized speech of the response while entraining with the partner's speaking rate. The evaluation is performed on a dialog corpus. Our results reveal that it is possible to entrain the input speech by synchronizing the speaking rate.","PeriodicalId":385266,"journal":{"name":"2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128015249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
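The abstract does not define its speaking-rate feature; a common proxy is phones (or syllables) per second computed from a forced alignment. A minimal sketch under that assumption, with a hypothetical alignment format:

```python
# Hypothetical speaking-rate estimator: phones per second over the voiced
# span of an utterance, given forced-alignment segments as
# (phone, start_sec, end_sec) tuples. The alignment format and silence
# labels are assumptions; the paper's actual rate feature is unspecified.

def speaking_rate(alignment, silences=("sil", "sp")):
    """Return phones per second, ignoring silence segments."""
    phones = [seg for seg in alignment if seg[0] not in silences]
    if not phones:
        return 0.0
    duration = phones[-1][2] - phones[0][1]  # speech span in seconds
    return len(phones) / duration if duration > 0 else 0.0

# An entraining TTS front end could scale its default phone durations by
# (own_rate / partner_rate) so the response matches the partner's tempo.
partner = [("sil", 0.0, 0.3), ("k", 0.3, 0.4), ("a", 0.4, 0.55), ("t", 0.55, 0.7)]
print(f"partner rate: {speaking_rate(partner):.2f} phones/sec")  # 7.50
```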
TSynC-3miti: Audiovisual Speech Synthesis Database from Found Data
A. Thangthai, Sumonmas Thatphithakkul, K. Thangthai, Arnon Namsanit
{"title":"TSynC-3miti: Audiovisual Speech Synthesis Database from Found Data","authors":"A. Thangthai, Sumonmas Thatphithakkul, K. Thangthai, Arnon Namsanit","doi":"10.1109/O-COCOSDA50338.2020.9295001","DOIUrl":"https://doi.org/10.1109/O-COCOSDA50338.2020.9295001","url":null,"abstract":"Building audiovisual speech synthesis database is a crucial factor in the applications of audiovisual speech synthesis systems. Typically, most databases captured on soundproof studio and hired a professional voice talent who speak clearly articulation and able to act and control their voice to read prepared scripts. However, the major drawbacks of conventional audiovisual speech databases are small, costly and time-consuming. Hence, this paper tackles these drawbacks and focuses on building a large audiovisual speech synthesis database using freely available noisy found data on the Web instead of recording clean data. This database, called TSynC-3miti, is the first Thai audiovisual speech synthesis database which are designed for audiovisual speech synthesis use, such as HMM/DNN-based Speech Synthesis System (HTS). Tons of video data have been collected from the ‘3mitinews’ channel on YouTube, which was broadcasted between 04-Jan-2017 and 23-Jan-2020. This paper introduces a procedure of data preparation from scratch, including face detection, text transcription, phoneme labels, and audiovisual data cleaning and feature extraction. The total video contains approximately 19 hours and also producing in audio, images, text transcriptions and phonetic labels.","PeriodicalId":385266,"journal":{"name":"2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)","volume":"282 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122947951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
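One step of the found-data pipeline described above is face detection, used to keep only footage where the speaker's face is usable. The sketch below filters frames to those containing exactly one detectable face; OpenCV's bundled Haar cascade is an assumption, as the abstract does not name the detector used.

```python
# Hypothetical found-data cleaning step: keep only video frames containing
# exactly one detectable face, a common filter for audiovisual corpora.
# The Haar cascade is used for illustration; the paper's detector is
# unspecified. Requires: pip install opencv-python
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def usable_frames(video_path):
    """Yield (frame_index, frame) for frames with exactly one face."""
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) == 1:
            yield idx, frame
        idx += 1
    cap.release()
```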
A Comparative Study of Named Entity Recognition on Myanmar Language
Tin Latt Nandar, Thinn Lai Soe, K. Soe
{"title":"A Comparative Study of Named Entity Recognition on Myanmar Language","authors":"Tin Latt Nandar, Thinn Lai Soe, K. Soe","doi":"10.1109/O-COCOSDA50338.2020.9295004","DOIUrl":"https://doi.org/10.1109/O-COCOSDA50338.2020.9295004","url":null,"abstract":"This paper represents the development of the Myanmar Named Entity Recognition (NER) system using Conditional Random Fields (CRFs). In order to develop the system, a manually annotated Named Entities (NEs) corpus - collected from Myanmar news websites and Asia Language Treebank(ALT)-Parallel-Corpus has been used. We compare the performance of the system getting syllable-based input to the one getting character-based input. We observed that training data has more impact on the performance of the system. The experimental results show that the syllable-based system performs better than the character-based system. It achieves that Precision, Recall and F1-score values of 93.62%, 91.64% and 92.62% respectively.","PeriodicalId":385266,"journal":{"name":"2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115312729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
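A CRF tagger of the kind the paper describes assigns a named-entity tag to each input unit (here, syllables) from local context features. A minimal sketch using the sklearn-crfsuite library, with toy data and an illustrative feature set (the paper's actual features and tag inventory are not given in the abstract):

```python
# Minimal CRF sequence-tagging sketch in the spirit of the syllable-based
# system: each syllable gets a feature dict, and the CRF predicts BIO tags.
# Features and training data are toy examples, not the paper's setup.
# Requires: pip install sklearn-crfsuite
import sklearn_crfsuite

def syllable_features(sent, i):
    feats = {"syl": sent[i], "first": i == 0, "last": i == len(sent) - 1}
    if i > 0:
        feats["prev_syl"] = sent[i - 1]
    if i < len(sent) - 1:
        feats["next_syl"] = sent[i + 1]
    return feats

# Toy training data: a syllable sequence with BIO named-entity tags.
sents = [["နေ", "ပြည်", "တော်", "သို့", "သွား"]]   # "went to Naypyidaw"
tags = [["B-LOC", "I-LOC", "I-LOC", "O", "O"]]

X = [[syllable_features(s, i) for i in range(len(s))] for s in sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, tags)
print(crf.predict(X))
```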
Referential Nominal Metaphor Identification in Myanmar Language
Sheinn Thawtar Oo, A. Thida
{"title":"Referential Nominal Metaphor Identification in Myanmar Language","authors":"Sheinn Thawtar Oo, A. Thida","doi":"10.1109/O-COCOSDA50338.2020.9295042","DOIUrl":"https://doi.org/10.1109/O-COCOSDA50338.2020.9295042","url":null,"abstract":"Metaphor, one of the figurative usages, becomes a problem in natural language processing (NLP). Identification of metaphor becomes one of the attentive research works in NLP. Metaphor can be classified as nominal metaphor, verbal metaphor and adjective metaphor. In nominal metaphor, two types of nominal metaphor can be found: noun-noun metaphor and referential nominal metaphor. This paper presents the identification of referential nominal metaphor in Myanmar language. Identification of referential nominal metaphor requires to extract the two reference nouns and extracting rules are created for reference nouns extraction. The structure of rules for reference nouns extraction and how to extract the reference nouns using these extracting rules are explained detail in this paper. Myanmar WordNet, wordnet2sql and bilingual dictionary are used to identify the referential nominal metaphor. The experiments are done on the sentences of 6 genres, news, novels, articles, conversational, formal and overall sentences. The precision results for these 6 genres can be found 75% for news, 81% for novels, 83% for articles, 78% for conversational, 81% for formal and 81% for overall respectively. General discussion about the issues and explanations are also described in this paper.","PeriodicalId":385266,"journal":{"name":"2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131950696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
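The method sketched in the abstract has two stages: rule-based extraction of the two reference nouns, then a lexical-resource check on the extracted pair. The toy sketch below mimics that shape with an English-like pattern and a placeholder taxonomy; the paper's actual Myanmar extraction rules and WordNet queries are not given in the abstract.

```python
# Hypothetical two-stage shape of referential nominal metaphor detection:
# 1) rule-based extraction of a (referent, reference) noun pair,
# 2) a lexical-resource check for a category clash between the two nouns.
# The pattern and taxonomy are placeholders, not the paper's rules.
import re

# Toy English-like pattern standing in for a Myanmar extraction rule:
# "<noun1> is (a|an|the) <noun2>"
PAIR_RULE = re.compile(r"^(\w+) is (?:a|an|the) (\w+)$")

# Placeholder taxonomy standing in for Myanmar WordNet hypernym lookups.
HYPERNYMS = {"lawyer": "person", "shark": "animal", "doctor": "person"}

def extract_pair(sentence):
    m = PAIR_RULE.match(sentence)
    return (m.group(1), m.group(2)) if m else None

def is_referential_metaphor(sentence):
    pair = extract_pair(sentence)
    if pair is None:
        return False
    noun1, noun2 = pair
    # A category clash between the two reference nouns suggests metaphor.
    return HYPERNYMS.get(noun1) != HYPERNYMS.get(noun2)

print(is_referential_metaphor("lawyer is a shark"))   # True (clash)
print(is_referential_metaphor("lawyer is a doctor"))  # False (same category)
```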
Oriental-COCOSDA 2020 Japan Country Report
Satoshi Nakamura
{"title":"Oriental-COCOSDA 2020 Japan Country Report","authors":"Satoshi Nakamura","doi":"10.1109/O-COCOSDA50338.2020.9295002","DOIUrl":"https://doi.org/10.1109/O-COCOSDA50338.2020.9295002","url":null,"abstract":"This article consists only of a collection of slides from the author's conference presentation.","PeriodicalId":385266,"journal":{"name":"2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122431024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Oriental COCOSDA 2020 Country Report: Recent Language Resources Development for Myanmar
K. Soe
{"title":"Oriental COCOSDA 2020 Country Report: Recent Language Resources Development for Myanmar","authors":"K. Soe","doi":"10.1109/o-cocosda50338.2020.9294998","DOIUrl":"https://doi.org/10.1109/o-cocosda50338.2020.9294998","url":null,"abstract":"This article consists only of a collection of slides from the author's conference presentation.","PeriodicalId":385266,"journal":{"name":"2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122673775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Enhancing Large Vocabulary Continuous Speech Recognition System for Urdu-English Conversational Code-Switched Speech
M. Farooq, F. Adeeba, S. Hussain, Sahar Rauf, Maryam Khalid
{"title":"Enhancing Large Vocabulary Continuous Speech Recognition System for Urdu-English Conversational Code-Switched Speech","authors":"M. Farooq, F. Adeeba, S. Hussain, Sahar Rauf, Maryam Khalid","doi":"10.1109/O-COCOSDA50338.2020.9295036","DOIUrl":"https://doi.org/10.1109/O-COCOSDA50338.2020.9295036","url":null,"abstract":"This paper presents first step towards Large Vocabulary Continuous Speech Recognition (LVCSR) system for Urdu-English code-switched conversational speech. Urdu is the national language and lingua franca of Pakistan, with 100 million speakers worldwide. English, on the other hand, is official language of Pakistan and commonly mixed with Urdu in daily communication. Urdu, being under-resourced language, have no substantial Urdu-English code-switched corpus in hand to develop speech recognition system. In this research, readily available spontaneous Urdu speech corpus (25 hours) is revised to use it for enhancement of read speech Urdu LVCSR to recognize code-switched speech. This data set is split into 20 hours of train and 5 hours of test set. 10 hours of Urdu BroadCast (BC) data are collected and annotated in a semi-supervised way to enhance the system further. For acoustic modeling, state-of-the-art DNN-HMM modeling technique is used without any prior GMM-HMM training and alignments. Various techniques to improve language model using monolingual data are investigated. The overall percent Word Error Rate (WER) is reduced from 40.71% to 26.95% on test set.","PeriodicalId":385266,"journal":{"name":"2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130137791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
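WER, the metric behind the 40.71% to 26.95% improvement, is the word-level edit distance between reference and hypothesis transcripts divided by the number of reference words. A minimal self-contained implementation (production systems typically use a toolkit scorer such as Kaldi's compute-wer rather than this sketch):

```python
# Word error rate: (substitutions + deletions + insertions) / reference words,
# computed via word-level Levenshtein distance with dynamic programming.

def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution or match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# Illustrative code-switched example (not from the paper's test set):
print(f"{wer('yeh meeting kal hogi', 'yeh meeting kal ho gi'):.2%}")  # 50.00%
```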
Mizo Spoken Query System Enhanced with Prosodic Information
Rupam Das, Abhishek Dey, Wendy Lalhminghlui, Priyankoo Sarmah, K. Samudravijaya, R. Sinha
{"title":"Mizo Spoken Query System Enhanced with Prosodic Information","authors":"Rupam Das, Abhishek Dey, Wendy Lalhminghlui, Priyankoo Sarmah, K. Samudravijaya, R. Sinha","doi":"10.1109/O-COCOSDA50338.2020.9295007","DOIUrl":"https://doi.org/10.1109/O-COCOSDA50338.2020.9295007","url":null,"abstract":"We report the development of a Mizo spoken language interface to an online database of commodity prices in various agricultural markets of Mizoram province. The spoken query system is designed to be used over mobile phone networks. It is inspired by earlier such systems developed for languages such as Assamese and Marathi. However, considering the manifold increase in the performance of the current Mizo language spoken query system, we were motivated to report the whole system implementation in detail. The average word error rate of the DNN-HMM speech recognition system in a 5-fold cross validation experiment is 2.2%. The word error rate of the Mizo commodity name recognition system is 2.75 times smaller than that reported for a similar system for Assamese language. The reduction in error rate is attributed to data collection from diverse environmental situations with different noise levels and further processing of the data. The word error rate of the Mizo ASR system increases to 8.5% in field trials conducted with several users.","PeriodicalId":385266,"journal":{"name":"2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116489685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
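The 2.2% figure above averages over five cross-validation folds. One reporting subtlety worth noting: pooling errors over all folds weights folds by size and can differ slightly from the mean of per-fold WERs. A small sketch with hypothetical fold counts (not the paper's data):

```python
# Pooled vs. averaged WER across cross-validation folds. Pooling total
# errors over total reference words weights folds by size, which is the
# usual way to report a single cross-validation WER. Fold numbers below
# are illustrative, not the paper's.
folds = [  # (word errors, reference words) per fold, hypothetical
    (22, 1000), (18, 950), (25, 1100), (20, 980), (21, 970),
]

pooled_wer = sum(e for e, _ in folds) / sum(n for _, n in folds)
mean_of_fold_wers = sum(e / n for e, n in folds) / len(folds)

print(f"pooled WER:        {pooled_wer:.2%}")   # 2.12%
print(f"mean of fold WERs: {mean_of_fold_wers:.2%}")
```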