2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA): Latest Publications

Oriental COCOSDA – Country Report 2020 Language Resources Developed in Taiwan
Sin-Horng Chen, Hsin-Min Wang
{"title":"Oriental COCOSDA – Country Report 2020 Language Resources Developed in Taiwan","authors":"Sin-Horng Chen, Hsin-Min Wang","doi":"10.1109/o-cocosda50338.2020.9311392","DOIUrl":"https://doi.org/10.1109/o-cocosda50338.2020.9311392","url":null,"abstract":"Academia Sinica Audio-Visual Speech Corpus","PeriodicalId":385266,"journal":{"name":"2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126445856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Using Multimodal Methods in L2 Intonation Teaching for Chinese EFL Learners
Chenyang Zhao, Ziyu Xiong, Ai-jun Li
{"title":"Using Multimodal Methods in L2 Intonation Teaching for Chinese EFL Learners","authors":"Chenyang Zhao, Ziyu Xiong, Ai-jun Li","doi":"10.1109/O-COCOSDA50338.2020.9295023","DOIUrl":"https://doi.org/10.1109/O-COCOSDA50338.2020.9295023","url":null,"abstract":"The present study aims to investigate how multimodal training method contribute to the improvement of the L2 intonation produced by Chinese EFL learners. Altogether 75 learners with an English major background from 3 different dialectal regions of China are recruited. They are divided into 5 groups which differ from each other in training methods, which specifically are the control group (G1), group with sound for training only (G2), group with sound and after-training feedback (G3), group with both audio and visual material for training (G4), and the audiovisual training group with feedback (G5). The results show that although no significant improvement between learners' pretest and posttest for each group, still we observe that some of the learners in experiment groups score significantly higher in posttest than those in the control group, and among them, G5 is the best as the most cases of intonation are improved through the training. This indicates that multimodal + supervised training method is the most effective way in L2 intonation teaching in this experiment. Unobvious improvement of in the rest cases might due to the limited training time, which will be further ameliorated by a supplementary intensive training in this method.","PeriodicalId":385266,"journal":{"name":"2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131098174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
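The group-versus-control comparison reported above can be framed as a standard two-sample test on posttest scores. Below is a minimal sketch assuming hypothetical score arrays (the abstract does not publish raw data or name its test procedure); Welch's t-test is one reasonable choice when group variances may differ.

```python
# Hypothetical posttest comparison between the audiovisual-with-feedback
# group (G5) and the control group (G1). Scores are illustrative only,
# not the paper's data.
from scipy import stats

g1_posttest = [62, 58, 65, 60, 63, 59, 61, 64, 57, 60]  # control (assumed)
g5_posttest = [68, 71, 66, 73, 69, 70, 65, 72, 67, 70]  # G5 (assumed)

# Welch's t-test does not assume equal variances across groups.
t_stat, p_value = stats.ttest_ind(g5_posttest, g1_posttest, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("G5 posttest scores are significantly higher than control")
```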
Towards Speech Entrainment: Considering ASR Information in Speaking Rate Variation of TTS Waveform Generation
Mayuko Okamato, S. Sakti, Satoshi Nakamura
{"title":"Towards Speech Entrainment: Considering ASR Information in Speaking Rate Variation of TTS Waveform Generation","authors":"Mayuko Okamato, S. Sakti, Satoshi Nakamura","doi":"10.1109/O-COCOSDA50338.2020.9295020","DOIUrl":"https://doi.org/10.1109/O-COCOSDA50338.2020.9295020","url":null,"abstract":"State-of-the-art text-to-speech (TTS) systems successfully produce speech with a high degree of intelligibility. But TTS systems still often generate monotonous synthesized speech, unlike natural utterances. Several existing studies have addressed the issue of modeling speaking style variations in TTSs. Unfortunately, scant research has discussed the dialog and entrainment context. In this paper, we address TTS waveform generation toward speech entrainment in human-machine communication and focus on the synchronization of speaking rates that may vary within an utterance, i.e., slowing down to emphasize specific words and distinguish elements to highlight. We assume a dialog system exists and concentrate on its speech processing part. To perform such a task, we develop (1) a multi-task automatic speech recognition (ASR) that listens to the conversation partner and recognizes the content and the speaking rate and (2) a generative adversarial network (GAN)-based TTS that produces the synthesized speech of the response while entraining with the partner's speaking rate. The evaluation is performed on a dialog corpus. Our results reveal that it is possible to entrain the input speech by synchronizing the speaking rate.","PeriodicalId":385266,"journal":{"name":"2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128015249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
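The abstract does not define its speaking-rate feature; a common proxy is phones (or syllables) per second computed from a forced alignment. A minimal sketch under that assumption, with a hypothetical alignment format:

```python
# Hypothetical speaking-rate estimator: phones per second over the voiced
# span of an utterance, given forced-alignment segments as
# (phone, start_sec, end_sec) tuples. The alignment format and silence
# labels are assumptions; the paper's actual rate feature is unspecified.

def speaking_rate(alignment, silences=("sil", "sp")):
    """Return phones per second, ignoring silence segments."""
    phones = [seg for seg in alignment if seg[0] not in silences]
    if not phones:
        return 0.0
    duration = phones[-1][2] - phones[0][1]  # speech span in seconds
    return len(phones) / duration if duration > 0 else 0.0

# An entraining TTS front end could scale its default phone durations by
# (own_rate / partner_rate) so the response matches the partner's tempo.
partner = [("sil", 0.0, 0.3), ("k", 0.3, 0.4), ("a", 0.4, 0.55), ("t", 0.55, 0.7)]
print(f"partner rate: {speaking_rate(partner):.2f} phones/sec")  # 7.50
```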
TSynC-3miti: Audiovisual Speech Synthesis Database from Found Data
A. Thangthai, Sumonmas Thatphithakkul, K. Thangthai, Arnon Namsanit
{"title":"TSynC-3miti: Audiovisual Speech Synthesis Database from Found Data","authors":"A. Thangthai, Sumonmas Thatphithakkul, K. Thangthai, Arnon Namsanit","doi":"10.1109/O-COCOSDA50338.2020.9295001","DOIUrl":"https://doi.org/10.1109/O-COCOSDA50338.2020.9295001","url":null,"abstract":"Building audiovisual speech synthesis database is a crucial factor in the applications of audiovisual speech synthesis systems. Typically, most databases captured on soundproof studio and hired a professional voice talent who speak clearly articulation and able to act and control their voice to read prepared scripts. However, the major drawbacks of conventional audiovisual speech databases are small, costly and time-consuming. Hence, this paper tackles these drawbacks and focuses on building a large audiovisual speech synthesis database using freely available noisy found data on the Web instead of recording clean data. This database, called TSynC-3miti, is the first Thai audiovisual speech synthesis database which are designed for audiovisual speech synthesis use, such as HMM/DNN-based Speech Synthesis System (HTS). Tons of video data have been collected from the ‘3mitinews’ channel on YouTube, which was broadcasted between 04-Jan-2017 and 23-Jan-2020. This paper introduces a procedure of data preparation from scratch, including face detection, text transcription, phoneme labels, and audiovisual data cleaning and feature extraction. The total video contains approximately 19 hours and also producing in audio, images, text transcriptions and phonetic labels.","PeriodicalId":385266,"journal":{"name":"2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)","volume":"282 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122947951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
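One step of the found-data pipeline described above is face detection, used to keep only footage where the speaker's face is usable. The sketch below filters frames to those containing exactly one detectable face; OpenCV's bundled Haar cascade is an assumption, as the abstract does not name the detector used.

```python
# Hypothetical found-data cleaning step: keep only video frames containing
# exactly one detectable face, a common filter for audiovisual corpora.
# The Haar cascade is used for illustration; the paper's detector is
# unspecified. Requires: pip install opencv-python
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def usable_frames(video_path):
    """Yield (frame_index, frame) for frames with exactly one face."""
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) == 1:
            yield idx, frame
        idx += 1
    cap.release()
```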
A Comparative Study of Named Entity Recognition on Myanmar Language
Tin Latt Nandar, Thinn Lai Soe, K. Soe
{"title":"A Comparative Study of Named Entity Recognition on Myanmar Language","authors":"Tin Latt Nandar, Thinn Lai Soe, K. Soe","doi":"10.1109/O-COCOSDA50338.2020.9295004","DOIUrl":"https://doi.org/10.1109/O-COCOSDA50338.2020.9295004","url":null,"abstract":"This paper represents the development of the Myanmar Named Entity Recognition (NER) system using Conditional Random Fields (CRFs). In order to develop the system, a manually annotated Named Entities (NEs) corpus - collected from Myanmar news websites and Asia Language Treebank(ALT)-Parallel-Corpus has been used. We compare the performance of the system getting syllable-based input to the one getting character-based input. We observed that training data has more impact on the performance of the system. The experimental results show that the syllable-based system performs better than the character-based system. It achieves that Precision, Recall and F1-score values of 93.62%, 91.64% and 92.62% respectively.","PeriodicalId":385266,"journal":{"name":"2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115312729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
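A CRF tagger of the kind the paper describes assigns a named-entity tag to each input unit (here, syllables) from local context features. A minimal sketch using the sklearn-crfsuite library, with toy data and an illustrative feature set (the paper's actual features and tag inventory are not given in the abstract):

```python
# Minimal CRF sequence-tagging sketch in the spirit of the syllable-based
# system: each syllable gets a feature dict, and the CRF predicts BIO tags.
# Features and training data are toy examples, not the paper's setup.
# Requires: pip install sklearn-crfsuite
import sklearn_crfsuite

def syllable_features(sent, i):
    feats = {"syl": sent[i], "first": i == 0, "last": i == len(sent) - 1}
    if i > 0:
        feats["prev_syl"] = sent[i - 1]
    if i < len(sent) - 1:
        feats["next_syl"] = sent[i + 1]
    return feats

# Toy training data: a syllable sequence with BIO named-entity tags.
sents = [["နေ", "ပြည်", "တော်", "သို့", "သွား"]]   # "went to Naypyidaw"
tags = [["B-LOC", "I-LOC", "I-LOC", "O", "O"]]

X = [[syllable_features(s, i) for i in range(len(s))] for s in sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, tags)
print(crf.predict(X))
```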
Referential Nominal Metaphor Identification in Myanmar Language
Sheinn Thawtar Oo, A. Thida
{"title":"Referential Nominal Metaphor Identification in Myanmar Language","authors":"Sheinn Thawtar Oo, A. Thida","doi":"10.1109/O-COCOSDA50338.2020.9295042","DOIUrl":"https://doi.org/10.1109/O-COCOSDA50338.2020.9295042","url":null,"abstract":"Metaphor, one of the figurative usages, becomes a problem in natural language processing (NLP). Identification of metaphor becomes one of the attentive research works in NLP. Metaphor can be classified as nominal metaphor, verbal metaphor and adjective metaphor. In nominal metaphor, two types of nominal metaphor can be found: noun-noun metaphor and referential nominal metaphor. This paper presents the identification of referential nominal metaphor in Myanmar language. Identification of referential nominal metaphor requires to extract the two reference nouns and extracting rules are created for reference nouns extraction. The structure of rules for reference nouns extraction and how to extract the reference nouns using these extracting rules are explained detail in this paper. Myanmar WordNet, wordnet2sql and bilingual dictionary are used to identify the referential nominal metaphor. The experiments are done on the sentences of 6 genres, news, novels, articles, conversational, formal and overall sentences. The precision results for these 6 genres can be found 75% for news, 81% for novels, 83% for articles, 78% for conversational, 81% for formal and 81% for overall respectively. General discussion about the issues and explanations are also described in this paper.","PeriodicalId":385266,"journal":{"name":"2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131950696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
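The method sketched in the abstract has two stages: rule-based extraction of the two reference nouns, then a lexical-resource check on the extracted pair. The toy sketch below mimics that shape with an English-like pattern and a placeholder taxonomy; the paper's actual Myanmar extraction rules and WordNet queries are not given in the abstract.

```python
# Hypothetical two-stage shape of referential nominal metaphor detection:
# 1) rule-based extraction of a (referent, reference) noun pair,
# 2) a lexical-resource check for a category clash between the two nouns.
# The pattern and taxonomy are placeholders, not the paper's rules.
import re

# Toy English-like pattern standing in for a Myanmar extraction rule:
# "<noun1> is (a|an|the) <noun2>"
PAIR_RULE = re.compile(r"^(\w+) is (?:a|an|the) (\w+)$")

# Placeholder taxonomy standing in for Myanmar WordNet hypernym lookups.
HYPERNYMS = {"lawyer": "person", "shark": "animal", "doctor": "person"}

def extract_pair(sentence):
    m = PAIR_RULE.match(sentence)
    return (m.group(1), m.group(2)) if m else None

def is_referential_metaphor(sentence):
    pair = extract_pair(sentence)
    if pair is None:
        return False
    noun1, noun2 = pair
    # A category clash between the two reference nouns suggests metaphor.
    return HYPERNYMS.get(noun1) != HYPERNYMS.get(noun2)

print(is_referential_metaphor("lawyer is a shark"))   # True (clash)
print(is_referential_metaphor("lawyer is a doctor"))  # False (same category)
```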
Oriental-COCOSDA 2020 Japan Country Report
Satoshi Nakamura
{"title":"Oriental-COCOSDA 2020 Japan Country Report","authors":"Satoshi Nakamura","doi":"10.1109/O-COCOSDA50338.2020.9295002","DOIUrl":"https://doi.org/10.1109/O-COCOSDA50338.2020.9295002","url":null,"abstract":"This article consists only of a collection of slides from the author's conference presentation.","PeriodicalId":385266,"journal":{"name":"2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122431024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Oriental COCOSDA 2020 Country Report: Recent Language Resources Development for Myanmar
K. Soe
{"title":"Oriental COCOSDA 2020 Country Report: Recent Language Resources Development for Myanmar","authors":"K. Soe","doi":"10.1109/o-cocosda50338.2020.9294998","DOIUrl":"https://doi.org/10.1109/o-cocosda50338.2020.9294998","url":null,"abstract":"This article consists only of a collection of slides from the author's conference presentation.","PeriodicalId":385266,"journal":{"name":"2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122673775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Enhancing Large Vocabulary Continuous Speech Recognition System for Urdu-English Conversational Code-Switched Speech
M. Farooq, F. Adeeba, S. Hussain, Sahar Rauf, Maryam Khalid
{"title":"Enhancing Large Vocabulary Continuous Speech Recognition System for Urdu-English Conversational Code-Switched Speech","authors":"M. Farooq, F. Adeeba, S. Hussain, Sahar Rauf, Maryam Khalid","doi":"10.1109/O-COCOSDA50338.2020.9295036","DOIUrl":"https://doi.org/10.1109/O-COCOSDA50338.2020.9295036","url":null,"abstract":"This paper presents first step towards Large Vocabulary Continuous Speech Recognition (LVCSR) system for Urdu-English code-switched conversational speech. Urdu is the national language and lingua franca of Pakistan, with 100 million speakers worldwide. English, on the other hand, is official language of Pakistan and commonly mixed with Urdu in daily communication. Urdu, being under-resourced language, have no substantial Urdu-English code-switched corpus in hand to develop speech recognition system. In this research, readily available spontaneous Urdu speech corpus (25 hours) is revised to use it for enhancement of read speech Urdu LVCSR to recognize code-switched speech. This data set is split into 20 hours of train and 5 hours of test set. 10 hours of Urdu BroadCast (BC) data are collected and annotated in a semi-supervised way to enhance the system further. For acoustic modeling, state-of-the-art DNN-HMM modeling technique is used without any prior GMM-HMM training and alignments. Various techniques to improve language model using monolingual data are investigated. The overall percent Word Error Rate (WER) is reduced from 40.71% to 26.95% on test set.","PeriodicalId":385266,"journal":{"name":"2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130137791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
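WER, the metric behind the 40.71% to 26.95% improvement, is the word-level edit distance between reference and hypothesis transcripts divided by the number of reference words. A minimal self-contained implementation (production systems typically use a toolkit scorer such as Kaldi's compute-wer rather than this sketch):

```python
# Word error rate: (substitutions + deletions + insertions) / reference words,
# computed via word-level Levenshtein distance with dynamic programming.

def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution or match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# Illustrative code-switched example (not from the paper's test set):
print(f"{wer('yeh meeting kal hogi', 'yeh meeting kal ho gi'):.2%}")  # 50.00%
```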
Mizo Spoken Query System Enhanced with Prosodic Information
Rupam Das, Abhishek Dey, Wendy Lalhminghlui, Priyankoo Sarmah, K. Samudravijaya, R. Sinha
{"title":"Mizo Spoken Query System Enhanced with Prosodic Information","authors":"Rupam Das, Abhishek Dey, Wendy Lalhminghlui, Priyankoo Sarmah, K. Samudravijaya, R. Sinha","doi":"10.1109/O-COCOSDA50338.2020.9295007","DOIUrl":"https://doi.org/10.1109/O-COCOSDA50338.2020.9295007","url":null,"abstract":"We report the development of a Mizo spoken language interface to an online database of commodity prices in various agricultural markets of Mizoram province. The spoken query system is designed to be used over mobile phone networks. It is inspired by earlier such systems developed for languages such as Assamese and Marathi. However, considering the manifold increase in the performance of the current Mizo language spoken query system, we were motivated to report the whole system implementation in detail. The average word error rate of the DNN-HMM speech recognition system in a 5-fold cross validation experiment is 2.2%. The word error rate of the Mizo commodity name recognition system is 2.75 times smaller than that reported for a similar system for Assamese language. The reduction in error rate is attributed to data collection from diverse environmental situations with different noise levels and further processing of the data. The word error rate of the Mizo ASR system increases to 8.5% in field trials conducted with several users.","PeriodicalId":385266,"journal":{"name":"2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116489685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
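The 2.2% figure above averages over five cross-validation folds. One reporting subtlety worth noting: pooling errors over all folds weights folds by size and can differ slightly from the mean of per-fold WERs. A small sketch with hypothetical fold counts (not the paper's data):

```python
# Pooled vs. averaged WER across cross-validation folds. Pooling total
# errors over total reference words weights folds by size, which is the
# usual way to report a single cross-validation WER. Fold numbers below
# are illustrative, not the paper's.
folds = [  # (word errors, reference words) per fold, hypothetical
    (22, 1000), (18, 950), (25, 1100), (20, 980), (21, 970),
]

pooled_wer = sum(e for e, _ in folds) / sum(n for _, n in folds)
mean_of_fold_wers = sum(e / n for e, n in folds) / len(folds)

print(f"pooled WER:        {pooled_wer:.2%}")   # 2.12%
print(f"mean of fold WERs: {mean_of_fold_wers:.2%}")
```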