2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA) — Latest Publications

Message of the Organizers
Akihiro Fujiwara, H. Irie, Y. Kakuda, Hiroyuki Sato
{"title":"Message of the Organizers","authors":"Akihiro Fujiwara, H. Irie, Y. Kakuda, Hiroyuki Sato","doi":"10.1109/o-cocosda46868.2019.9050341","DOIUrl":"https://doi.org/10.1109/o-cocosda46868.2019.9050341","url":null,"abstract":"We are delighted to welcome you all to Cebu for the 22nd Conference of the Oriental COCOSDA (Oriental Chapter of the International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques). This conference is technically co-sponsored by the IEEE Philippine section and organized by the Computing Society of the Philippines – Special Interest Group on Natural Language Processing, National University, and the University of San Carlos.","PeriodicalId":263209,"journal":{"name":"2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121195317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Indic TIMIT and Indic English lexicon: A speech database of Indian speakers using TIMIT stimuli and a lexicon from their mispronunciations
Chiranjeevi Yarra, Ritu Aggarwal, Avni Rajpal, P. Ghosh
{"title":"Indic TIMIT and Indic English lexicon: A speech database of Indian speakers using TIMIT stimuli and a lexicon from their mispronunciations","authors":"Chiranjeevi Yarra, Ritu Aggarwal, Avni Rajpal, P. Ghosh","doi":"10.1109/O-COCOSDA46868.2019.9041230","DOIUrl":"https://doi.org/10.1109/O-COCOSDA46868.2019.9041230","url":null,"abstract":"With the advancements in the speech technology, demand for larger speech corpora is increasing particularly those from non-native English speakers. In order to cater to this demand under Indian context, we acquire a database named Indic TIMIT, a phonetically rich Indian English speech corpus. It contains ~240 hours of speech recordings from 80 subjects, in which, each subject has spoken a set of 2342 stimuli available in the TIMIT corpus. Further, the corpus also contains phoneme transcriptions for a sub-set of recordings, which are manually annotated by two linguists reflecting speaker's pronunciation. Considering these, Indic TIMIT is unique with respect to the existing corpora that are available in Indian context. Along with Indic TIMIT, a lexicon named Indic English lexicon is provided, which is constructed by incorporating pronunciation variations specific to Indians obtained from their errors to the existing word pronunciations in a native English lexicon. In this paper, the effectiveness of Indic TIMIT and Indic English lexicon is shown respectively in comparison with the data from TIMIT and a lexicon augmented with all the word pronunciations from CMU, Beep and the lexicon available in the TIMIT corpus. Indic TIMIT and Indic English lexicon could be useful for a number of potential applications in Indian context including automatic speech recognition, mispronunciation detection & diagnosis, native language identification, accent adaptation, accent conversion, voice conversion, speech synthesis, grapheme-to-phoneme conversion, automatic phoneme unit discovery and pronunciation error analysis.","PeriodicalId":263209,"journal":{"name":"2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123219966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 11
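The lexicon construction described above amounts to merging observed pronunciation variants into a native lexicon while keeping the original entries. The sketch below illustrates that general idea; the data structures, the toy entries, and the `augment_lexicon` helper are assumptions for illustration, not the authors' implementation.

```python
from collections import defaultdict

def augment_lexicon(native, observed_variants):
    """Merge observed (word, phone-sequence) mispronunciation variants
    into a native lexicon, keeping the original pronunciations."""
    augmented = defaultdict(set, {w: set(p) for w, p in native.items()})
    for word, phones in observed_variants:
        augmented[word].add(tuple(phones))
    return augmented

# Toy entries (illustrative only; not taken from the Indic English lexicon).
native = {"SCHEDULE": {("S", "K", "EH", "JH", "UW", "L")}}
variants = [("SCHEDULE", ("SH", "EH", "D", "UW", "L"))]
lexicon = augment_lexicon(native, variants)
print(sorted(lexicon["SCHEDULE"]))  # both pronunciations are retained
```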
Distributing and Sharing Resources for Automatic Speech Recognition Applications
Sila Chunwijitra, Surasak Boonkla, Vataya Chunwijitra, Nattapong Kurpukdee, P. Sertsi, S. Kasuriya
{"title":"Distributing and Sharing Resources for Automatic Speech Recognition Applications","authors":"Sila Chunwijitra, Surasak Boonkla, Vataya Chunwijitra, Nattapong Kurpukdee, P. Sertsi, S. Kasuriya","doi":"10.1109/O-COCOSDA46868.2019.9041201","DOIUrl":"https://doi.org/10.1109/O-COCOSDA46868.2019.9041201","url":null,"abstract":"Implementation of automatic speech recognition (ASR) system to the real scenarios has been discovered many difficulties in two main topics: processing time and resource demands. These obstructions are such big issues in deploying ASR system. This paper proposed three approaches to deal with those problems, which are applying multithread processing to separate sub-processes, exploiting multiplexing and demultiplexing technique to network socket, and improving the distribution of speech recognition engine in audio streaming. In the experiment, we evaluated our approaches with two types of speech input (audio files and audio streams). The results showed that our approaches are using fewer resources (sharing working memory) and also reduce the processing time since the real-time factor (RTF) is reduced by 15 % approximately comparing with the baseline system.","PeriodicalId":263209,"journal":{"name":"2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123459363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
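The reported gain is expressed as a real-time factor (RTF), the ratio of wall-clock decoding time to audio duration. Below is a minimal sketch of how RTF is typically measured; the `decode` stand-in and the timing harness are assumptions, not the paper's engine.

```python
import time

def real_time_factor(decode, audio, audio_seconds):
    """RTF = wall-clock decoding time / audio duration.
    RTF < 1.0 means decoding runs faster than real time; a 15% relative
    reduction, as reported, means rtf_new ≈ 0.85 * rtf_old."""
    start = time.perf_counter()
    decode(audio)
    return (time.perf_counter() - start) / audio_seconds

# Stand-in decoder that just sleeps; a real system would run ASR here.
rtf = real_time_factor(lambda a: time.sleep(0.05), None, 1.0)
print(f"RTF = {rtf:.3f}")
```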
Motion detection of articulatory movement with paralinguistic information using real-time MRI movie
Takuya Asai, H. Kikuchi, K. Maekawa
{"title":"Motion detection of articulatory movement with paralinguistic information using real-time MRI movie","authors":"Takuya Asai, H. Kikuchi, K. Maekawa","doi":"10.1109/O-COCOSDA46868.2019.9060850","DOIUrl":"https://doi.org/10.1109/O-COCOSDA46868.2019.9060850","url":null,"abstract":"The goal of this study is to establish the analytical method of articulatory movement by real-time magnetic resonance images (rtMRI) without estimation of articulatory contours. We present the result of motion detection using a background subtraction method. As a result of applying the background subtraction method to the rtMRI data of one speaker, some motions were detected in tongue, lip, and lower jaw which are important places for speech generation. By the experiments with the movies of multiple speakers, we confirmed that it is possible to detect the motion of the basic articulatory movement, and some movements were different for each speaker. Furthermore, we adapted the proposed method for motion detection to the utterances aiming at the transmission of paralinguistic information. As a result, some similar movements to the previous research were observed.","PeriodicalId":263209,"journal":{"name":"2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127545550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
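Background subtraction compares each frame against a model of the static background and flags pixels that differ. The abstract does not specify the exact background model, so the sketch below uses a median-frame background, one common variant; the threshold and array shapes are assumptions.

```python
import numpy as np

def detect_motion(frames, threshold=25):
    """Background subtraction over a stack of grayscale rtMRI frames.
    frames: (T, H, W) uint8 array; returns per-frame boolean motion masks."""
    background = np.median(frames, axis=0).astype(np.float32)  # static anatomy
    diff = np.abs(frames.astype(np.float32) - background)
    return diff > threshold  # True where tissue moved against the background

# Random frames standing in for an rtMRI movie (illustrative only).
frames = np.random.randint(0, 256, size=(100, 68, 68), dtype=np.uint8)
masks = detect_motion(frames)
motion_energy = masks.reshape(len(masks), -1).mean(axis=1)  # motion per frame
```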
Recognition and translation of code-switching speech utterances
Sahoko Nakayama, Takatomo Kano, Andros Tjandra, S. Sakti, Satoshi Nakamura
{"title":"Recognition and translation of code-switching speech utterances","authors":"Sahoko Nakayama, Takatomo Kano, Andros Tjandra, S. Sakti, Satoshi Nakamura","doi":"10.1109/O-COCOSDA46868.2019.9060847","DOIUrl":"https://doi.org/10.1109/O-COCOSDA46868.2019.9060847","url":null,"abstract":"Code-switching (CS), a hallmark of worldwide bilingual communities, refers to a strategy adopted by bilinguals (or multilinguals) who mix two or more languages in a discourse often with little change of interlocutor or topic. The units and the locations of the switches may vary widely from single-word switches to whole phrases (beyond the length of the loanword units). Such phenomena pose challenges for spoken language technologies, i.e., automatic speech recognition (ASR), since the systems need to be able to handle the input in a multilingual setting. Several works constructed a CS ASR on many different language pairs. But the common aim of developing a CS ASR is merely for transcribing CS-speech utterances into CS-text sentences within a single individual. In contrast, in this study, we address the situational context that happens during dialogs between CS and non-CS (monolingual) speakers and support monolingual speakers who want to understand CS speakers. We construct a system that recognizes and translates from codeswitching speech to monolingual text. We investigated several approaches, including a cascade of ASR and a neural machine translation (NMT), a cascade of ASR and a deep bidirectional language model (BERT), an ASR that directly outputs monolingual transcriptions from CS speech, and multi-task learning. Finally, we evaluate and discuss these four ways on a Japanese- English CS to English monolingual task.","PeriodicalId":263209,"journal":{"name":"2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126856618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 8
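Of the four approaches compared, the ASR+NMT cascade is the simplest to picture: the CS transcript produced by the recognizer is fed to a translator that outputs monolingual text. The sketch below shows only that structure; the lambda stand-ins are placeholders, not the paper's trained models.

```python
from typing import Callable

def cascade(asr: Callable[[bytes], str],
            mt: Callable[[str], str]) -> Callable[[bytes], str]:
    """Compose a CS speech recognizer with a translation back-end:
    CS speech -> CS transcript -> monolingual text."""
    def speech_to_monolingual(audio: bytes) -> str:
        cs_transcript = asr(audio)  # e.g. "明日 meeting ありますか"
        return mt(cs_transcript)    # e.g. "Is there a meeting tomorrow?"
    return speech_to_monolingual

# Toy stand-ins; a real system would plug in trained ASR and NMT models.
pipeline = cascade(asr=lambda audio: "明日 meeting ありますか",
                   mt=lambda text: "Is there a meeting tomorrow?")
print(pipeline(b""))
```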
O-COCOSDA 2019 Thailand report October 2019
DOI: https://doi.org/10.1109/o-cocosda46868.2019.9060839
Citations: 0
LOTUS-BI: A Thai-English Code-mixing Speech Corpus
Sumonmas Thatphithakkul, Vataya Chunwijitra, P. Sertsi, P. Chootrakool, S. Kasuriya
{"title":"LOTUS-BI: A Thai-English Code-mixing Speech Corpus","authors":"Sumonmas Thatphithakkul, Vataya Chunwijitra, P. Sertsi, P. Chootrakool, S. Kasuriya","doi":"10.1109/O-COCOSDA46868.2019.9041195","DOIUrl":"https://doi.org/10.1109/O-COCOSDA46868.2019.9041195","url":null,"abstract":"Nowadays, English words mixed in Thai speech are usually found in a typical speaking style. Consequently, to increase the performance of the speech recognition system, a Thai-English code-mixing speech corpus is required. This paper describes the design and construction of LOTUS-BI corpus: a Thai-English code-mixing speech corpus aimed to be the essential speech database for training acoustic model and language model in order to obtain the better speech recognition accuracy. LOTUS-BI corpus contains 16.5 speech hours from 4 speech tasks: interview, talk, seminar, and meeting. Now, 11.5 speech hours of data from the interview, talk, and seminar acquire from the internet have been transcribed and annotated. Whereas, the rest of 5 speech hours from meeting task has been transcribing. Therefore, only 11.5 speech hours of data were analyzed in this paper. Furthermore, the pronunciation dictionary of vocabularies from LOTUS-BI corpus is created based on Thai phoneme set. The statistical analysis of LOTUS-BI corpus revealed that there are 37.96% of code-mixing utterances, including 34.23% intra-sentential and 3.73% inter-sentential utterances. The occurrence of English vocabularies is 29.04% of the total vocabularies in the corpus. Besides, nouns are found in 90% of all English vocabularies in the corpus and 10% in the other grammatical categories.","PeriodicalId":263209,"journal":{"name":"2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124015834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
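The intra- vs. inter-sentential distinction can be approximated by checking which scripts occur within each utterance: mixed scripts inside one utterance indicate an intra-sentential switch, while a language change between consecutive monolingual utterances indicates an inter-sentential one. The script-based heuristic below is an illustration, not the corpus's annotation protocol.

```python
import re

THAI = re.compile(r"[\u0E00-\u0E7F]")
LATIN = re.compile(r"[A-Za-z]")

def classify(tokens):
    """Label one utterance by the scripts its tokens contain."""
    has_thai = any(THAI.search(t) for t in tokens)
    has_eng = any(LATIN.search(t) for t in tokens)
    if has_thai and has_eng:
        return "intra-sentential"  # languages mixed within the utterance
    return "thai" if has_thai else "english"

# Inter-sentential switches show up as consecutive monolingual utterances
# in different languages.
utts = [["ไป", "meeting", "กัน"], ["สวัสดี", "ครับ"], ["see", "you"]]
print([classify(u) for u in utts])
# ['intra-sentential', 'thai', 'english'] -> one inter-sentential switch
```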
A linguistic representation scheme for depression prediction - with a case study
Yuan Jia, Yuzhu Liang, T. Zhu
{"title":"A linguistic representation scheme for depression prediction - with a case study","authors":"Yuan Jia, Yuzhu Liang, T. Zhu","doi":"10.1109/O-COCOSDA46868.2019.9060849","DOIUrl":"https://doi.org/10.1109/O-COCOSDA46868.2019.9060849","url":null,"abstract":"In this paper, we propose a representation scheme for modeling linguistic and paralinguistic features (emotion and speech act features) of depression patients, based on which a diagnostic model is constructed. The model can be used to assist the identification of depression and predict the degree of depression. A case study with the micro-blog data from a real depression patient and three non-patients, is carried out to illustrate the discriminative power of the linguistic and paralinguistic features. The results demonstrate the ability of the proposed representation scheme to not only distinguish the patient from non-patients but also distinguish different stages of the patient.","PeriodicalId":263209,"journal":{"name":"2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133549274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
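One simple way to realize such a representation is to turn per-post emotion and speech-act annotations into normalized count features for a downstream classifier. The label inventories below are hypothetical stand-ins; the paper's actual feature scheme is richer.

```python
from collections import Counter

# Hypothetical label inventories, for illustration only.
EMOTIONS = ["sadness", "anger", "joy", "fear"]
SPEECH_ACTS = ["assert", "question", "express", "commit"]

def feature_vector(posts):
    """posts: (emotion, speech_act) annotations for one user's micro-blog
    posts; returns normalized count features over both inventories."""
    emo = Counter(e for e, _ in posts)
    act = Counter(a for _, a in posts)
    n = max(len(posts), 1)
    return [emo[e] / n for e in EMOTIONS] + [act[a] / n for a in SPEECH_ACTS]

print(feature_vector([("sadness", "express"), ("sadness", "assert")]))
```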
X-vectors based Urdu Speaker Identification for short utterances
M. Farooq, F. Adeeba, S. Hussain
{"title":"X-vectors based Urdu Speaker Identification for short utterances","authors":"M. Farooq, F. Adeeba, S. Hussain","doi":"10.1109/O-COCOSDA46868.2019.9041237","DOIUrl":"https://doi.org/10.1109/O-COCOSDA46868.2019.9041237","url":null,"abstract":"In context of commercial applications, robustness of a Speaker Identification (SI) system is adversely effected by short utterances. Performance of SI systems fairly depends upon extracted feature sets. This paper investigates the effect of various feature extraction techniques on performance of i-vectors and x-vectors based Urdu speakers' identification models. The scope of this paper is restricted to text independent speaker identification for short utterances (up to 4 seconds). SI systems demand for a large data covering sufficient inter-speaker and intra-speaker variability. Available Urdu speech corpus is used to measure performance of various feature sets on SI systems. A minimum percentage Equal Error Rate (%EER) of 0.113 is achieved using x-vectors with Linear Frequency Cepstral Coefficients (LFCCs) feature set.","PeriodicalId":263209,"journal":{"name":"2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122086893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
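EER, the figure of merit quoted above, is the operating point at which the false-acceptance and false-rejection rates coincide. A minimal sketch of its computation from verification scores follows; the toy scores are illustrative.

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Find the threshold where false-acceptance rate (impostors accepted)
    and false-rejection rate (targets rejected) are closest; return their
    mean as the EER."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best_far, best_frr, best_gap = 1.0, 0.0, np.inf
    for t in np.sort(np.unique(scores)):
        far = np.mean(scores[labels == 0] >= t)
        frr = np.mean(scores[labels == 1] < t)
        if abs(far - frr) < best_gap:
            best_far, best_frr, best_gap = far, frr, abs(far - frr)
    return (best_far + best_frr) / 2

print(equal_error_rate([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # 0.0
```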
A New Corpus of Elderly Japanese Speech for Acoustic Modeling, and a Preliminary Investigation of Dialect-Dependent Speech Recognition
Meiko Fukuda, Ryota Nishimura, H. Nishizaki, Y. Iribe, N. Kitaoka
{"title":"A New Corpus of Elderly Japanese Speech for Acoustic Modeling, and a Preliminary Investigation of Dialect-Dependent Speech Recognition","authors":"Meiko Fukuda, Ryota Nishimura, H. Nishizaki, Y. Iribe, N. Kitaoka","doi":"10.1109/O-COCOSDA46868.2019.9041216","DOIUrl":"https://doi.org/10.1109/O-COCOSDA46868.2019.9041216","url":null,"abstract":"We have constructed a new speech data corpus consisting of the utterances of 221 elderly Japanese people (average age: 79.2) with the aim of improving the accuracy of automatic speech recognition (ASR) for the elderly. ASR is a beneficial modality for people with impaired vision or limited hand movement, including the elderly. However, speech recognition systems using standard recognition models, especially acoustic models, have been unable to achieve satisfactory performance for the elderly. Thus, creating more accurate acoustic models of the speech of elderly users is essential for improving speech recognition for the elderly. Using our new corpus, which includes the speech of elderly people living in three regions of Japan, we conducted speech recognition experiments using a variety of DNN-HNN acoustic models. As training data for our acoustic models, we examined whether a standard adult Japanese speech corpus (JNAS), an elderly speech corpus (S-JNAS) or a spontaneous speech corpus (CSJ) was most suitable, and whether or not adaptation to the dialect of each region improved recognition results. We adapted each of our three acoustic models to all of our speech data, and then re-adapt them using speech from each region. Without adaptation, the best recognition results were obtained when using the S-JNAS trained acoustic models (total corpus: 21.85% Word Error Rate). However, after adaptation of our acoustic models to our entire corpus, the CSJ trained models achieved the lowest WERs (entire corpus: 17.42%). Moreover, after readaptation to each regional dialect, the CSJ trained acoustic model with adaptation to regional speech data showed tendencies of improved recognition rates. We plan to collect more utterances from all over Japan, so that our corpus can be used as a key resource for elderly speech recognition in Japanese. We also hope to achieve further improvement in recognition performance for elderly speech.","PeriodicalId":263209,"journal":{"name":"2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124303078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
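The WER figures above follow the standard definition: the word-level edit distance between reference and hypothesis divided by the reference length. A minimal sketch:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length,
    via Levenshtein alignment over words."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("the cat sat", "the cat sit here"))  # 2/3 ≈ 0.667
```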