2018 Oriental COCOSDA - International Conference on Speech Database and Assessments: Latest Publications

Multi-Modal Multi-Task Deep Learning For Speaker And Emotion Recognition Of TV-Series Data
Pub Date: 2018-05-07 | DOI: 10.1109/ICSDA.2018.8693020
Authors: Sashi Novitasari, Quoc Truong Do, S. Sakti, D. Lestari, Satoshi Nakamura
Abstract: Since paralinguistic aspects must be considered to understand speech, we construct a deep learning framework that uses multi-modal features to recognize speakers and emotions simultaneously. Three feature modalities are used: acoustic, lexical, and facial. To fuse the features from multiple modalities, we experimented with three methods: majority voting, concatenation, and hierarchical fusion. Recognition was performed on a TV-series dataset that simulates actual conversations.
Citations: 3
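The three fusion strategies named in the abstract can be sketched roughly as follows. This is a minimal illustration with made-up feature vectors and a stand-in projection function, not the authors' implementation:

```python
import numpy as np

def majority_vote(per_modality_labels):
    # Late fusion: each modality votes with its own predicted class label.
    vals, counts = np.unique(per_modality_labels, return_counts=True)
    return vals[np.argmax(counts)]

def concat_fusion(acoustic, lexical, facial):
    # Early fusion: stack the modality features into one vector
    # that a single shared network would consume.
    return np.concatenate([acoustic, lexical, facial])

def hierarchical_fusion(acoustic, lexical, facial, combine):
    # Fuse two modalities first, then merge the result with the third.
    # `combine` is a placeholder for a learned projection layer.
    audio_visual = combine(np.concatenate([acoustic, facial]))
    return np.concatenate([audio_visual, lexical])
```

For example, `majority_vote(["angry", "happy", "angry"])` returns `"angry"`; the other two functions differ only in where along the pipeline the modalities are merged.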
Japanese-English Code-Switching Speech Data Construction
Pub Date: 2018-05-07 | DOI: 10.1109/ICSDA.2018.8693044
Authors: Sahoko Nakayama, Takatomo Kano, Quoc Truong Do, S. Sakti, Satoshi Nakamura
Abstract: As the number of Japanese-English bilingual speakers continues to increase, code-switching phenomena occur more frequently. The units and locations of switches vary widely, from single-word switches to whole phrases (beyond the length of loanword units). Speech recognition systems must therefore be developed that can handle not only Japanese or English but also Japanese-English code-switching, and a large-scale code-switching speech database is required for model training. However, collecting natural Japanese-English conversational dialogues is both time-consuming and expensive. This paper presents the construction of Japanese-English code-switching speech data using Japanese and English text-to-speech systems built from a bilingual speaker. Various switching units, including words and phrases, are also investigated. As a result, we successfully constructed over 280k speech utterances of Japanese-English code-switching.
Citations: 9
Speech Corpora of Under Resourced Languages of North-East India
Pub Date: 2018-05-07 | DOI: 10.1109/ICSDA.2018.8693038
Authors: Barsha Deka, Joyshree Chakraborty, Abhishek Dey, Shikhamoni Nath, Priyankoo Sarmah, S. Nirmala, K. Samudravijaya
Abstract: In this paper, we present an account of an ongoing effort to create speech corpora of under-resourced languages of North-East India, namely Assamese, Bengali, and Nepali. The corpora are being created for the development of an automatic speech recognition system for Assamese as well as for a language identification system. The Assamese text corpus comprises 1000 sentences collected from sources such as story books, novels, and proverbs. Speech data are recorded over the telephone channel using an interactive voice response system; speech is simultaneously recorded with a hand-held audio recorder. Speakers were asked to read one or more sets of sentences, each set containing 20 sentences. While a significant amount of speech data has been collected for Assamese, the task has just begun for Bengali, Nepali, and English spoken by native speakers of these three languages. Currently, the Assamese speech database contains more than 5000 utterances from 27 native speakers. Information about the speakers, such as dialect, gender, and age group, was also collected. We discuss the methodology used in collecting speech samples and present descriptive statistics of the corpora.
Citations: 12
Acoustic Comparison of Vowel Articulation When Combined with Different Tone Categories in Mandarin
Pub Date: 2018-05-01 | DOI: 10.1109/ICSDA.2018.8693015
Authors: Chong Cao, Yanlu Xie, Jinsong Zhang
Abstract: An interaction is known to exist between the source (i.e., fundamental frequency) and the vocal tract filter (i.e., formant frequencies). Previous studies investigated this interaction from the perspective of perception, with evidence from Mandarin, which uses four tones to distinguish lexical meanings, but few studies have examined it from the perspective of production. This study explores differences in formant frequencies in vowel articulation when combined with different fundamental frequency patterns (i.e., tones). We calculated the frequencies of the first two formants (F1, F2) and their distance (F2-F1) for different vowels with the four lexical tones. Results showed that both F1 and F2 values differed significantly when combined with different tones. Moreover, the interaction varied with vowels: high vowels usually presented a correlation pattern contrary to that of other vowels. The finding about the co-variation between formants and fundamental frequency may help improve the naturalness of speech synthesis.
Citations: 0
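Formant measurements like the F1/F2 values in this abstract are typically obtained via linear predictive coding (LPC): the angles of the complex poles of the fitted all-pole filter give the resonance frequencies. A self-contained sketch on a synthetic vowel with assumed resonances (700 and 1200 Hz); this is a textbook method, not the authors' actual tooling or data:

```python
import numpy as np

def lpc(signal, order):
    # Autocorrelation (Yule-Walker) LPC: solve R a = r for the predictor.
    r = np.correlate(signal, signal, mode="full")[len(signal) - 1:][:order + 1]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    return np.concatenate(([1.0], -a))        # A(z) coefficients

def formant_estimates(signal, fs, order=4):
    # Formants = angles of the complex poles of the LPC filter.
    roots = [z for z in np.roots(lpc(signal, order)) if z.imag > 0]
    return sorted(np.angle(z) * fs / (2 * np.pi) for z in roots)

# Synthesize a vowel-like signal with assumed resonances at 700 and 1200 Hz.
fs = 8000
a_true = np.array([1.0])
for f, bw in [(700, 60), (1200, 80)]:
    r = np.exp(-np.pi * bw / fs)
    theta = 2 * np.pi * f / fs
    a_true = np.convolve(a_true, [1.0, -2 * r * np.cos(theta), r * r])

excitation = np.zeros(4000)
excitation[::100] = 1.0                       # 80 Hz glottal impulse train

# All-pole synthesis filter (equivalent to scipy.signal.lfilter([1], a_true, x)).
vowel = np.zeros(len(excitation))
for n in range(len(excitation)):
    acc = excitation[n]
    for k in range(1, len(a_true)):
        if n - k >= 0:
            acc -= a_true[k] * vowel[n - k]
    vowel[n] = acc

f1, f2 = formant_estimates(vowel, fs)
f2_minus_f1 = f2 - f1                         # the F2-F1 distance measure used above
```

With the model order matching the number of synthetic resonances, the estimated F1 and F2 land close to the assumed 700 and 1200 Hz.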
AWA Long-Term Recorded Speech Corpus And Robust Speaker Recognition Method For Session Variability
Pub Date: 2018-05-01 | DOI: 10.1109/ICSDA.2018.8693004
Authors: S. Tsuge, S. Kuroiwa, Tomoko Ohsuga, Y. Ishimoto
Abstract: Session variability is one of the most important issues in speaker recognition technology. At the same time, our scientific interest lies in how an individual voice changes over time and where the limits of those changes lie. Motivated by this, we have been constructing the "AWA Long-Term Recorded speech corpus (AWA-LTR)", which contains the same spoken content recorded in the morning, at noon, and in the evening once a week for over 10 years, using the same microphone in a soundproof chamber. The first version of AWA-LTR was released by the Speech Resources Consortium, National Institute of Informatics (NII-SRC), Japan, in 2012, and a second version will be released in 2018. In this paper, we describe the details of AWA-LTR and its data release schedule. As an application example, we propose a speaker recognition method robust to session variability and evaluate it in a speaker identification experiment.
Citations: 0
URDU Speech Corpora for Banking Sector in Pakistan
Pub Date: 2018-05-01 | DOI: 10.1109/ICSDA.2018.8693010
Authors: B. Mumtaz, Sahar Rauf, Hafsa Qadir, J. Khalid, T. Habib, S. Hussain, Rukhsana Barkat, E. Haq
Abstract: This research describes an effort to build Urdu speech corpora for the banking sector in Pakistan. We designed the corpora to develop a debit card activation ASR; they comprise eight types of corpora, mainly a debit-card-number corpus, an expiry-date corpus, a last-four-digits corpus, months' names, a date-of-birth corpus, account types, and an Urdu-counting corpus. The corpora contain read-style telephone speech from more than 400 speakers, specifically with a Punjabi accent, in both outdoor and indoor environments, including offices, homes, banks, and universities. The speech is automatically annotated and manually verified at the sentence tier, with 98% inter-annotator accuracy. In this paper, we report the design, recording, and annotation of the corpora, which serve as the data development step for an ASR that will be integrated into a debit card activation service in the banking sector of Pakistan.
Citations: 1
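The 98% inter-annotator accuracy quoted above is, in its simplest form, a percent-agreement figure: the fraction of items on which two annotators assign the same label. A minimal sketch with hypothetical labels, not the actual corpus annotations:

```python
def percent_agreement(annotator_a, annotator_b):
    # Fraction of items on which two annotators assign the same label.
    assert len(annotator_a) == len(annotator_b)
    matches = sum(a == b for a, b in zip(annotator_a, annotator_b))
    return matches / len(annotator_a)
```

So if two annotators disagree on 2 of 100 sentence-tier labels, agreement is 0.98. Note that plain percent agreement does not correct for chance; chance-corrected coefficients such as Cohen's kappa are often reported alongside it.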
Development of Text and Speech Corpus for Designing the Multilingual Recognition System
Pub Date: 2018-05-01 | DOI: 10.1109/ICSDA.2018.8693013
Authors: S. Bansal, S. Agrawal
Abstract: Creating a multilingual speech and text corpus manually is a difficult and time-consuming task. This paper presents the overall methodology and experience of text and speech data collection for three under-resourced languages: Hindi, Manipuri, and Urdu. Text data were collected through web crawling in three domains (general, news, and travel) to capture the versatility of the database across these languages. The main objective of this project is to collect text and speech databases that can be used for training multilingual spoken language identification systems. In total, we collected a text corpus of three million words and an audio corpus of 150 speakers (50 native speakers of each language). Each speaker recorded 300 phonetically rich sentences created through text analysis. The speech utterances were recorded at 16 kHz through a microphone using the GOLDWAVE software tool in a sound-treated room. The collected speech data sets were annotated manually at the phonemic level for each language and made available for the development of a multilingual recognition system.
Citations: 2
Unsupervised Dependency Corpus Annotation for Myanmar Language
Pub Date: 2018-05-01 | DOI: 10.1109/ICSDA.2018.8693009
Authors: Hnin Thu Zar Aye, Win Pa Pa, Ye Kyaw Thu
Abstract: Dependency parsing represents the connections between linguistic units (words) as directed links. This paper presents the annotation of a general-domain corpus with an unsupervised approach, applying universal part-of-speech (U-POS) tags to build a treebank for unsupervised dependency parsing of the Myanmar language. It remains a hard task to obtain complete syntactic structures for Myanmar. Dependency structures of words in Myanmar sentences are presented for general word and phrase orders and for the relations of basic sentence structures. UDPipe is used for the U-POS annotation. Preliminary results of the annotated trees and a parsing experiment are also presented. The parsing experiments are evaluated by UDPipe in terms of unlabeled and labeled attachment scores (UAS and LAS), which are 93.20% and 91.21%, respectively, on the test set.
Citations: 2
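The UAS and LAS figures quoted above are the standard dependency-parsing metrics: UAS counts tokens whose predicted head is correct, while LAS additionally requires the relation label to match. A minimal sketch on toy trees (not the paper's data):

```python
def attachment_scores(gold, pred):
    # gold / pred: one (head_index, relation_label) pair per token.
    assert len(gold) == len(pred)
    n = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n
    las = sum(g == p for g, p in zip(gold, pred)) / n  # head AND label match
    return uas, las
```

For a three-token sentence where the parser gets one head wrong and one label wrong, UAS is 2/3 and LAS is 1/3.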
Acoustic Features Of Mandarin Diphthongs By Uyghur Learners At Primary Level
Pub Date: 2018-05-01 | DOI: 10.1109/ICSDA.2018.8693014
Authors: Yultuz Rapkat, Gulnur Arkin, Mijit Ablimit, A. Hamdulla
Abstract: From the perspective of experimental phonetics, this paper presents an acoustic comparison of the diphthongs produced by Uyghur and Chinese college speakers and examines primary-level Uyghur learners' acquisition of Mandarin Chinese diphthongs. A total of 132 samples (covering 9 diphthongs) were extracted from the recorded corpus, and the vowel formants were statistically analyzed. The characteristics and distributions of the formants are analyzed to investigate the acoustic properties. Finally, combined with the experimental results, the primary-level Uyghur learners' acquisition of diphthongs is further discussed and analyzed. The purpose of this paper is to understand Uyghur college learners' acquisition of Mandarin Chinese diphthong tracks and to provide correct reference data for a computer-assisted language learning system for Uyghur learners of Mandarin Chinese.
Citations: 0
Phonetic Realization Of Information Structures In Chinese English Learners' Reading Texts
Pub Date: 2018-05-01 | DOI: 10.1109/ICSDA.2018.8693006
Authors: Xinyi Wen, Yuan Jia, Ai-jun Li
Abstract: The present study investigates the phonetic realization of information structure in L2 by comparing productions of English discourse from Beijing English learners and native English speakers. Phonetic and statistical analyses are conducted on English reading texts selected from the Asian English Speech cOrpus Project (AESOP). The main findings are as follows: Beijing English learners do not distinguish given and new information with pitch range as native English speakers do, which is the main difference between the two speaker groups; the slight differences found in duration and mean pitch value might result from factors other than the phonetic strategies used in information packaging. In addition, the difference between Beijing English learners' performance at the lexical and referential levels lies mainly in the duration of accessible information.
Citations: 0
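Pitch-range comparisons of the kind made in this study are commonly carried out on a semitone scale, so that ranges are comparable across speakers with different baseline F0. A small sketch with assumed F0 values (the reference frequency and contour are illustrative, not taken from the paper):

```python
import math

def hz_to_semitones(f0_hz, ref_hz=100.0):
    # Convert a fundamental frequency to semitones relative to a reference.
    return 12.0 * math.log2(f0_hz / ref_hz)

def pitch_range_st(f0_values_hz):
    # Pitch range of an utterance in semitones (max minus min F0).
    st = [hz_to_semitones(f) for f in f0_values_hz]
    return max(st) - min(st)
```

A contour spanning 100-200 Hz, for example, has a pitch range of 12 semitones (one octave) regardless of the reference chosen, since the reference cancels in the subtraction.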