2009 Oriental COCOSDA International Conference on Speech Database and Assessments: Latest Papers

Message from the Oriental-COCOSDA convener
2009 Oriental COCOSDA International Conference on Speech Database and Assessments Pub Date: 2011-11-28 DOI: 10.1109/ICSDA.2011.6085965
Chiu-yu Tseng
Abstract: Welcome to Oriental-COCOSDA 2011 at Hsinchu, Taiwan. This is the 14th annual conference of Oriental-COCOSDA, the Oriental chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment. Since 1998, annual meetings have been held at Tsukuba, Taipei, Beijing, Jeju, Huahin, Singapore, Delhi, Jakarta, Penang, Hanoi, Beijing, Kyoto, Katmandu and this year Hsinchu. I would like to thank the colleagues from Taiwan, headed by Conference Chair Professor Hsiao-Chuan Wang, for making the event possible this time in Taiwan.
Citations: 0
An HMM-based Vietnamese speech synthesis system
2009 Oriental COCOSDA International Conference on Speech Database and Assessments Pub Date: 2009-10-02 DOI: 10.1109/ICSDA.2009.5278366
T. Vu, Mai Chi Luong, Satoshi Nakamura
Abstract: This paper describes an approach to the realization of a Vietnamese speech synthesis system applying a technique whereby speech is directly synthesized from hidden Markov models (HMMs). Spectrum, pitch, and phone duration are simultaneously modeled in HMMs, and their parameter distributions are clustered independently by using decision tree-based context clustering algorithms. Several contextual factors such as tone types, syllables, words, phrases, and utterances were determined and are taken into account to generate the spectrum, pitch, and state duration. The resulting system yields significant correctness for a tonal language, and a fair reproduction of the prosody.
Citations: 36
A chain of Gaussian Mixture Model for text-independent speaker recognition
2009 Oriental COCOSDA International Conference on Speech Database and Assessments Pub Date: 2009-10-02 DOI: 10.1109/ICSDA.2009.5278367
Yanxiang Chen, Ming Liu
Abstract: Text-independent speaker recognition is more flexible than text-dependent methods. However, because of differences in phonetic content, text-independent methods usually achieve lower performance than text-dependent methods. To combine the flexibility of text-independent methods with the high performance of text-dependent methods, we propose a new modeling technique, named a chain of Gaussian Mixture Models, which encodes the temporal correlation of the training utterance in the chain structure. A special decoding network is then used to evaluate the test utterance and find the best phonetically matched segments between the test utterance and the training utterance. The experimental results indicate that the proposed method significantly improves system performance, especially for short test utterances.
Citations: 0
An undergraduate Mandarin speech database for speaker recognition research
2009 Oriental COCOSDA International Conference on Speech Database and Assessments Pub Date: 2009-10-02 DOI: 10.1109/ICSDA.2009.5278370
Hong Wang, Jingui Pan
Abstract: This paper describes the development of a new speech database for speaker recognition research, UMSD (undergraduate Mandarin speech database). UMSD contains a total of 12 sessions of utterances for each of the 24 selected undergraduate students, with all recordings conducted at different session intervals. The phonetically balanced corpus content includes isolated digits (0 to 9), digit strings (5 phone numbers and 2 postal codes), words and phrases of different lengths from 1 to 10 characters (10 for each given length), the Chinese Phonetic Alphabet Table (21 Initials and 35 Finals), 2 ancient poems, and a 200-word paragraph extracted from a well-known essay. Additionally, in order to effectively extract and process the speech segments of interest from UMSD, a speech database management system has been proposed on the basis of MATLAB and MS-ACCESS. Results of a preliminary evaluation show that the performance attained with UMSD is good: it not only meets the needs of our own recent efforts in text-dependent and text-independent speaker recognition, but also allows further research on long-term intra-speaker variability, thanks to its multi-session recordings with different session intervals.
Citations: 3
Research on Uyghur framenet description system
2009 Oriental COCOSDA International Conference on Speech Database and Assessments Pub Date: 2009-10-02 DOI: 10.1109/ICSDA.2009.5278358
Alifu Kuerban, Wumaierjiang Kuerban, Nijat Abdurusul
Abstract: This article presents a preliminary discussion of, and attempt at, a frame semantics description system and its content for Uyghur as a source language. It describes the composition of the frame net according to the description content, describes and classifies the semantic roles of frame elements in the modern Uyghur frame net, and determines the semantic role labeling system, laying a foundation for syntactic and semantic recognition and analysis with the Uyghur framenet. It also explores a feasible method and approach for building a Uyghur framenet based on cognition.
Citations: 2
An investigation on the Mandarin prosody of a parallel multi-speaking rate speech corpus
2009 Oriental COCOSDA International Conference on Speech Database and Assessments Pub Date: 2009-10-02 DOI: 10.1109/ICSDA.2009.5278360
Chen-Yu Chiang, C. Tang, Hsiu-Min Yu, Yih-Ru Wang, Sin-Horng Chen
Abstract: In this paper, the prosody of a parallel multi-speaking rate Mandarin read speech corpus is investigated. The corpus contains four parallel speech datasets uttered by a female professional announcer with various speech rates (SRs) of 4.40 (fast), 3.82 (normal), 2.97 (median) and 2.45 (slow) syllables/second. By using the unsupervised joint prosody labeling and modeling (PLM) method proposed previously, the relationship between SR and various prosodic features, including pause duration, patterns of three high-level prosodic constituents, and the break labels, is investigated. The analyses reported in this study could be very informative in developing prosody generation mechanisms for text-to-speech and prosody modeling for automatic speech recognition at various SRs.
Citations: 7
Modeling characteristics of agglutinative languages with Multi-class language model for ASR system
2009 Oriental COCOSDA International Conference on Speech Database and Assessments Pub Date: 2009-10-02 DOI: 10.1109/ICSDA.2009.5278368
I. Dawa, Y. Sagisaka, S. Nakamura
Abstract: In this paper, we discuss a new language model that considers the characteristics of agglutinative languages. We used Mongolian (a Cyrillic language system used in Mongolia) as an example from which to build the language model. We developed a Multi-class N-gram language model based on similar word clustering that focuses on the variable suffixes of a word in Mongolian. By applying our proposed language model, the resulting recognition system can improve performance by 6.85% compared with a conventional word N-gram when applying the ATRASR engine. We also confirmed that our new model will be convenient for rapid development of an ASR system for resource-deficient languages, especially for agglutinative languages such as Mongolian.
Citations: 1
Emphasized speech synthesis based on hidden Markov models
2009 Oriental COCOSDA International Conference on Speech Database and Assessments Pub Date: 2009-10-02 DOI: 10.1109/ICSDA.2009.5278371
Kumiko Morizane, Keigo Nakamura, T. Toda, H. Saruwatari, K. Shikano
Abstract: This paper presents a statistical approach to synthesizing emphasized speech based on hidden Markov models (HMMs). Context-dependent HMMs are trained using emphasized speech data uttered by intentionally emphasizing an arbitrary accentual phrase in a sentence. To model acoustic characteristics of emphasized speech, new contextual factors describing an emphasized accentual phrase are additionally considered in model training. Moreover, to build HMMs for synthesizing both normal speech and emphasized speech, we investigate two training methods: one trains individual models for normal and emphasized speech using each of these two types of speech data separately, and the other trains a mixed model using both of them simultaneously. The experimental results demonstrate that 1) HMM-based speech synthesis is effective for synthesizing emphasized speech and 2) the mixed model allows a more compact HMM set that generates more natural-sounding but slightly less emphasized speech compared with the individual models.
Citations: 21
Acoustic manifestations of information categories in Standard Chinese
2009 Oriental COCOSDA International Conference on Speech Database and Assessments Pub Date: 2009-10-02 DOI: 10.1109/ICSDA.2009.5278372
Yuan Jia, Ai-jun Li
Abstract: The present study mainly investigates the acoustic manifestations of various information categories in Standard Chinese (hereinafter, SC). Results of experiments have demonstrated that rheme focus, theme focus, rheme background and theme background can be reflected by different acoustic realizations. Specifically, rheme focus and theme focus can induce F0 and duration prominences, and the former exerts more obvious variations. Although rheme background and theme background introduce no prominences, the former can be manifested by a greater magnitude of acoustic performances than the latter.
Citations: 0
Toward translating Indonesian spoken utterances to/from other languages
2009 Oriental COCOSDA International Conference on Speech Database and Assessments Pub Date: 2009-10-02 DOI: 10.1109/ICSDA.2009.5278362
S. Sakti, Michael Paul, R. Maia, S. Sakai, Noriyuki Kimura, Yutaka Ashikari, E. Sumita, Satoshi Nakamura
Abstract: This paper outlines the National Institute of Information and Communications Technology / Advanced Telecommunications Research Institute International (NICT/ATR) research activities in developing a spoken language translation system, specifically for translating Indonesian spoken utterances into/from Japanese or English. Since the NICT/ATR Japanese-English speech translation system is an established one and has been widely known for many years, our focus here is only on the additional components that are related to the Indonesian spoken language technology. This includes the development of an Indonesian large vocabulary continuous speech recognizer, Indonesian-Japanese and Indonesian-English machine translators, and an Indonesian speech synthesizer. Each of these component technologies was developed by using corpus-based speech and language processing approaches. Currently, all these components have been successfully incorporated into the mobile terminal of the NICT/ATR multilingual speech translation system.
Citations: 3