{"title":"Message from the Oriental-COCOSDA convener","authors":"Chiu-yu Tseng","doi":"10.1109/ICSDA.2011.6085965","DOIUrl":"https://doi.org/10.1109/ICSDA.2011.6085965","url":null,"abstract":"Welcome to Oriental-COCOSDA 2011 in Hsinchu, Taiwan. This is the 14th annual conference of Oriental-COCOSDA, the Oriental chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment. Since 1998, annual meetings have been held in Tsukuba, Taipei, Beijing, Jeju, Huahin, Singapore, Delhi, Jakarta, Penang, Hanoi, Beijing, Kyoto, Kathmandu and, this year, Hsinchu. I would like to thank the colleagues from Taiwan, headed by Conference Chair Professor Hsiao-Chuan Wang, for making the event possible this time in Taiwan.","PeriodicalId":254906,"journal":{"name":"2009 Oriental COCOSDA International Conference on Speech Database and Assessments","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115363185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An HMM-based Vietnamese speech synthesis system","authors":"T. Vu, Mai Chi Luong, Satoshi Nakamura","doi":"10.1109/ICSDA.2009.5278366","DOIUrl":"https://doi.org/10.1109/ICSDA.2009.5278366","url":null,"abstract":"This paper describes an approach to the realization of a Vietnamese speech synthesis system applying a technique whereby speech is directly synthesized from Hidden Markov models (HMMs). Spectrum, pitch, and phone duration are simultaneously modeled in HMMs and their parameter distributions are clustered independently by using decision tree-based context clustering algorithms. Several contextual factors such as tone types, syllables, words, phrases, and utterances were determined and are taken into account to generate the spectrum, pitch, and state duration. The resulting system yields significant correctness for a tonal language, and a fair reproduction of the prosody.","PeriodicalId":254906,"journal":{"name":"2009 Oriental COCOSDA International Conference on Speech Database and Assessments","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124794920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A chain of Gaussian Mixture Model for text-independent speaker recognition","authors":"Yanxiang Chen, Ming Liu","doi":"10.1109/ICSDA.2009.5278367","DOIUrl":"https://doi.org/10.1109/ICSDA.2009.5278367","url":null,"abstract":"Text-independent speaker recognition offers better flexibility than text-dependent methods. However, due to differences in phonetic content, text-independent methods usually achieve lower performance than text-dependent methods. To combine the flexibility of text-independent methods with the high performance of text-dependent methods, we propose a new modeling technique, a chain of Gaussian Mixture Models, which encodes the temporal correlation of the training utterance in the chain structure. A special decoding network is then used to evaluate the test utterance and find the best phonetically matched segments between the test utterance and the training utterance. The experimental results indicate that the proposed method significantly improves system performance, especially for short test utterances.","PeriodicalId":254906,"journal":{"name":"2009 Oriental COCOSDA International Conference on Speech Database and Assessments","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116829765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An undergraduate Mandarin speech database for speaker recognition research","authors":"Hong Wang, Jingui Pan","doi":"10.1109/ICSDA.2009.5278370","DOIUrl":"https://doi.org/10.1109/ICSDA.2009.5278370","url":null,"abstract":"This paper describes the development of a new speech database for speaker recognition research, UMSD (undergraduate Mandarin speech database). UMSD contains 12 sessions of utterances for each of the 24 selected undergraduate students, with the recordings conducted at different session intervals. The phonetically balanced corpus includes isolated digits (0∼9), digit strings (5 phone numbers and 2 postal codes), words and phrases of lengths from 1 to 10 characters (10 for each length), the Chinese Phonetic Alphabet Table (21 Initials and 35 Finals), 2 ancient poems, and a 200-word paragraph extracted from a well-known essay. Additionally, to effectively extract and process speech segments of interest from UMSD, a speech database management system has been developed based on MATLAB and MS-ACCESS. Results of a preliminary evaluation show that UMSD performs well: it not only meets the needs of our own recent work on text-dependent and text-independent speaker recognition, but also enables further research on long-term intra-speaker variability thanks to its multi-session recordings with different session intervals.","PeriodicalId":254906,"journal":{"name":"2009 Oriental COCOSDA International Conference on Speech Database and Assessments","volume":"8 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133042016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Research on Uyghur framenet description system","authors":"Alifu Kuerban, Wumaierjiang Kuerban, Nijat Abdurusul","doi":"10.1109/ICSDA.2009.5278358","DOIUrl":"https://doi.org/10.1109/ICSDA.2009.5278358","url":null,"abstract":"This article presents a preliminary discussion of a frame-semantics description system and its content for the Uyghur language. It describes the composition of the framenet according to that description content, describes and classifies the semantic roles of frame elements in the modern Uyghur framenet, and determines a semantic role labeling system, laying a solid foundation for syntactic and semantic recognition and analysis with the Uyghur framenet. It also explores a feasible, cognition-based method and approach for building the Uyghur framenet.","PeriodicalId":254906,"journal":{"name":"2009 Oriental COCOSDA International Conference on Speech Database and Assessments","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130134091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An investigation on the Mandarin prosody of a parallel multi-speaking rate speech corpus","authors":"Chen-Yu Chiang, C. Tang, Hsiu-Min Yu, Yih-Ru Wang, Sin-Horng Chen","doi":"10.1109/ICSDA.2009.5278360","DOIUrl":"https://doi.org/10.1109/ICSDA.2009.5278360","url":null,"abstract":"In this paper, the prosody of a parallel multi-speaking rate Mandarin read speech corpus is investigated. The corpus contains four parallel speech datasets uttered by a female professional announcer at speech rates (SRs) of 4.40 (fast), 3.82 (normal), 2.97 (median) and 2.45 (slow) syllables/second. Using the previously proposed unsupervised joint prosody labeling and modeling (PLM) method, the relationship between SR and various prosodic features, including pause duration, patterns of three high-level prosodic constituents, and break labels, is investigated. The analyses reported in this study could be very informative for developing prosody generation mechanisms for text-to-speech and prosody modeling for automatic speech recognition at various SRs.","PeriodicalId":254906,"journal":{"name":"2009 Oriental COCOSDA International Conference on Speech Database and Assessments","volume":"128 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131701566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling characteristics of agglutinative languages with Multi-class language model for ASR system","authors":"I. Dawa, Y. Sagisaka, S. Nakamura","doi":"10.1109/ICSDA.2009.5278368","DOIUrl":"https://doi.org/10.1109/ICSDA.2009.5278368","url":null,"abstract":"In this paper, we discuss a new language model that considers the characteristics of agglutinative languages. We used Mongolian (a Cyrillic-script language used in Mongolia) as an example from which to build the language model. We developed a Multi-class N-gram language model based on similar-word clustering that focuses on the variable suffixes of Mongolian words. By applying our proposed language model, the resulting recognition system improves performance by 6.85% compared with a conventional word N-gram when using the ATRASR engine. We also confirmed that our new model is convenient for rapid development of ASR systems for resource-deficient languages, especially agglutinative languages such as Mongolian.","PeriodicalId":254906,"journal":{"name":"2009 Oriental COCOSDA International Conference on Speech Database and Assessments","volume":"196 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122588323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Emphasized speech synthesis based on hidden Markov models","authors":"Kumiko Morizane, Keigo Nakamura, T. Toda, H. Saruwatari, K. Shikano","doi":"10.1109/ICSDA.2009.5278371","DOIUrl":"https://doi.org/10.1109/ICSDA.2009.5278371","url":null,"abstract":"This paper presents a statistical approach to synthesizing emphasized speech based on hidden Markov models (HMMs). Context-dependent HMMs are trained using emphasized speech data uttered by intentionally emphasizing an arbitrary accentual phrase in a sentence. To model acoustic characteristics of emphasized speech, new contextual factors describing an emphasized accentual phrase are additionally considered in model training. Moreover, to build HMMs for synthesizing both normal speech and emphasized speech, we investigate two training methods; one is training of individual models for normal and emphasized speech using each of these two types of speech data separately; and the other is training of a mixed model using both of them simultaneously. The experimental results demonstrate that 1) HMM-based speech synthesis is effective for synthesizing emphasized speech and 2) the mixed model allows a more compact HMM set generating more naturally sounding but slightly less emphasized speech compared with the individual models.","PeriodicalId":254906,"journal":{"name":"2009 Oriental COCOSDA International Conference on Speech Database and Assessments","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131706758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Acoustic manifestations of information categories in Standard Chinese","authors":"Yuan Jia, Ai-jun Li","doi":"10.1109/ICSDA.2009.5278372","DOIUrl":"https://doi.org/10.1109/ICSDA.2009.5278372","url":null,"abstract":"The present study investigates the acoustic manifestations of various information categories in Standard Chinese (hereinafter, SC). Experimental results demonstrate that rheme focus, theme focus, rheme background and theme background are reflected in different acoustic realizations. Specifically, rheme focus and theme focus both induce F0 and duration prominences, with the former producing more pronounced variations. Although rheme background and theme background introduce no prominences, the former is manifested by acoustic realizations of greater magnitude than the latter.","PeriodicalId":254906,"journal":{"name":"2009 Oriental COCOSDA International Conference on Speech Database and Assessments","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129918903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward translating Indonesian spoken utterances to/from other languages","authors":"S. Sakti, Michael Paul, R. Maia, S. Sakai, Noriyuki Kimura, Yutaka Ashikari, E. Sumita, Satoshi Nakamura","doi":"10.1109/ICSDA.2009.5278362","DOIUrl":"https://doi.org/10.1109/ICSDA.2009.5278362","url":null,"abstract":"This paper outlines the National Institute of Information and Communications Technology / Advanced Telecommunications Research Institute International (NICT/ATR) research activities in developing a spoken language translation system, specially for translating Indonesian spoken utterances into/from Japanese or English. Since the NICT/ATR Japanese-English speech translation system is an established one and has been widely known for many years, our focus here is only on the additional components that are related to the Indonesian spoken language technology. This includes the development of an Indonesian large vocabulary continuous speech recognizer, Indonesian-Japanese and Indonesian-English machine translators, and an Indonesian speech synthesizer. Each of these component technologies was developed by using corpus-based speech and language processing approaches. Currently, all these components have been successfully incorporated into the mobile terminal of the NICT/ATR multilingual speech translation system.","PeriodicalId":254906,"journal":{"name":"2009 Oriental COCOSDA International Conference on Speech Database and Assessments","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121855907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}