{"title":"Phonetic aspects of content design in AESOP (Asian English Speech cOrpus Project)","authors":"T. Visceglia, Chiu-yu Tseng, M. Kondo, H. Meng, Y. Sagisaka","doi":"10.1109/ICSDA.2009.5278376","DOIUrl":"https://doi.org/10.1109/ICSDA.2009.5278376","url":null,"abstract":"This research is part of the ongoing multinational collaboration “Asian English Speech cOrpus Project” (AESOP), whose aim is to build up an Asian English speech corpus representing the varieties of English spoken in Asia. AESOP is an international consortium of linguists, speech scientists, psychologists and educators from Japan, Taiwan, Hong Kong, China, Thailand, Indonesia and Mongolia. Its primary aim is to collect and compare Asian English speech corpora from the countries listed above in order to derive a set of core properties common to all varieties of Asian English, as well as to discover features that are particular to individual varieties. Each research team will use a common recording setup and share an experimental task set, and will develop a common, open-ended annotation system. Moreover, AESOP-collected corpora will be an open resource, available to the research community at large. The initial stage of the phonetics aspect of this project will be devoted to designing spoken-language tasks which will elicit production of a large range of English segmental and suprasegmental characteristics. These data will be used to generate a catalogue of acoustic characteristics particular to individual varieties of Asian English, which will then be compared with the data collected by other AESOP members in order to determine areas of overlap between L1 and L2 English as well as differences among varieties of Asian English.","PeriodicalId":254906,"journal":{"name":"2009 Oriental COCOSDA International Conference on Speech Database and Assessments","volume":"119 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131419125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Development and application of multilingual speech translation","authors":"Satoshi Nakamura","doi":"10.1109/ICSDA.2009.5278383","DOIUrl":"https://doi.org/10.1109/ICSDA.2009.5278383","url":null,"abstract":"This paper describes the latest version of handheld speech-to-speech translation system developed by National Institute of Information and Communications Technology, NICT. As the entire speech-to-speech translation functions are implemented into one terminal, it realizes real-time and location free speech-to-speech translation service for many language pairs. A new noise-suppression technique notably improves speech recognition performance. Corpus-based approaches of recognition, translation, and synthesis enabled wide range coverage of topic varieties and portability to other languages. Currently, we mainly focus on translation between Japanese, English and Chinese.","PeriodicalId":254906,"journal":{"name":"2009 Oriental COCOSDA International Conference on Speech Database and Assessments","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121791061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Advances in Chinese Natural Language Processing and Language resources","authors":"J. Tao, Fang Zheng, Ai-jun Li, Ya Li","doi":"10.1109/ICSDA.2009.5278384","DOIUrl":"https://doi.org/10.1109/ICSDA.2009.5278384","url":null,"abstract":"In the past few years, there have been a significant number of activities in the area of Chinese Natural Language Processing (CNLP) including the language resource construction and assessment. This paper summarized the major tasks and key technologies in Natural Language Processing (NLP), which encompasses both text processing and speech processing by extension. The Chinese Language resources, including linguistic data, speech data, evaluation data and language toolkits which are elaborately constructed for CNLP related fields and some language resource consortiums are also introduced in this paper. Aimed to promote the development of corpus-based technologies, many resource consortiums commit themselves to collect, create and distribute many kinds of resources. The goal of these organizations is to set up a universal and well accepted Chinese resources database so that to push forward the CNLP.","PeriodicalId":254906,"journal":{"name":"2009 Oriental COCOSDA International Conference on Speech Database and Assessments","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125962213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Speech timing and cross-linguistic studies towards computational human modeling","authors":"Y. Sagisaka, H. Kato, M. Tsuzaki, Shizuka Nakamura, C. Hansakunbuntheung","doi":"10.1109/ICSDA.2009.5278386","DOIUrl":"https://doi.org/10.1109/ICSDA.2009.5278386","url":null,"abstract":"In this paper, we introduce Japanese segmental duration characteristics and computational modeling that we have been studying for around three decades in speech synthesis. A series of experimental results are also shown on loudness dependence in the duration perception. These computational duration modeling and perceptual studies on duration error sensitivity to loudness give some insights for computational human modeling of spoken language capability. As a first trial to figure out how these findings could be efficiently employed in other field like language learning, we introduce our current efforts on the objective evaluation of 2nd language speaking skill and the research consortium of AESOP (Asian English Speech cOrpus Project) where researchers in Asian countries have started to work together.","PeriodicalId":254906,"journal":{"name":"2009 Oriental COCOSDA International Conference on Speech Database and Assessments","volume":"113 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124376438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Intonation patterns of yes-no questions for Chinese EFL learners","authors":"Xiaoli Ji, Xia Wang, Ai-jun Li","doi":"10.1109/ICSDA.2009.5278369","DOIUrl":"https://doi.org/10.1109/ICSDA.2009.5278369","url":null,"abstract":"The present study investigates Chinese EFL (English as a foreign language) learners' intonation pattern of yes-no questions on the basis of AM theory. According to our study, American speakers adopt a low-level (L*) or low rising tone (L*H) on nuclear accents no matter the nuclear accent is on the medial or final part of a sentence. By contrast, Chinese EFL learners apply a high-level (H*) or falling (H*L) tone when a nuclear accent falls on the medial part of a sentence but a falling (H*L) or low rising tone (L*H) when it is on the final part. The final boundary tone of Chinese EFL learners can be either high (H%) or low (L%) while American speakers mainly apply the H% boundary tone. Besides, Chinese EFL learners' pitch movements of nuclear accents in yes-no questions are similar to those of statements.","PeriodicalId":254906,"journal":{"name":"2009 Oriental COCOSDA International Conference on Speech Database and Assessments","volume":"53 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131751773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Construction of Chinese conversational corpora for spontaneous speech recognition and comparative study on the trilingual parallel corpora","authors":"Xinhui Hu, R. Isotani, Satoshi Nakamura","doi":"10.1109/ICSDA.2009.5278375","DOIUrl":"https://doi.org/10.1109/ICSDA.2009.5278375","url":null,"abstract":"In this paper, we describe the development of Chinese conversational segmented and POS-tagged corpora currently used in the NICT/ATR speech-to-speech translation system. Over 500K manually checked utterances provide 3.5M words of Chinese corpora. As far as we know, they are the largest conversational textual corpora; in the domain of travel. A set of three parallel corpora is obtained with the corresponding pairs of Japanese and English words from which the Chinese words are translated. Based on these parallel corpora, we make an investigation on the statistics of each language, performances of language model and speech recognition, and find the differences among these languages. The problems and their solutions to the present Chinese corpora are also analyzed and discussed.","PeriodicalId":254906,"journal":{"name":"2009 Oriental COCOSDA International Conference on Speech Database and Assessments","volume":"1999 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132337610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Grapheme to Phoneme (G2P) conversion for Bangla","authors":"Joyanta Basu, T. Basu, Mridusmita Mitra, Shyamal Kr. Das Mandal","doi":"10.1109/ICSDA.2009.5278373","DOIUrl":"https://doi.org/10.1109/ICSDA.2009.5278373","url":null,"abstract":"The automatic conversion of text to phoneme is a necessary step in all-current approaches to Text-to-Speech (TTS) synthesis and Automatic Speech Recognition System. This paper presents a methodology for Grapheme to Phoneme (G2P) conversion for Bangla based on orthographic rules. In Bangla G2P conversion sometimes depends not only on orthographic information but also on Parts of Speech (POS) information and semantics. This paper also addresses these issues along with their implementation methodology. The G2P conversion system of Bangla is tested on 1000 different types of Bangla sentences containing 9294 words. The percentage of correct conversion is 91.58% without considering the semantics and contextual POS with the exception table size of 333 words. If those errors which occur due to lack of exceptional words are considered, then the percentage of correct conversion will increase to 98%.","PeriodicalId":254906,"journal":{"name":"2009 Oriental COCOSDA International Conference on Speech Database and Assessments","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116564128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Speech processing technology of Uyghur language","authors":"Wushouer Silamu, Nasirjan Tursun, Parida Saltiniyaz","doi":"10.1109/ICSDA.2009.5278381","DOIUrl":"https://doi.org/10.1109/ICSDA.2009.5278381","url":null,"abstract":"In recent years, there have been a significant number of activities in the area of Uyghur speech processing. This paper summarized the major tasks and key technologies in Uyghur speech processing, which including the speech database, continuous speech recognition and speech synthesis. Uyghur language is one of the least studied languages on speech processing area. For this reason, in our work, the first step was collecting large amount continuous speech data of Uyghur language. In Uyghur continuous speech recognition, we were building the HMM state models for each recognition unit, and were using the recognizer of HTK3.3 (HMM ToolKit) and the MS Visual C++8.0 developing the basic Uyghur Continuous Speech Recognition System. In Uyghur speech synthesis, we were designing and developing an intelligible and natural sounding corpus-based speech synthesis system.","PeriodicalId":254906,"journal":{"name":"2009 Oriental COCOSDA International Conference on Speech Database and Assessments","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125172567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design and development of phonetically rich Urdu speech corpus","authors":"Agha Ali Raza, S. Hussain, Huda Sarfraz, Inam Ullah, Z. Sarfraz","doi":"10.1109/ICSDA.2009.5278380","DOIUrl":"https://doi.org/10.1109/ICSDA.2009.5278380","url":null,"abstract":"Phonetically rich speech corpora play a pivotal role in speech research. The significance of such resources becomes crucial in the development of Automatic Speech Recognition systems and Text to Speech systems. This paper presents details of designing and developing an optimal context based phonetically rich speech corpus for Urdu that will serve as a baseline model for training a Large Vocabulary Continuous Speech Recognition system for Urdu language.","PeriodicalId":254906,"journal":{"name":"2009 Oriental COCOSDA International Conference on Speech Database and Assessments","volume":"201202 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116486887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-speaker adaptation for robust speech recognition under ubiquitous environment","authors":"Po-Yi Shih, Jhing-Fa Wang, Yuan-Ning Lin, Zhonghua Fu","doi":"10.1109/ICSDA.2009.5278364","DOIUrl":"https://doi.org/10.1109/ICSDA.2009.5278364","url":null,"abstract":"This paper presents a multi-speaker adaptation for robust speech recognition under ubiquitous environment. The goal is to adapt the speech recognition model for each speaker correctly in ubiquitous multi-speaker environment. We integrate speaker recognition and unsupervised speaker adaptation method to promote the speech recognition performances. Specifically we employ a confidence measure to reduce the possible negative adaptation caused by the environment noise or the recognition errors. The experimental results show that the proposed framework can efficiently promote the average recognition accuracy to 80∼90% for multi-speaker ubiquitous speech recognition.","PeriodicalId":254906,"journal":{"name":"2009 Oriental COCOSDA International Conference on Speech Database and Assessments","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127721234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}