{"title":"Development of Hindi mobile communication text and speech corpus","authors":"S. Sinha, S. Agrawal, Jesper Ø. Olsen","doi":"10.1109/ICSDA.2011.6085975","DOIUrl":"https://doi.org/10.1109/ICSDA.2011.6085975","url":null,"abstract":"This paper describes the collection of a text and audio corpus for mobile personal communication in Hindi. Hindi is the largest of the Indian languages and is the first language of more than 200 million people, who use it not only for spoken mobile communication but also for sending text messages to each other. The main script for Hindi is Devanagari, but it is not well supported by the current generation of mobile devices: the Devanagari alphabet is twice as large as the English alphabet, which makes it difficult to fit onto the small keypad of a mobile device. The aim of this project is to collect text and speech resources which can be used for training spoken language systems that aid text messaging on mobile devices - i.e. to train a speech recogniser for the mobile personal communication domain so that text can be input through dictation rather than by typing. In total we collected a text corpus of 2 million words of natural messages in 12 different domains, and a spoken corpus of 100 speakers who each spoke 630 phonetically rich sentences - about 4 hours of speech. The speech utterances were recorded at 16 kHz through 3 recording channels: a mobile phone, a headset and a desktop-mounted microphone. The data sets have been fully annotated and are available for the development of speech recognition and synthesis systems in the mobile domain.","PeriodicalId":269402,"journal":{"name":"2011 International Conference on Speech Database and Assessments (Oriental COCOSDA)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132701330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assessing the naturalness of malay emotional voice corpora","authors":"Mumtaz Begum Mustafa, R. N. Ainon, R. Zainuddin, Z. M. Don, G. Knowles","doi":"10.1109/ICSDA.2011.6086002","DOIUrl":"https://doi.org/10.1109/ICSDA.2011.6086002","url":null,"abstract":"This research reports the development and evaluation of Malay emotional voice corpora through listening evaluation, and how the number of emotion choices offered to evaluators affects the result of the evaluation. The voice corpora comprise three emotions, namely anger, sadness and happiness, expressed by two male and two female actors. The voice corpora were evaluated in two separate listening tests involving Malay native evaluators balanced for gender, age and profession. In the first listening test, evaluators were given twenty-five emotion choices to choose from; in the second test, only five. Each test was conducted separately with a different group of evaluators. The results of the two tests differ markedly, with the emotion identification rate of the first test lower than that of the second.","PeriodicalId":269402,"journal":{"name":"2011 International Conference on Speech Database and Assessments (Oriental COCOSDA)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127928331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The influence of Shandong dialects on the acquisition of English plosives","authors":"Yuan Jia, Xia Wang, Ai-jun Li","doi":"10.1109/ICSDA.2011.6085984","DOIUrl":"https://doi.org/10.1109/ICSDA.2011.6085984","url":null,"abstract":"The present study uses acoustic analysis to investigate the articulatory problems of Shandong (hereinafter SD) learners in the production of English plosives. VOT, pitch and formants were selected as the parameters for examining the manner and place features of the plosives produced by the SD learners. Results demonstrate that the SD learners pronounce the voiced stops as voiceless ones; this is due to negative transfer from the SD dialect, which has been proposed to contain no voiced stops. Further, the SD speakers also exhibit problems with aspiration and tongue position in the articulation of [d, g, t′, k′], a result attributable to the influence of the SD dialect.","PeriodicalId":269402,"journal":{"name":"2011 International Conference on Speech Database and Assessments (Oriental COCOSDA)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128228372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Acoustic Parameter Databases of Dagur, Evenki, Oroqen nationalities","authors":"Hu He, Xuewen Zhou, Wu Ri Ge, Xi Le Tu, M. Ge, Zheng Yuling","doi":"10.1109/ICSDA.2011.6085993","DOIUrl":"https://doi.org/10.1109/ICSDA.2011.6085993","url":null,"abstract":"Building unified acoustic parameter databases of minority languages in China is pioneering work: it can promote the standardization and computerization of minority-language phonetics, provide scientific evidence for speech education, speech recognition and speech synthesis, protect weak and endangered languages with modern scientific means, and ensure resource sharing and the continuance of phonetic research. After establishing databases for the Tibetan, Uigur and Yi languages, we expanded the databases with three endangered minority languages: Dagur, Evenki and Oroqen. In the process of building the acoustic parameter databases, we improved the rules and methods for measuring acoustic parameters and conducted research on features of these endangered languages, such as their prosodic patterns and frequently used flap phoneme.","PeriodicalId":269402,"journal":{"name":"2011 International Conference on Speech Database and Assessments (Oriental COCOSDA)","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124159806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The development of a database of functional and emotional intonation in Chinese","authors":"Maolin Wang, Yingjun Li, M. Lin, Ai-jun Li, Ziyu Xiong","doi":"10.1109/ICSDA.2011.6085995","DOIUrl":"https://doi.org/10.1109/ICSDA.2011.6085995","url":null,"abstract":"A speech database is a very important resource for speech processing research. In this paper, the design and development of a database of functional and emotional intonation in Chinese (DFEIC) is described. The database is based on about 110 hours of conversations from movies and TV dramas. Utterances are segmented, and syllable and prosodic labeling is provided. Functions such as statements and questions, as well as emotions such as happiness and anger, are also labeled. This database will be applicable to the study of functional and emotional intonation, and it will also be useful for functional and emotional recognition.","PeriodicalId":269402,"journal":{"name":"2011 International Conference on Speech Database and Assessments (Oriental COCOSDA)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133646736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design and creation of Dysarthric Speech Database for development of QoLT software technology","authors":"Dae-Lim Choi, Bong-Wan Kim, Yeon-Whoa Kim, Yong-Ju Lee, Yongnam Um, Minhwa Chung","doi":"10.1109/ICSDA.2011.6085978","DOIUrl":"https://doi.org/10.1109/ICSDA.2011.6085978","url":null,"abstract":"In this paper we introduce the creation of a speech database for developing speech technology for disabled persons, carried out as part of a national program to improve the quality of life of Korean people. We report on the creation of a speech database of a total of 160 persons - the prompting items, design, etc. - needed to develop an embedded keyword-spotting speech recognition system tailored to persons with articulation disabilities. The created database is being used by the technology development team in the national program to study the phonetic characteristics of the different types of disability, develop an automatic method to assess degrees of disability, investigate the phonetic features of the speech of the disabled, and design and implement a software prototype for personal embedded speech recognition systems adapted to disabled persons.","PeriodicalId":269402,"journal":{"name":"2011 International Conference on Speech Database and Assessments (Oriental COCOSDA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129776957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised phone segmentation method using delta spectral function","authors":"Dac-Thang Hoang, Hsiao-Chuan Wang","doi":"10.1109/ICSDA.2011.6085998","DOIUrl":"https://doi.org/10.1109/ICSDA.2011.6085998","url":null,"abstract":"Unsupervised phone segmentation means that the phone boundaries in an utterance can be detected without prior knowledge of the text content. Usually, a spectral change in the speech signal implies the existence of a phone boundary. In this paper, the Delta Spectral Function (DSF) is defined for each frame to represent the variation of band energy in a specific band. A number of bands giving the highest DSF values in a frame are then chosen to define a measure of spectral change; the chosen bands are not fixed but are selected dynamically frame by frame. The peaks of the spectral change curve are taken as candidate boundaries, and a fine-tuning procedure is then applied to choose the peaks that become the detected boundaries. Our proposed method achieves an F-value of 75.3% under the condition of near-zero over-segmentation; in this situation the recall rate is 75.3%. This experimental result is better than many previously reported ones. Moreover, the computation is simple and the proposed method is easy to implement.","PeriodicalId":269402,"journal":{"name":"2011 International Conference on Speech Database and Assessments (Oriental COCOSDA)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129934468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mongolian speech corpus for text-to-speech development","authors":"C. Hansakunbuntheung, A. Thangthai, N. Thatphithakkul, Altangerel Chagnaa","doi":"10.1109/ICSDA.2011.6085994","DOIUrl":"https://doi.org/10.1109/ICSDA.2011.6085994","url":null,"abstract":"This paper presents a first attempt to develop a Mongolian speech corpus designed for data-driven speech synthesis in Mongolia. The aim of the speech corpus is to support the development of a high-quality Mongolian TTS for blind users to use with a screen reader. The speech corpus contains nearly 6 hours of Mongolian speech. It provides Cyrillic text transcriptions and phonetic transcriptions with stress marking. It also provides context information - including phone context, stress levels, and syntactic position in word, phrase and utterance - for modeling the acoustics and characteristics of speech for synthesis.","PeriodicalId":269402,"journal":{"name":"2011 International Conference on Speech Database and Assessments (Oriental COCOSDA)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134443776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A comparative study on accentuation implementation of Chinese EFL learners vs. American native speakers","authors":"Xia Wang, Yuan Jia, Ai-jun Li","doi":"10.1109/ICSDA.2011.6085981","DOIUrl":"https://doi.org/10.1109/ICSDA.2011.6085981","url":null,"abstract":"This paper investigates how Chinese EFL (English as a foreign language) learners produce accentuation when speaking English. The study focuses on the prosody of Chinese EFL learners' English versus native English, through comparative evaluation of phonological patterns and accent-related prosodic parameters. The results show that the average length of intermediate and intonational phrases is shorter in Chinese EFL learners' English than in native English; that the better a Chinese learner's English is, the closer the partitioning of intermediate/intonational phrases and the accent pattern in his or her speech are to those of native speakers; and that Chinese speakers tend to realize accentuation through pitch range amplification rather than durational lengthening, owing to negative transfer from their native language, Chinese.","PeriodicalId":269402,"journal":{"name":"2011 International Conference on Speech Database and Assessments (Oriental COCOSDA)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133303600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Morpheme concatenation approach in language modeling for large-vocabulary Uyghur speech recognition","authors":"Mijit Ablimit, A. Hamdulla, Tatsuya Kawahara","doi":"10.1109/ICSDA.2011.6085990","DOIUrl":"https://doi.org/10.1109/ICSDA.2011.6085990","url":null,"abstract":"For large-vocabulary continuous speech recognition (LVCSR) of highly inflected languages, selecting an appropriate recognition unit is the first important step. The morpheme-based approach is often adopted because of its high coverage and linguistic properties, but morpheme units are short, often consisting of one or two phonemes, and are thus more easily confused in ASR than word units. Word units generally provide stronger linguistic constraints but increase the vocabulary size explosively, causing OOV (out-of-vocabulary) and data sparseness problems in language modeling. In this research, we investigate approaches to selecting word entries by concatenating morpheme sequences so as to reduce the word error rate (WER). Specifically, we compare the ASR results of the word-based and morpheme-based models and extract typical patterns that reduce the WER. This method has been successfully applied to an Uyghur LVCSR system, resulting in a significant reduction of WER without a drastic increase in vocabulary size.","PeriodicalId":269402,"journal":{"name":"2011 International Conference on Speech Database and Assessments (Oriental COCOSDA)","volume":"1552 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133818206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}