2011 International Conference on Speech Database and Assessments (Oriental COCOSDA): latest papers

Development of Hindi mobile communication text and speech corpus
2011 International Conference on Speech Database and Assessments (Oriental COCOSDA) Pub Date : 2011-11-28 DOI: 10.1109/ICSDA.2011.6085975
S. Sinha, S. Agrawal, Jesper Ø. Olsen
Abstract: This paper describes the collection of a text and audio corpus for mobile personal communication in Hindi. Hindi is the largest of the Indian languages and is the first language of more than 200 million people, who use it not only for spoken mobile communication but also for sending text messages to each other. The main script for Hindi is Devanagari, but it is not well supported by the current generation of mobile devices. The Devanagari alphabet is twice as large as the English alphabet, which makes it difficult to fit onto the small keypad of a mobile device. The aim of this project is to collect text and speech resources that can be used for training spoken language systems that aid text messaging on mobile devices, i.e. to train a speech recogniser for the mobile personal communication domain so that text can be input through dictation rather than by typing. In total we collected a text corpus of 2 million words of natural messages in 12 different domains, and a spoken corpus of 100 speakers who each spoke 630 phonetically rich sentences (about 4 hours of speech). The speech utterances were recorded at 16 kHz through 3 recording channels: a mobile phone, a headset and a desktop-mounted microphone. The data sets were properly annotated and are available for the development of speech recognition and synthesis systems in the mobile domain.
Citations: 8
Assessing the naturalness of Malay emotional voice corpora
2011 International Conference on Speech Database and Assessments (Oriental COCOSDA) Pub Date : 2011-11-28 DOI: 10.1109/ICSDA.2011.6086002
Mumtaz Begum Mustafa, R. N. Ainon, R. Zainuddin, Z. M. Don, G. Knowles
Abstract: This research reports the development and evaluation of Malay emotional voice corpora through listening evaluation, and how the number of emotion choices offered to evaluators affects the result of the evaluation. The voice corpora comprise three emotions, namely anger, sadness and happiness, expressed by two male and two female actors. The voice corpora were evaluated in two separate listening tests involving a number of native Malay evaluators balanced for gender, age and profession. In the first listening test, evaluators were given twenty-five emotions to choose from. In the second test, the number of emotion choices was only five. Each test was conducted separately with a different group of evaluators. The results of the two tests differ substantially, with the emotion identification rate of the first test lower than that of the second.
Citations: 1
The influence of Shandong dialects on the acquisition of English plosives
2011 International Conference on Speech Database and Assessments (Oriental COCOSDA) Pub Date : 2011-11-28 DOI: 10.1109/ICSDA.2011.6085984
Yuan Jia, Xia Wang, Ai-jun Li
Abstract: The present study uses acoustic analysis to investigate the articulatory problems of Shandong (hereinafter SD) learners in the production of English plosives. VOT, pitch and formants were selected as the parameters for examining the manner and place features of the plosives produced by the SD learners. The results demonstrate that the SD learners pronounce the voiced stops as voiceless ones. This result is attributed to negative transfer from the SD dialect, which has been proposed to contain no voiced stops. Further, the SD speakers also exhibit problems in aspiration and tongue position during the articulation of [d, g, t′, k′], and this result is attributed to the positive influence of the SD dialect.
Citations: 3
Acoustic Parameter Databases of Dagur, Evenki, Oroqen nationalities
2011 International Conference on Speech Database and Assessments (Oriental COCOSDA) Pub Date : 2011-11-28 DOI: 10.1109/ICSDA.2011.6085993
Hu He, Xuewen Zhou, Wu Ri Ge, Xi Le Tu, M. Ge, Zheng Yuling
Abstract: Building unified acoustic parameter databases of minority languages in China is pioneering work that could promote the standardization and computerization of minority phonetics; provide scientific evidence for speech education, speech recognition and speech synthesis; protect weak and endangered languages with modern scientific means; and ensure resource sharing and the continuity of phonetic research. After establishing databases of the Tibetan, Uigur and Yi languages, we expanded the databases with three endangered minority languages: Dagur, Evenki and Oroqen. In the process of building the acoustic parameter databases we improved the rules and approaches for measuring acoustic parameters and carried out some research on the endangered languages, such as their prosodic pattern features and a frequently used flap phoneme.
Citations: 0
The development of a database of functional and emotional intonation in Chinese
2011 International Conference on Speech Database and Assessments (Oriental COCOSDA) Pub Date : 2011-11-28 DOI: 10.1109/ICSDA.2011.6085995
Maolin Wang, Yingjun Li, M. Lin, Ai-jun Li, Ziyu Xiong
Abstract: A speech database is a very important resource for speech processing research. In this paper, the design and development of a database of functional and emotional intonation in Chinese (DFEIC) is described. The database is based on about 110 hours of conversations from movies and TV plays. Utterances are segmented and labeled for syllables and prosody. Functions such as statements or questions, as well as emotions such as happiness or anger, are also labeled. The database will be applicable to the study of functional and emotional intonation, and it will also be useful for function and emotion recognition.
Citations: 3
Design and creation of Dysarthric Speech Database for development of QoLT software technology
2011 International Conference on Speech Database and Assessments (Oriental COCOSDA) Pub Date : 2011-11-28 DOI: 10.1109/ICSDA.2011.6085978
Dae-Lim Choi, Bong-Wan Kim, Yeon-Whoa Kim, Yong-Ju Lee, Yongnam Um, Minhwa Chung
Abstract: In this paper we introduce the creation of a speech database for developing speech technology for disabled persons, carried out as part of a national program to improve the quality of life of Korean people. We report on the creation of a speech database of 160 persons in total, including the prompting items and design, which is needed to develop an embedded keyword-spotting speech recognition system tailored for persons with articulation disabilities. The database is being used by the technology development team in the national program to study the phonetic characteristics of the different types of disabled persons, develop an automatic method for assessing degrees of disability, investigate the phonetic features of the speech of the disabled, and design and implement a software prototype for personal embedded speech recognition systems adapted to disabled persons.
Citations: 27
Unsupervised phone segmentation method using delta spectral function
2011 International Conference on Speech Database and Assessments (Oriental COCOSDA) Pub Date : 2011-11-28 DOI: 10.1109/ICSDA.2011.6085998
Dac-Thang Hoang, Hsiao-Chuan Wang
Abstract: Unsupervised phone segmentation means that the phone boundaries in an utterance can be detected without prior knowledge of the text content. Usually, a spectral change in the speech signal implies the existence of a phone boundary. In this paper, the Delta Spectral Function (DSF) is defined for each frame to represent the variation of band energy for a specific band. A number of bands that give the highest DSF values in a frame are then chosen to define a measure of spectral change. The chosen bands are not fixed; they are chosen dynamically, frame by frame. The peaks of the spectral change curve are treated as possible boundaries. A fine-tuning procedure is then applied to choose the peaks that become the detected boundaries. The proposed method achieves an F-value of 75.3% under the condition of near-zero over-segmentation. In this situation the recall rate is 75.3%. This experimental result is better than many previous reports. In addition, the computation is simple and the proposed method is easy to implement.
Citations: 3
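The pipeline outlined in the abstract above (per-band energy variation, dynamic selection of the most strongly changing bands in each frame, then peak picking) can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not give the exact DSF formula or the fine-tuning procedure, so the equal-width FFT bands, the central-difference delta, the mean-value threshold, and all names and parameters (`band_energies`, `spectral_change`, `pick_peaks`, `top_k`, `min_gap`) are assumptions made only for illustration.

```python
import numpy as np

def band_energies(signal, frame_len=400, hop=160, n_bands=20):
    """Log energy per frame in n_bands equal-width FFT bands (assumed front end)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    energies = np.empty((n_frames, n_bands))
    for t in range(n_frames):
        frame = signal[t * hop : t * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        # Split the power spectrum into contiguous bands and sum each one.
        bands = np.array_split(power, n_bands)
        energies[t] = [np.log(b.sum() + 1e-10) for b in bands]
    return energies

def spectral_change(energies, top_k=5):
    """Per-frame change measure: sum of the top_k largest absolute band deltas.

    The delta here is a simple central difference over time, standing in
    for the paper's DSF, whose exact definition is not in the abstract.
    """
    delta = np.zeros_like(energies)
    delta[1:-1] = np.abs(energies[2:] - energies[:-2]) / 2.0
    # The bands are re-selected in every frame: keep the top_k largest deltas.
    top = np.sort(delta, axis=1)[:, -top_k:]
    return top.sum(axis=1)

def pick_peaks(curve, min_gap=3, threshold=None):
    """Local maxima above a threshold, kept at least min_gap frames apart."""
    if threshold is None:
        threshold = curve.mean()  # assumed stand-in for the fine-tuning step
    candidates = [
        t for t in range(1, len(curve) - 1)
        if curve[t] > curve[t - 1] and curve[t] >= curve[t + 1]
        and curve[t] > threshold
    ]
    boundaries = []
    # Accept the strongest peaks first, rejecting any that crowd an accepted one.
    for t in sorted(candidates, key=lambda i: -curve[i]):
        if all(abs(t - b) >= min_gap for b in boundaries):
            boundaries.append(t)
    return sorted(boundaries)
```

With a 16 kHz signal, the default 400-sample frames and 160-sample hop correspond to 25 ms windows every 10 ms, so a returned frame index `t` maps to roughly `t * 10` ms. Selecting the top bands per frame, rather than summing over all bands, keeps the measure sensitive to changes that are confined to a narrow part of the spectrum.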
Mongolian speech corpus for text-to-speech development
2011 International Conference on Speech Database and Assessments (Oriental COCOSDA) Pub Date : 2011-11-28 DOI: 10.1109/ICSDA.2011.6085994
C. Hansakunbuntheung, A. Thangthai, N. Thatphithakkul, Altangerel Chagnaa
Abstract: This paper presents a first attempt to develop a Mongolian speech corpus designed for data-driven speech synthesis in Mongolia. The aim of the speech corpus is to develop a high-quality Mongolian TTS for blind users to use with a screen reader. The speech corpus contains nearly 6 hours of Mongolian phones. It provides Cyrillic text transcription and the corresponding phonetic transcription with stress marking. It also provides context information, including phone context, stress levels, and syntactic position in word, phrase and utterance, for modeling speech acoustics and characteristics for speech synthesis.
Citations: 3
A comparative study on accentuation implementation of Chinese EFL learners vs. American native speakers
2011 International Conference on Speech Database and Assessments (Oriental COCOSDA) Pub Date : 2011-11-28 DOI: 10.1109/ICSDA.2011.6085981
Xia Wang, Yuan Jia, Ai-jun Li
Abstract: This paper investigates how Chinese EFL (English as a foreign language) learners produce accentuation when speaking English. The study compares the prosody of Chinese EFL learners' English with that of native English through comparative evaluation of phonological patterns and accent-related prosodic parameters. The results show that the average length of intermediate phrases and intonational phrases is shorter in Chinese EFL learners' English than in native English; that the better a Chinese learner's English is, the closer the partition of intermediate/intonational phrases and the accent pattern in his/her speech are to those of native speakers; and that Chinese speakers tend to use a pitch range amplification mechanism rather than durational lengthening to realize accentuation, owing to negative language transfer from their native language, Chinese.
Citations: 0
Morpheme concatenation approach in language modeling for large-vocabulary Uyghur speech recognition
2011 International Conference on Speech Database and Assessments (Oriental COCOSDA) Pub Date : 2011-11-28 DOI: 10.1109/ICSDA.2011.6085990
Mijit Ablimit, A. Hamdulla, Tatsuya Kawahara
Abstract: For large-vocabulary continuous speech recognition (LVCSR) of highly inflected languages, selection of an appropriate recognition unit is the first important step. The morpheme-based approach is often adopted because of its high coverage and linguistic properties. However, morpheme units are short, often consisting of one or two phonemes, so they are more likely to be confused in ASR than word units. Generally, word units provide better linguistic constraint but increase the vocabulary size explosively, causing OOV (out-of-vocabulary) and data sparseness problems in language modeling. In this research, we investigate approaches to selecting word entries by concatenating morpheme sequences, which would reduce the word error rate (WER). Specifically, we compare the ASR results of the word-based model with those of the morpheme-based model and extract typical patterns that would reduce the WER. This method has been successfully applied to a Uyghur LVCSR system, resulting in a significant reduction of WER without a drastic increase in vocabulary size.
Citations: 8