2011 International Conference on Speech Database and Assessments (Oriental COCOSDA): latest articles

The design and development of PELECAN: Pronunciation Errors from Learners of English Corpus and Annotation
2011 International Conference on Speech Database and Assessments (Oriental COCOSDA) Pub Date : 2011-11-28 DOI: 10.1109/ICSDA.2011.6085976
A. Chotimongkol, Sumonmas Thatphithakkul, P. Chootrakool, C. Hansakunbuntheung, C. Wutiwiwatchai
Abstract: This paper describes the design and construction of PELECAN (Pronunciation Errors from Learners of English Corpus and Annotation). PELECAN was created primarily to collect pronunciation errors from Thai learners of English in order to develop a more suitable pronunciation assessment tool for Thais. A two-phase data collection process is used to balance recording effort against coverage of the acoustic phenomena of interest. The data collected in the first phase contain 1.5 hours of speech from 30 Thai learners reading 2 English passages that cover all English phones. The recorded speech was annotated with 2 types of error annotation: a phonetic transcription of each incorrect pronunciation and the level of correctness of each phone. A contrastive list was used to guide the error analysis. We found that many pronunciation errors are influenced by L1 (Thai), e.g. incorrect pronunciations of suffixes and the deletion of /l/ and /r/ in consonant clusters. However, some errors, such as the case of schwa, may not be predictable from contrastive analysis alone. Hence, a data-driven approach could help identify errors that may not be foreseen from a linguistic point of view alone.
Citations: 3
Emotions in Hindi speech- analysis, perception and recognition
2011 International Conference on Speech Database and Assessments (Oriental COCOSDA) Pub Date : 2011-11-28 DOI: 10.1109/ICSDA.2011.6085972
S. Agrawal
Abstract: Human speech conveys the speaker's emotional state along with linguistic intelligence. The meaning of a speech sample changes when it is uttered with different emotions. The present paper describes different types of studies conducted to analyze, perceive and recognize commonly occurring emotions in Hindi speech. These have been classified as anger, happiness, fear, sadness and surprise, in addition to neutral. Intonation, intensity and duration patterns change with sentence type as well as with emotion. A relationship between the measured acoustic parameters and the patterns has been used to classify them. Experiments have been conducted to study and recognise emotions based on phonetic as well as prosodic parameters of the speech samples. These parameters include MFCCs and their derivatives, and the prosodic parameters F0, A0 and duration. In one experiment, vowel segments taken from continuously spoken sentences were used as speech samples, and in another, Hindi digits were used, for machine recognition of emotions with neural net classifiers. Human perception experiments were conducted at all levels of the experiments, and the results were compared with machine recognition performance. In most cases, machine recognition was found to be better than human performance. Both phonetic and prosodic parameters play a role in the identification of emotions.
Citations: 18
The role of speech technology in service-operation estimation
2011 International Conference on Speech Database and Assessments (Oriental COCOSDA) Pub Date : 2011-11-28 DOI: 10.1109/ICSDA.2011.6085991
Masanori Takehara, S. Tamura, Ryuhei Tenmoku, T. Kurata, S. Hayamizu
Abstract: This paper introduces our recent effort to develop a Service-Operation Estimation (SOE) system using speech and multi-sensor data as well as other acquired data. In SOE, it is essential to analyze employees' data in order to increase productivity in many service industries. Speech processing techniques, such as voice activity detection and keyword spotting, help the analysis and enhance the precision of the results: the beginning and end times of speech regions are used to detect work events, and recognized keywords are used to conduct work estimation. In our system all the results are visualized in a 3D model, which helps employers and employees with their operations.
Citations: 9
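The abstract above says that the beginning and end times of speech regions are used to detect work events. As a rough illustration of what such region boundaries look like, here is a minimal sketch of a frame-energy voice activity detector. The abstract does not say which detection method the authors use, so the energy threshold, the frame length, and the function name `speech_regions` are all assumptions made for this example.

```python
def speech_regions(samples, sample_rate, frame_ms=20, threshold=0.01):
    """Return a list of (start_sec, end_sec) spans whose frame energy
    is at or above `threshold`. Hypothetical helper, not the paper's method."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    regions = []
    start = None  # start time of the region currently open, if any
    for k in range(n_frames):
        frame = samples[k * frame_len:(k + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len  # mean squared amplitude
        t = k * frame_len / sample_rate                 # frame start time in seconds
        if energy >= threshold and start is None:
            start = t                                   # speech begins
        elif energy < threshold and start is not None:
            regions.append((start, t))                  # speech ends
            start = None
    if start is not None:                               # signal ended during speech
        regions.append((start, n_frames * frame_len / sample_rate))
    return regions
```

Each returned pair could then be treated as a candidate work event, with keyword spotting applied inside the span. A practical detector would usually add smoothing or a hangover period so that short pauses do not split one utterance.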
Annotation of Japanese response tokens and preliminary analysis on their distribution in three-party conversations
2011 International Conference on Speech Database and Assessments (Oriental COCOSDA) Pub Date : 2011-11-28 DOI: 10.1109/ICSDA.2011.6086001
Yasuharu Den, Nao Yoshida, K. Takanashi, H. Koiso
Abstract: In this paper, we propose a new annotation scheme for Japanese response tokens (RTs), which is based on strict and consistent procedures. Our scheme consists of two-stage annotation, in which RTs are first identified and classified according to their forms and then further sub-classified based on their sequential positions. Six forms are included in our class of RTs: i) responsive interjections, ii) expressive interjections, iii) lexical reactive expressions, iv) repetitions, v) completions, and vi) assessments. Some of them bear an additional tag according to their sequential position in the discourse: i) first pair parts, ii) second pair parts, iii) sequence-closing thirds, iv) other responding turns, and v) unclassifiable positions. We apply our scheme to annotate a Japanese three-party conversation corpus, and present the results of a preliminary analysis on the distribution of RTs in the corpus.
Citations: 20
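The two-stage scheme in the abstract above (a form label for every response token, plus a sequential-position tag for some of them) maps naturally onto a small data structure. The sketch below is only one possible encoding. The category names come from the abstract, while the record fields (`speaker`, `start`, `end`, `text`) and the helper `form_counts` are assumptions added for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Form(Enum):
    """Stage 1: the six RT forms listed in the abstract."""
    RESPONSIVE_INTERJECTION = "responsive interjection"
    EXPRESSIVE_INTERJECTION = "expressive interjection"
    LEXICAL_REACTIVE_EXPRESSION = "lexical reactive expression"
    REPETITION = "repetition"
    COMPLETION = "completion"
    ASSESSMENT = "assessment"


class Position(Enum):
    """Stage 2: the sequential-position tags listed in the abstract."""
    FIRST_PAIR_PART = "first pair part"
    SECOND_PAIR_PART = "second pair part"
    SEQUENCE_CLOSING_THIRD = "sequence-closing third"
    OTHER_RESPONDING_TURN = "other responding turn"
    UNCLASSIFIABLE = "unclassifiable position"


@dataclass(frozen=True)
class ResponseToken:
    speaker: str                          # hypothetical field
    start: float                          # hypothetical field, seconds
    end: float                            # hypothetical field, seconds
    text: str                             # hypothetical field
    form: Form
    position: Optional[Position] = None   # only some tokens carry this tag


def form_counts(tokens):
    """Count tokens per form, as a first look at their distribution."""
    return Counter(t.form for t in tokens)
```

Keeping `position` optional mirrors the abstract's wording that only some tokens bear the additional tag.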
A question-and-answer classification technique for constructing and managing spoken dialog system
2011 International Conference on Speech Database and Assessments (Oriental COCOSDA) Pub Date : 2011-11-28 DOI: 10.1109/ICSDA.2011.6085987
Ryosuke Inoue, Y. Kurosawa, Kazuya Mera, T. Takezawa
Abstract: To recognize user speech accurately and respond to it appropriately, a spoken dialog system usually uses a question-and-answer database (QADB) which contains many question-and-answer pairs. The system first selects from the database the question example that is most similar to the recognition result for the input voice. The answer sentence paired with the selected question example is then output to the user. Many systems have a large database to enable a more appropriate answer to be output. However, when such a database is used, the waiting time increases because the system needs to find the most appropriate question example among a vast number of question examples. We propose a method of classifying the queries in the QADB. By classifying question examples into clusters using pLSA, an appropriate question example can be found more quickly than with the conventional method. We evaluated the validity of our proposed method by changing various parameters.
Citations: 2
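The speed-up described in the abstract above comes from searching one cluster of question examples instead of the whole QADB. The following sketch shows that two-stage lookup under simplifying assumptions: cluster labels are passed in by hand as a stand-in for the output of pLSA (which is not implemented here), and similarity is plain bag-of-words cosine rather than whatever measure the authors use. The class name `ClusteredQADB` and its methods are hypothetical.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words vector for a whitespace-tokenized sentence."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class ClusteredQADB:
    def __init__(self, pairs, labels):
        # pairs: list of (question, answer); labels: one cluster id per pair.
        # Assumption: labels come from an external clustering step such as pLSA.
        self.clusters = {}
        for (question, answer), label in zip(pairs, labels):
            self.clusters.setdefault(label, []).append((bow(question), answer))
        # One summed word-count vector per cluster acts as its centroid.
        self.centroids = {}
        for label, items in self.clusters.items():
            total = Counter()
            for vec, _ in items:
                total.update(vec)
            self.centroids[label] = total

    def answer(self, recognized_text):
        query = bow(recognized_text)
        # Stage 1: pick the closest cluster (one comparison per cluster).
        best = max(self.centroids, key=lambda c: cosine(query, self.centroids[c]))
        # Stage 2: search only the question examples inside that cluster.
        _, ans = max(self.clusters[best], key=lambda item: cosine(query, item[0]))
        return ans
```

With K clusters of roughly equal size over N question examples, a lookup costs about K + N/K comparisons instead of N. The trade-off is that a wrong cluster choice in stage 1 cannot be recovered in stage 2.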
A multimodal corpus for modeling turn management in multi-party conversations
2011 International Conference on Speech Database and Assessments (Oriental COCOSDA) Pub Date : 2011-11-28 DOI: 10.1109/ICSDA.2011.6085996
H. Furukawa, M. Nishida, Kristiina Jokinen, S. Yamamoto
Abstract: Spoken interactions usually have accurate timing and alignment between interlocutors: turn-taking and topic flow are managed in a manner that provides conversational fluency and smooth progress of the interaction. Turn-taking and topic flow are also important in applications such as robot companions that interact with a user in real time. The creation of a multimodal conversational corpus for modeling turn management in multi-party conversations is described. The relation between the interlocutors' spoken utterances and eye-gaze based on the corpus is investigated.
Citations: 1
Acoustic feature and variance of Uigur vowels
2011 International Conference on Speech Database and Assessments (Oriental COCOSDA) Pub Date : 2011-11-28 DOI: 10.1109/ICSDA.2011.6086003
Xuewen Zhou, He Hu, Wu Ri Ge, Xi Le Tu, Qi Mu Ge, Zheng Yuling
Abstract: Based on data retrieved from the Unified Minority Speech Parameter Database Platform, this paper examines the main acoustic parameters of the Uigur vowels of two speakers (MA2 & FE2). We have found the following facts regarding Uigur vowels: vowel acoustic space is closely related to duration, which is affected by syllabic position, number and type. For vowels in word-initial CV and V syllables, vowel duration decreases as the number of syllables increases. However, the difference in vowel duration between monosyllabic and disyllabic words is large, reaching 2–4 times in most cases. In addition, the vowels /ε/ and /o/ have longer durations, and /i/ and /u/ (high vowels) have shorter durations. The vowel /u/ has a wider variance of duration.
Citations: 3
Unsupervised spoken term detection with acoustic segment model
2011 International Conference on Speech Database and Assessments (Oriental COCOSDA) Pub Date : 2011-11-28 DOI: 10.1109/ICSDA.2011.6085989
Haipeng Wang, Tan Lee, C. Leung
Abstract: This paper describes a study on query-by-example spoken term detection (STD) using the acoustic segment modeling technique. Acoustic segment models (ASMs) are a set of hidden Markov models (HMMs) that are obtained in an unsupervised manner without using any transcription information. The training of ASMs follows an iterative procedure, which consists of the steps of initial segmentation, segment labeling, and HMM parameter estimation. The ASMs are incorporated into a template-matching framework for query-by-example STD. Both the spoken query examples and the test utterances are represented by frame-level ASM posteriorgrams. Segmental dynamic time warping (DTW) is applied to match the query with the test utterance and locate the possible occurrences. The performance of the proposed approach is evaluated with different DTW local distance measures on the TIMIT and the Fisher corpora respectively. Experimental results show that the use of ASM posteriorgrams leads to consistently better detection performance than the conventional GMM posteriorgrams.
Citations: 33
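The template-matching step in the abstract above aligns a query posteriorgram against a longer test utterance and reports where the best match lies. The sketch below uses subsequence DTW, a simpler relative of the segmental DTW the abstract names, with the negative log inner product as the local distance. That distance is a common choice for posteriorgrams but is only an assumption here, since the abstract says several local distance measures were compared without listing them. The function names are hypothetical.

```python
import math


def local_distance(p, q, eps=1e-10):
    """Negative log inner product of two posterior vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(p, q))
    return -math.log(max(dot, eps))


def subsequence_dtw(query, utterance):
    """Find the best match of `query` anywhere inside `utterance`.

    Both arguments are lists of posterior vectors, one per frame.
    Returns (length-normalized cost, start frame, end frame).
    """
    n, m = len(query), len(utterance)
    inf = float("inf")
    # cost[i][j]: best accumulated cost of aligning query[0..i] to a span
    # of the utterance that ends at frame j; start[i][j]: where it began.
    cost = [[inf] * m for _ in range(n)]
    start = [[0] * m for _ in range(n)]
    for j in range(m):
        # The match may begin at any utterance frame at no extra cost.
        cost[0][j] = local_distance(query[0], utterance[j])
        start[0][j] = j
    for i in range(1, n):
        for j in range(m):
            candidates = [(cost[i - 1][j], start[i - 1][j])]
            if j > 0:
                candidates.append((cost[i][j - 1], start[i][j - 1]))
                candidates.append((cost[i - 1][j - 1], start[i - 1][j - 1]))
            best_cost, best_start = min(candidates)
            cost[i][j] = best_cost + local_distance(query[i], utterance[j])
            start[i][j] = best_start
    # The match may also end at any utterance frame.
    end = min(range(m), key=lambda j: cost[n - 1][j])
    return cost[n - 1][end] / n, start[n - 1][end], end
```

A detector would compare the returned cost with a threshold to decide whether the term occurs. Segmental DTW additionally constrains the warping path to diagonal bands, which this sketch omits.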
Interactive visualization and search system for speech corpora
2011 International Conference on Speech Database and Assessments (Oriental COCOSDA) Pub Date : 2011-11-28 DOI: 10.1109/ICSDA.2011.6085999
S. Itahashi, T. Kajiyama, K. Yamakawa, Y. Ishimoto, T. Matsui
Abstract: We have already reported a corpus similarity visualization method, based on corpus attributes and multidimensional scaling, that makes it easy for users to utilize various speech corpora. In this paper, we present a revised visualization method that is based on a ring structure like a planisphere. By using only a mouse, a user can choose appropriate search keys for each of the multiple attributes and can easily filter information by adjusting the keys. Retrieved results are displayed inside the rings, and the user can filter and browse them in real time. This will facilitate efficient searching for the specific corpus that fits the user's needs.
Citations: 2
Construction of speech corpus of AESOP-SD
2011 International Conference on Speech Database and Assessments (Oriental COCOSDA) Pub Date : 2011-11-28 DOI: 10.1109/ICSDA.2011.6085977
Yuan Jia, Meng Wang, Honghua Zhai, Ai-jun Li
Abstract: The present study systematically describes the construction of a corpus of English learners, i.e., AESOP-SD. The content mainly consists of three parts: i) the background and significance of the corpus construction, which introduces the research background of English learning in Asia and states the significance of constructing the present corpus; ii) the materials in the corpus, which contain a large amount of data, ranging from English words, sentences and paragraphs to dialectal words, sentences and paragraphs; iii) the recording procedure and data labeling, which covers the recording environment and the software used in data collection. Through this introduction to the construction of the corpus, the paper further states the theoretical and practical value of the corpus, which can be used for research in many areas, e.g., second language acquisition, phonetic and phonological study, and computer-aided systems.
Citations: 2