2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE): Latest Publications

Context-dependent grapheme-to-phoneme evaluation corpus using flexible contexts and Categorial Matrix
C. Hansakunbuntheung, Sumonmas Thatphithakkul
{"title":"Context-dependent grapheme-to-phoneme evaluation corpus using flexible contexts and Categorial Matrix","authors":"C. Hansakunbuntheung, Sumonmas Thatphithakkul","doi":"10.1109/ICSDA.2015.7357884","DOIUrl":"https://doi.org/10.1109/ICSDA.2015.7357884","url":null,"abstract":"Context-dependent pronunciation, e.g. homographs, is a difficult grapheme-to-phoneme conversion (G2P) issue. It causes accuracy downgrade in speech synthesis and speech recognition. However, the context-dependent pronunciation issue is rarely considered in collecting pronunciation corpus for evaluating accuracy of G2P. Thus, this paper proposes a context-dependent pronunciation corpus using grapheme-phoneme pairs with their context information for G2P assessment. The context information includes 1) Categorial Matrix for representing orthographic types and usage domains of orthographic groups (OG). Categorial Matrix is designed to investigate problem categories in the G2P. 2) regular-expression-based flexible context for representing context variation. 3) OG Classes for representing interchangeable OGs in the flexible context. The flexible context and the word classes are designed to remove redundant contexts while covering context variation with minimal sets of patterns. By using the proposed corpus, automatic context generation for G2P evaluation can be implemented.","PeriodicalId":290790,"journal":{"name":"2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134179703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
The recognition of neutral tone across acoustic cues
Shanshan Fan, Ao Chen, Ai-jun Li
{"title":"The recognition of neutral tone across acoustic cues","authors":"Shanshan Fan, Ao Chen, Ai-jun Li","doi":"10.1109/ICSDA.2015.7357865","DOIUrl":"https://doi.org/10.1109/ICSDA.2015.7357865","url":null,"abstract":"In Standard Chinese, both F0 and duration are the important acoustic cues for neutral tone perception. The present study focuses on the acoustic cues that contribute to neutral tone perception by checking the interplay between acoustic cues and other factors, including lexical status, the underlying tones. Manipulations were conducted according to different acoustic cues, obtaining three conditions: duration (D), F0 (P), or both duration and F0 (DP). The results showed that 1) both duration and F0 are necessary for neutral tone perception; 2) although F0 is the most reliable cue, the F0 alone is not sufficient for neutral tone identification; 3) real and pseudo words are different, which probably represent distinctive processing mechanisms in language networks; 4) duration serves as a more reliable cue than F0 when the underlying tone is T3; 5) paradigm effect was found in P condition: F0 showed more reliability in ABX paradigm.","PeriodicalId":290790,"journal":{"name":"2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115093589","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Noise-robust and stress-free visualization of pronunciation diversity of World Englishes using a learner's self-centered viewpoint
Yuichi Sato, Yosuke Kashiwagi, N. Minematsu, D. Saito, K. Hirose
{"title":"Noise-robust and stress-free visualization of pronunciation diversity of World Englishes using a learner's self-centered viewpoint","authors":"Yuichi Sato, Yosuke Kashiwagi, N. Minematsu, D. Saito, K. Hirose","doi":"10.1109/ICSDA.2015.7357855","DOIUrl":"https://doi.org/10.1109/ICSDA.2015.7357855","url":null,"abstract":"The term “World Englishes” describes the current and real state of English and one of their main characteristics is a large diversity of pronunciation, called accents. We have developed two techniques of individual-based clustering of the diversity [1, 2] and educationally-effective visualization of the diversity [3]. Accent clustering requires a technique to quantify the accent gap between any speaker pair and visualization requires a technique of stress-free plotting of the speakers. In the above studies, however, we developed and assessed these two techniques independently and in this paper, we assess our technique of automatic accent gap prediction when it is used for our stress-free visualization. Further, since CALL applications today are not always used in a quiet environment, we introduce a feature enhancement (denoising) technique to improve noise-robustness of accent gap prediction. Results show that our accent gap prediction shows correlation of 0.77 to IPA-based manually-defined accent gaps and that, by applying feature enhancement to noisy input utterances, our technique can predict the accent gap that could be obtained in a clean condition, when the SNR is larger than 10 [dB].","PeriodicalId":290790,"journal":{"name":"2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"128 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114885120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Stress annotated Urdu speech corpus to build female voice for TTS
B. Mumtaz, Saba Urooj, S. Hussain, Wajiha Habib
{"title":"Stress annotated Urdu speech corpus to build female voice for TTS","authors":"B. Mumtaz, Saba Urooj, S. Hussain, Wajiha Habib","doi":"10.1109/ICSDA.2015.7357857","DOIUrl":"https://doi.org/10.1109/ICSDA.2015.7357857","url":null,"abstract":"This research describes the stress annotation process for the two hours of Urdu speech corpus containing 18,640 words and 28,866 syllables to build a natural voice for Text-to-speech (TTS) system. For the stress annotation of speech corpus, two algorithms i.e. phonological and acoustic stress marking algorithms have been tested in comparison to perceptual stress marking. Urdu phonological stress markings algorithm [1] reports 70% accuracy whereas Urdu acoustic stress marking algorithm developed through this research reports 81.2% accuracy. This acoustic stress marking algorithm is then used to annotate two hours of Urdu speech corpus. It is a semi-automatic acoustic stress marking algorithm, which annotates 54% data automatically using duration cue whereas 46% data is marked manually using the acoustic cues of pitch, glottalization and intensity.","PeriodicalId":290790,"journal":{"name":"2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133196825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Information content, weighting and distribution in continuous speech prosody - A cross-genre comparison
Helen Kai-Yun Chen, Wei-te Fang, Chiu-yu Tseng
{"title":"Information content, weighting and distribution in continuous speech prosody - A cross-genre comparison","authors":"Helen Kai-Yun Chen, Wei-te Fang, Chiu-yu Tseng","doi":"10.1109/ICSDA.2015.7357868","DOIUrl":"https://doi.org/10.1109/ICSDA.2015.7357868","url":null,"abstract":"This study explores the composition of information content in continuous speech using data of a diversity of speech genres. Our approach is to measure information weighting, distribution and correlative expressiveness through perceived prosodic prominences in continuous speech from data of 4 different styles. This alternative perspective differs from reported studies on emotion related prosodic expressions and is based mainly on the assumption that patterned prominences are also positively correlated with the allocation and weighted loading of information, but only by higher level of discourse units. Four speech genres, i.e., 2 styles of read vs. 2 of spontaneous speech annotated with perceived prominences at 4 relative degrees are compared. Information allocation and weighting are calculated using both frequency count of prominence patterns and designation of weighting scores by prominence levels. The most revealing results are found in data of spontaneous conversation, which feature in more varieties of emphasis patterns as results of constant reduction. Far more significantly, conversation data also showcase that while their paragraph-level prosodic units carry the least amount of information content, the discourse-level prosodic units exhibit the highest score of information weighting. In other words, one major but less known distinctive feature of conversation speech is its largest amount of information content, which only surfaces when examined by the highest level of discourse-prosodic unit. We believe the results have furthered our understanding of prosody expressions in continuous speech in general and spontaneous conversation in particular; and could readily be utilized in many speech technology related implementations.","PeriodicalId":290790,"journal":{"name":"2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121565597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Elicit spoken-style data from social media through a style classifier
A. Chotimongkol, Vataya Chunwijitra, Sumonmas Thatphithakkul, Nattapong Kurpukdee, C. Wutiwiwatchai
{"title":"Elicit spoken-style data from social media through a style classifier","authors":"A. Chotimongkol, Vataya Chunwijitra, Sumonmas Thatphithakkul, Nattapong Kurpukdee, C. Wutiwiwatchai","doi":"10.1109/ICSDA.2015.7357856","DOIUrl":"https://doi.org/10.1109/ICSDA.2015.7357856","url":null,"abstract":"We explore the use of social media data to reduce the effort in developing a conversational speech corpus. The LOTUS-SOC corpus is created by recording Twitter messages through a mobile application. In the first phase, which took around one month, 172 hours of speech from 208 speakers were recorded and ready for use without the need for speech segmentation and transcription. In terms of language similarity to spoken language, the perplexity of LOTUS-SOC with respect to known spoken utterances is lower than that of the broadcast news corpus and almost as low as the telephone conversation corpus. We also applied a style classifier trained from words and parts-of-speech using two machine learning approaches, SVM and CRF, to identify spoken-style utterances in LOTUS-SOC. By training a language model from only the utterances classified as “spoken”, the perplexity of LOTUS-SOC was further reduced as evaluated by three different sets of spoken utterances.","PeriodicalId":290790,"journal":{"name":"2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127213607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
A comparison study on contextual modeling for estimating functional loads of phonological contrasts
Bin Wu, Yanlu Xie, Jinsong Zhang
{"title":"A comparison study on contextual modeling for estimating functional loads of phonological contrasts","authors":"Bin Wu, Yanlu Xie, Jinsong Zhang","doi":"10.1109/ICSDA.2015.7357886","DOIUrl":"https://doi.org/10.1109/ICSDA.2015.7357886","url":null,"abstract":"Functional load (FL) is the quantitative measure of the importance of phonological contrasts, which stand for the differentiation of communicative linguistic units. Correct estimate of FLs is useful for the studies of speech recognition, language evolution, language teaching, etc. Conventional approaches use phonological transcriptions and unigram probabilities for the estimation, hence weak in contextual modeling. Based on the measurement of mutual information (MI) between the text and its phonological transcription, we previously proposed a novel FL measurement which utilizes n-gram word probabilities, hence owing better context modeling power. In this study, we compare the effects of different context on the estimation of FL: syllable, word, n-gram word model, and open data. Experimental results show: the wider the context modeling, the smaller the FL; FL based on MI with the trigram model achieves the best performance in modeling the context in our experiments. Compared with FL based on entropy, FL based on MI showed smaller value and is applicable to open data.","PeriodicalId":290790,"journal":{"name":"2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128895288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Tonal alignment in Shanghai Chinese
Bijun Ling, Jie Liang
{"title":"Tonal alignment in Shanghai Chinese","authors":"Bijun Ling, Jie Liang","doi":"10.1109/ICSDA.2015.7357878","DOIUrl":"https://doi.org/10.1109/ICSDA.2015.7357878","url":null,"abstract":"In this paper, we investigated the tonal alignment in open syllable (CV) and closed syllable (CVʔ) starting with nasal consonant /m/ and with rising/falling F0 contours in Shanghai Chinese. Results show that a glottal coda shortens the duration of vowel significantly and in order to keep the isochronism of syllable, the duration of nasal consonant /m/ showed a significant compensatory lengthening effect, which makes the duration of consonant longer than vowel in closed syllables. As the onset of tone (rise/fall) normally stayed around the center of the host syllable [12], the onset of tone in closed syllable (T5) located within the nasal consonant /m/, which indicated that the implementation of tone started from the onset of its host syllable rather than from the onset of the rhyme and verified that the whole syllable was the tone carrier.","PeriodicalId":290790,"journal":{"name":"2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"198 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132486524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Construction and analysis of social-affective interaction corpus in English and Indonesian
Nurul Lubis, S. Sakti, Graham Neubig, T. Toda, Satoshi Nakamura
{"title":"Construction and analysis of social-affective interaction corpus in English and Indonesian","authors":"Nurul Lubis, S. Sakti, Graham Neubig, T. Toda, Satoshi Nakamura","doi":"10.1109/ICSDA.2015.7357892","DOIUrl":"https://doi.org/10.1109/ICSDA.2015.7357892","url":null,"abstract":"Social-affective aspects of interaction play a vital role in making human communication a rich and dynamic experience. Observation of complex emotional phenomena requires rich sets of labeled data of natural interaction. Although there has been an increase of interest in constructing corpora containing social interactions, there is still a lack of spontaneous and emotionally rich corpora. This paper presents a corpus of social-affective interactions in English and Indonesian, constructed from various television talk shows, containing natural conversations and real emotion occurrences. We carefully annotate the corpus in terms of emotion and discourse structure to allow for the aforementioned observation. The corpus is still in its early stage of development, yielding wide-ranging possibilities for future work.","PeriodicalId":290790,"journal":{"name":"2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123340181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
On finding word-level break-type formation rules for mandarin read speech
Fu-Ja Kung, Pauline Lee, Yih-Ru Wang, Sin-Horng Chen, Chen-Yu Chiang
{"title":"On finding word-level break-type formation rules for mandarin read speech","authors":"Fu-Ja Kung, Pauline Lee, Yih-Ru Wang, Sin-Horng Chen, Chen-Yu Chiang","doi":"10.1109/ICSDA.2015.7357864","DOIUrl":"https://doi.org/10.1109/ICSDA.2015.7357864","url":null,"abstract":"This paper presents a study on exploring word-level break-type formation rules for Mandarin read speech. A 4-layer hierarchical structure with seven break types is adopted to represent the prosody of utterance. The work is based on the break-type tags labeled on a large read-speech database by the prosody labeling and modeling algorithm (PLM) proposed previously. Occurrence frequencies of seven break types for pre- and post-boundaries of several types of function words are calculated and taken as the inferred statistical break-type formation rules. Linguistic interpretations of the most likely break types occurred at pre- and post-boundaries of each function word are discussed. Some exceptions that deviate from the most likely break types are also examined.","PeriodicalId":290790,"journal":{"name":"2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134328450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1