2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA): latest publications

Phonological influences on the realization of final lowering evidence from dialogue Chinese Mandarin
Wei Lai, Xiaoying Xu, Ya Li, Hao Che, Shanfeng Liu, J. Tao
Abstract: Despite the discovery of final lowering effect in widespread language, its origin and realization in different phonological environments still needs exploration. In this article, with a large dialogue corpus, three experiments are conducted to examine how phonological factors (such as prosodic units, sentence stresses and boundary pitch movement) would influence the realization of final lowering in Chinese Mandarin. The results show that: I) The bearing unit of final lowering in Chinese is the last prosodic word in the utterance, regardless of its length, rather than a fixed duration range in a physiological way. II) The position of the sentence stress has an influence on the presence/absence of final lowering. To be specific, final lowering tends to be triggered by sentence stresses on the penultimate and last third prosodic word, and suppressed by sentences stresses prior to the last third prosodic word. III) Final lowering effect would be pushed leftward by sentence stresses and high boundary tones in final positions. This article lends support to the phonological origin of final lowering, and introduces a cross-linguistic framework of prosodic structure to analyze its specific realization under different conditions of stress positions and boundary pitch movements.
DOI: https://doi.org/10.1109/ICSDA.2014.7051425
Published: 2014-09-01
Citations: 4
Visualization of pronunciation diversity of world Englishes from a speaker's self-centered viewpoint
Yuji Kawase, N. Minematsu, D. Saito, K. Hirose
Abstract: English is the only language available for global communication and is known to have a large diversity of pronunciations due to the influence of speakers' mother tongue, called accents. Our previous studies [1], [2] made an attempt to do speaker-basis clustering of those pronunciations, where every speaker was assumed to speak with his own accent. The clustering procedure required a distance matrix only in terms of pronunciation differences among speakers and [1], [2] proposed a method to predict the pronunciation distance between any pair of the speakers. A distance matrix is often visualized on a two-dimensional plane by using the Multi-Dimensional Scaling (MDS) or drawing a dendrogram. In this study, considering learners' perceptual characteristics, a new method is proposed for visualization. When a visualization result is fed back to a learner, his main interest will be in the relations from himself to the others, not those among the others. Then, by using only a part of the distance matrix and other kinds of information such as age and gender, the proposed method can visualize multiple kinds of diversity found in acoustics of English pronunciation from a speaker's self-centered viewpoint. Unlike the conventional methods, our proposal is guaranteed to cause no distortion at all in results of visualization.
DOI: https://doi.org/10.1109/ICSDA.2014.7051437
Published: 2014-09-01
Citations: 3
Obstruent classification using modulation spectrogram based features
Anshu Chittora, Kewal D. Malde, H. Patil
Abstract: In this paper, a new feature extraction technique based on modulation spectrogram is proposed. Modulation spectrogram gives a 2-dimensional (2-D) feature set for each obstruent segment. Since the size of feature vector given by modulation spectrogram is of very large dimension, Higher Order Singular Value Decomposition (HOSVD) theorem is used to reduce the size of feature vector. The reduced feature vector is then applied to a classifier, which classify the obstruent in three broad classes, viz., stop, affricate and fricative. Four-fold cross-validation experiments have been conducted on TIMIT database to find accuracy of obstruent classification at phoneme-level and recognition of manner of articulation of obstruents. Our experimental results show 92.22 % and 94.85 % accuracies for obstruent classification at phoneme-level and recognition of manner of articulation of obstruents, respectively, using 3-nearest neighbor classifier while with same experimental setup Mel Frequency Cepstral Coefficients (MFCC) shows 87.24 % and 93.68 % average classification accuracy of phoneme-level classification and manner of articulation level classification of obstruents, respectively.
DOI: https://doi.org/10.1109/ICSDA.2014.7051438
Published: 2014-09-01
Citations: 1
The effect of expression clarity and presentation modality on non-native vocal emotion perception
C. S. Chong, Jeesun Kim, C. Davis
Abstract: The current study investigated how the presentation of visual information and the clarity of expressions would influence this non-native effect. Australian English and Cantonese native listeners were presented spoken Australian English sentences produced by actors who had very clear or ambiguous emotional expressions (levels of clarity were established in another study). Angry, happy, sad, surprise or disgust expressions were tested in auditory only (AO), visual only (VO) and audio-visual (AV) conditions. The results showed the expected non-native disadvantage for AO presentation; with the Cantonese speaker's performance significantly less accurate than the English ones. There was also the expected difference as a function of the clarity of the emotion expression; this effect was the same magnitude across the language groups. This was not the case in the VO or AV conditions where performance levels did not differ. This indicates that visual cues helped the Cantonese listeners compensate for poorer AO recognition.
DOI: https://doi.org/10.1109/ICSDA.2014.7051430
Published: 2014-09-01
Citations: 2
Utilizing social media data through similarity-based text normalization for LVCSR language modeling
A. Chotimongkol, K. Thangthai, C. Wutiwiwatchai
Abstract: In this paper, we explore the use of social media data in augmenting the lack of large prepared text corpora for LVCSR language modeling. Extensive normalization is required to handle informal and noisy nature of social media text. We propose a similarity-based text normalization approach where similarity in terms of spelling, pronunciation and context are considered. Similarity between a source (nonstandard) word and a target (normalized) word is measured by edit distance and Kullback-Leibler distance. The proposed normalization method can handle the case of homophonic, spelling error and insertion (repeated characters) which occur quite often in Twitter's texts. We then trained n-gram language models with the normalized texts and achieved up to 60% relative improvement in terms of perplexity and 9% in terms of WER on a mobile speech-to-speech translation task. The proposed approach is applicable to other types of social media texts by its unsupervised manner.
DOI: https://doi.org/10.1109/ICSDA.2014.7051432
Published: 2014-09-01
Citations: 1
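As a rough illustration of the similarity-based normalization described in the abstract above, the following Python sketch scores candidate targets by length-normalized edit distance plus the Kullback-Leibler divergence between context-word distributions. The function names, the additive combination, the weight `alpha`, the `eps` floor, and the toy data are all assumptions made for illustration; the pronunciation component is omitted, and this is not the authors' implementation.

```python
import math
from typing import Dict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def kl_divergence(p: Dict[str, float], q: Dict[str, float],
                  eps: float = 1e-9) -> float:
    """KL(p || q) over context words; q is floored at eps (an assumption)."""
    return sum(pw * math.log(pw / max(q.get(w, 0.0), eps))
               for w, pw in p.items() if pw > 0.0)


def normalize(source: str,
              source_ctx: Dict[str, float],
              candidates: Dict[str, Dict[str, float]],
              alpha: float = 1.0) -> str:
    """Pick the candidate target with the lowest combined dissimilarity.

    The additive score and the weight alpha are hypothetical choices.
    """
    def score(item):
        target, target_ctx = item
        spelling = edit_distance(source, target) / max(len(source), len(target), 1)
        context = kl_divergence(source_ctx, target_ctx)
        return spelling + alpha * context

    return min(candidates.items(), key=score)[0]


# Toy example: a word with repeated characters and two candidate targets.
source_context = {"very": 0.5, "so": 0.5}
candidate_contexts = {
    "good": {"very": 0.5, "so": 0.5},
    "god": {"oh": 1.0},
}
best = normalize("gooood", source_context, candidate_contexts)
```

In this toy setup the repeated-character form is closer to "good" in both spelling and context, so that candidate is selected; real context distributions would have to be estimated from a corpus.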
Development of vocal tract length normalized phonetic engine for Gujarati and Marathi languages
Shubham Sharma, Maulik C. Madhavi, H. Patil
Abstract: Phonetic engine (PE) is a system that converts speech sound units into symbols without any higher-level information (such as semantic or linguistic details). This paper presents the development of PE in two Indian languages, viz., Gujarati and Marathi. To investigate the performance of PE, speech recorded in three different modes, viz., read, spontaneous and lecture is considered. Database consists of a large number of speakers in each mode for these languages. In order to reduce the effects of speaker differences in the databases, Vocal Tract Length Normalization (VTLN) using Lee-Rose method is incorporated. Here, performances of PEs are tested using state-of-the-art Mel frequency cepstral coefficients (MFCC) and vocal tract length normalized features. Hidden Markov model (HMM)-based approach is used for modeling the phonetic units. On an average, improvement of 3.12 % and 1.32 % is achieved using vocal tract length normalized PE over MFCCs for Gujarati and Marathi, respectively.
DOI: https://doi.org/10.1109/ICSDA.2014.7051439
Published: 2014-09-01
Citations: 0
Stress distribution based on dependency parsing of Chinese discourse
Yaru Wang, Ai-jun Li, Yuan Jia
Abstract: Dependency parsing is known as a syntactic or a shallow semantic analysis in NLP (Natural Language Processing). This paper conducts an interface study between syntax (dependency) and prosody (stress). The stress distribution patterns are statistically analyzed across 24 dependency relations based on HIT (Harbin Institute of Technology) dependency scheme [1] for a Chinese spoken discourse corpus. It shows that the intonation stress is more likely to appear at dependency relations of SBV, ATT, ADV and VOB. Besides, the stress distribution pattern within each relation is analyzed. The rule that the modifier is more likely to be stressed has been proved to some extent. The results demonstrate that there is an intrinsic association between stress distribution and dependency relation.
DOI: https://doi.org/10.1109/ICSDA.2014.7051416
Published: 2014-09-01
Citations: 2
Modeling predictive perceptual representation of Thai initial consonants
P. Phienphanich, C. Onsuwan, C. Tantibundhit, N. Saimai, T. Saimai
Abstract: This work is an extension of our previous attempt to construct a spatial representation of 21 initial consonants in Thai by partitioning them into homogeneous clusters based on empirical measures of confusability and distance among phonemes. The measures were taken from perceptual identification performance of 28 listeners (seven full subjects) when stimuli were presented in noise. In present study, two methods of clustering, namely Multidimensional scaling analysis and k-means clustering were employed, yielding six different classifications and four perceptually relevant categories: intra-cluster short distance, intra-cluster long distance, inter-cluster short distance, and inter-cluster long distance. Another set of perceptual experiment (eight listeners; two full subjects) was carried out to verify the predictions. The findings reveal that the derived perceptual clusters and defined categories fit relatively well with the listeners' performance. Distinctive feature systems in phonological theory appear to provide some basis for the clustering of phonemes.
DOI: https://doi.org/10.1109/ICSDA.2014.7051414
Published: 2014-09-01
Citations: 1
Conversation dialog corpora from television and movie scripts
Lasguido Nio, S. Sakti, Graham Neubig, T. Toda, Satoshi Nakamura
Abstract: Example-based dialogue systems often require natural conversation templates as examples for response generation. However, in previous work most conversation corpora have been created by hand and do not well portray actual conversations between two people. One way to overcome this problem is to record and transcribe real human-to-human conversation. However, this work is tedious and time consuming. In this work, we utilize conversation scripts from television and movies. We extract conversations from television and movie scripts from the web and perform various types of filtering. In order to ensure that the conversation is performed by two speakers, we introduce a unit of conversation called a tri-turn (a trigram conversation turn) which allow us to filter conversations with more than two speakers. In the end, our conversation corpora contains 86,719 query-response pairs that represent conversation turns performed by two speakers talking to each other.
DOI: https://doi.org/10.1109/ICSDA.2014.7051436
Published: 2014-09-01
Citations: 7
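The tri-turn filtering mentioned in the abstract above could be sketched in Python roughly as follows. The abstract does not spell out the exact criterion, so this sketch assumes a tri-turn is three consecutive turns in an A-B-A speaker pattern and that each tri-turn yields two query-response pairs; the function names and the sample scene are hypothetical.

```python
from typing import List, Tuple

Turn = Tuple[str, str]  # (speaker, utterance)


def extract_tri_turns(turns: List[Turn]) -> List[Tuple[Turn, Turn, Turn]]:
    """Return every window of three consecutive turns with speakers A, B, A.

    The A-B-A pattern is an assumed reading of "tri-turn": it keeps only
    exchanges in which two speakers are answering each other.
    """
    tri_turns = []
    for i in range(len(turns) - 2):
        first, second, third = turns[i], turns[i + 1], turns[i + 2]
        if first[0] == third[0] and first[0] != second[0]:
            tri_turns.append((first, second, third))
    return tri_turns


def query_response_pairs(
    tri_turns: List[Tuple[Turn, Turn, Turn]]
) -> List[Tuple[str, str]]:
    """Split each tri-turn into (query, response) utterance pairs, deduplicated."""
    pairs: List[Tuple[str, str]] = []
    for first, second, third in tri_turns:
        for pair in ((first[1], second[1]), (second[1], third[1])):
            if pair not in pairs:
                pairs.append(pair)
    return pairs


# Hypothetical script fragment with three speakers.
scene = [
    ("JOEY", "Hi"),
    ("ROSS", "Hey"),
    ("JOEY", "How are you?"),
    ("MONICA", "Coffee?"),
    ("ROSS", "Sure"),
]
tri_turns = extract_tri_turns(scene)
pairs = query_response_pairs(tri_turns)
```

Only the first three turns form an A-B-A window here, so the windows that involve the third speaker are dropped, which is the filtering effect the abstract describes.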
Syntactic annotation under dependency scheme on Chinese spontaneous speech
Xuefei Liu, Ai-jun Li, Yuan Jia, Yiqing Zu
Abstract: Syntactic or semantic annotation is an indispensable work for understanding the intention of the spoken discourses or dialogues. As we know that dependency relation annotation is a kind of syntactic or low shallow semantic annotation, however the present annotation schemes are almost all for text rather than spoken discourses or dialogues. By analysis the online spontaneous chatting data, this paper tries to propose a dependency scheme for spoken discourses. Based on the HIT scheme(Harbin Institute of Technology), we finally proposed 26 kinds of dependencies, where four dependencies are added as “Translocation”, “Repetition”, “Duplication” and “Omission”, and three are modified as “Independent Structure”, “Independent Clause” and “Dependent Clause”. The refined dependency scheme enriches the annotation to Chinese spontaneous speech, which would benefit for speech recognition, semantic comprehension and machine translation.
DOI: https://doi.org/10.1109/ICSDA.2014.7051415
Published: 2014-09-01
Citations: 2