2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP): Latest Publications

Revisiting the Statistics Pooling Layer in Deep Speaker Embedding Learning
2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP) Pub Date : 2021-01-24 DOI: 10.1109/ISCSLP49672.2021.9362097
Shuai Wang, Yexin Yang, Y. Qian, Kai Yu
Abstract: The pooling function plays a vital role in the segment-level deep speaker embedding learning framework. One common method is to compute statistics of the temporal features; two typical approaches are mean-based temporal average pooling (TAP) and temporal statistics pooling (TSTP), which combines the mean and standard deviation. Empirically, researchers have observed a large performance degradation in the x-vector system when the standard deviation is removed. Based on this observation, this paper designs a set of experiments to quantitatively analyze the effectiveness of different statistics, investigating and comparing pooling functions based on the standard deviation, covariance, and ℓp-norm. Experiments are carried out on VoxCeleb and SRE16. The results show that pooling functions based on second-order statistics outperform TAP, and that, of the pooling functions compared, only the simple standard deviation achieves the best performance under all evaluation conditions.
Citations: 23
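The two pooling functions named in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes frame-level features arrive as a list of equal-length vectors, uses the population standard deviation (dividing by T), and adds a small hypothetical `eps` inside the square root for numerical stability. `temporal_std` on its own corresponds to the standard-deviation-only pooling the abstract reports as best.

```python
import math
from typing import List

Frame = List[float]  # one frame-level feature vector from the network's frame-level layers


def temporal_average_pooling(frames: List[Frame]) -> List[float]:
    """TAP: per-dimension mean over the T frames of a segment."""
    if not frames:
        raise ValueError("need at least one frame")
    num_frames, dim = len(frames), len(frames[0])
    return [sum(f[d] for f in frames) / num_frames for d in range(dim)]


def temporal_std(frames: List[Frame], eps: float = 1e-5) -> List[float]:
    """Per-dimension standard deviation over time (population form: divide by T)."""
    mean = temporal_average_pooling(frames)
    num_frames = len(frames)
    var = [sum((f[d] - m) ** 2 for f in frames) / num_frames for d, m in enumerate(mean)]
    # eps keeps sqrt well-behaved when a dimension has (near) zero variance
    return [math.sqrt(v + eps) for v in var]


def temporal_statistics_pooling(frames: List[Frame], eps: float = 1e-5) -> List[float]:
    """TSTP: concatenation [mean; std], so the output is twice the frame dimension."""
    return temporal_average_pooling(frames) + temporal_std(frames, eps)
```

Because TSTP doubles the embedding-layer input size, swapping TAP for TSTP (or for std-only pooling) changes the shape expected by the segment-level layers that follow.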
A Comparison Study on the Alignment of Prosodic and Semantic Units and Its Effects on F0 Shifting in L1 and L2 English Spontaneous Speech
2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP) Pub Date : 2021-01-24 DOI: 10.1109/ISCSLP49672.2021.9362076
Yuqing Zhang, Zhu Li, Jinsong Zhang
Abstract: Linguistic work on speech planning has established that advance planning involves both the complexity of the subsequent utterance (i.e., utterance length) and the semantic completeness of the upcoming speech (e.g., whether the whole utterance reaches the end of a semantic unit). Yet relatively little attention has been paid to the preplanning capacity of second-language (L2) learners. This study investigates whether learners are capable of pitch-related preplanning based on semantic units in discourse (DUs) in L2 spontaneous speech production. We analyzed the relationships between f0 metrics and the semantic completeness of prosodic units (PUs) in English spontaneous speech by native speakers and by Mandarin- and Cantonese-speaking EFL learners. The results indicate that PU-DU left alignment introduces an initial f0 up-shift and PU-DU right alignment produces a final f0 down-shift in both L1 and L2 speech, suggesting that speakers are sensitive to the initiation and termination of DUs. Critically, only in L1 speech does the initial f0 height correlate with PU-DU right alignment, the final f0 relate to left alignment, and the f0 slope relate strongly to PU-DU left and right alignment conditions. The absence of these correlations in L2 speech might reflect learners' limited capacity to preplan a whole DU in spontaneous speech production.
Citations: 1
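The abstract above relates three f0 metrics (initial f0 height, final f0, and f0 slope) to PU-DU alignment but does not define them. The sketch below shows one common way to compute such metrics for a single prosodic unit; it is an assumption, not the study's procedure. It takes already-extracted voiced f0 samples (unvoiced frames removed), defines initial and final f0 as the mean of the first and last `edge` samples (a hypothetical parameter), and defines slope as the least-squares regression slope in Hz per second.

```python
from typing import List, Tuple


def f0_metrics(times: List[float], f0: List[float], edge: int = 3) -> Tuple[float, float, float]:
    """Return (initial f0, final f0, f0 slope in Hz/s) for one prosodic unit.

    Assumes `times` (seconds) and `f0` (Hz) hold voiced samples only.
    """
    if len(times) != len(f0) or len(f0) < 2:
        raise ValueError("need at least two paired (time, f0) samples")
    k = min(edge, len(f0))
    initial = sum(f0[:k]) / k   # mean of the first k voiced samples
    final = sum(f0[-k:]) / k    # mean of the last k voiced samples
    n = len(f0)
    mean_t, mean_f = sum(times) / n, sum(f0) / n
    # Ordinary least-squares slope: cov(t, f0) / var(t)
    cov = sum((t - mean_t) * (f - mean_f) for t, f in zip(times, f0))
    var = sum((t - mean_t) ** 2 for t in times)
    if var == 0:
        raise ValueError("time stamps must not all be equal")
    return initial, final, cov / var
```

A regression slope is less sensitive to a single octave error at the unit edges than a simple (final minus initial) / duration difference, which is why it is used here.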
Automatic Extraction of Semantic Patterns in Dialogs using Convex Polytopic Model
2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP) Pub Date : 2021-01-24 DOI: 10.1109/ISCSLP49672.2021.9362051
Jingyan Zhou, Xiaoying Zhang, Xiaohan Feng, King Keung Wu, H. Meng
Abstract: Natural Language Understanding (NLU) in task-oriented dialog systems usually requires annotated data to train the understanding module, and annotating large datasets is costly. This paper proposes an unsupervised framework based on the Convex Polytopic Model (CPM), which uses a geometric approach to automatically extract semantic patterns from a raw dialog corpus and thereby assist in generating semantic frames. We find that the extracted semantic patterns are easily interpretable, correlate strongly with the intents and slots of the semantic frames, and may potentially serve as basic units for NLU. This is an initial investigation of the properties of CPM that explores its semantic interpretability. Experiments on the ATIS (Air Travel Information System) corpora show that CPM can generate semantic frames with minimal supervision.
Citations: 2
Articulatory and Acoustic Features of Mandarin /ɹ/: A Preliminary Study
2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP) Pub Date : 2021-01-24 DOI: 10.1109/ISCSLP49672.2021.9362070
Shu-Wen Chen, P. Mok
Abstract: Rhotic sounds in the world's languages have a wide range of variants and are famously complex to produce. The current study examined the articulatory and acoustic features of Mandarin /ɹ/ using ultrasound imaging. The results showed that, similar to English rhotics, Mandarin /ɹ/ could be articulated with various tongue shapes, usually categorized as the bunched gesture (tongue tip pointing down) or the retroflex gesture (tongue tip curling up). However, variation between bunched and retroflex /ɹ/ was found only in postvocalic and syllabic /ɹ/; Mandarin prevocalic /ɹ/ was produced with the tongue tip pointing down (bunched gesture). Acoustically, Mandarin /ɹ/ had a higher F3 than English /ɹ/ in the prevocalic and syllabic positions, and a higher F2 in the prevocalic position, indicating less rhoticity in Mandarin /ɹ/ than in English /ɹ/. Moreover, frication noise was often, though not always, observed in prevocalic /ɹ/, and speakers varied widely in their use of frication noise when producing prevocalic /ɹ/.
Citations: 4
Tone Realization in Mandarin Speech: A Large Corpus Based Study of Disyllabic Words
2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP) Pub Date : 2021-01-24 DOI: 10.1109/ISCSLP49672.2021.9362073
Yaru Wu, L. Lamel, M. Adda-Decker
Abstract: This study aims to increase our knowledge of tone realization in disyllabic words in continuous Mandarin speech. Automatic alignments of large speech corpora were carried out to study potential tone variants, with a special focus on variation factors such as prosodic position and right tonal context. Alignments without tone variants (V0, phonological representation) show that Tone 4 is more frequent in phrase-final position than in other prosodic positions, supporting the "declination line" pattern often observed in speech production. Tone 4 is also the most frequent lexical tone (>50%) in all prosodic positions. Alignments permitting tone variants (V1, phonetic realization) show an increase of Tone 1 in phrase-initial position compared to V0. Tone realization is related not only to prosodic position but also to the within-word right tonal context. Unsurprisingly, the most notable change in tone realization occurs for Tone 3 in the first syllable of a disyllabic word when it is followed by another Tone 3, owing to the well-known "tone sandhi rule" by which T3T3 disyllabic words become T2T3. Cross-word right tonal context is found to affect only Tone 3. However, the results show that the Tone 3 sandhi rule is more a tendency than an absolute rule.
Citations: 0
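The within-word Tone 3 sandhi rule cited in the abstract above (T3T3 becomes T2T3) and the idea of measuring how often it actually applies can be sketched as follows. This is an illustrative sketch, not the study's alignment system: `apply_third_tone_sandhi` and `sandhi_rate` are hypothetical helpers, and each observation is assumed to be a pair of (canonical tones, tones chosen by an alignment that permits variants). A rate below 1 is how a "tendency rather than an absolute rule" would show up in such counts.

```python
from typing import Iterable, Optional, Tuple

Tones = Tuple[int, int]  # lexical tones of a disyllabic word, numbered 1-4


def apply_third_tone_sandhi(tones: Tones) -> Tones:
    """Canonical within-word rule: T3 T3 -> T2 T3; every other pattern is unchanged."""
    first, second = tones
    if first == 3 and second == 3:
        return (2, second)
    return (first, second)


def sandhi_rate(observations: Iterable[Tuple[Tones, Tones]]) -> Optional[float]:
    """Share of canonical T3T3 words whose first syllable is observed as T2.

    Each observation is (canonical tones, observed tones). Returns None when
    no canonical T3T3 word is present.
    """
    observed = [obs for canon, obs in observations if canon == (3, 3)]
    if not observed:
        return None
    return sum(1 for obs in observed if obs[0] == 2) / len(observed)
```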
An Eye-tracking Study of Transposed-letter Effect in English Word Recognition by Mandarin Speakers
2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP) Pub Date : 2021-01-24 DOI: 10.1109/ISCSLP49672.2021.9362079
Huan Lei, J. Dang, Yu Chen
Abstract: Using eye-tracking, this study explored English visual word recognition by Mandarin speakers. Two groups of Chinese-English bilinguals with low and intermediate English proficiency participated in a forward-masked English lexical decision experiment involving high- and low-frequency words. The study also manipulated the letter case of two types of primes, transposed-letter primes and substituted-letter primes, to detect the transposed-letter effect (TLE) in English word recognition. The results showed that two factors, English proficiency and word frequency, directly affected the presence of the TLE, whereas letter case did not have a major effect on the TLE. More interestingly, the eye-tracking data were closely correlated with the TLE: the higher the word frequency and the better the English proficiency, the larger the TLE and the longer the gaze times on the two middle letters. These results might therefore indicate that both low- and intermediate-level Chinese-English bilinguals were in the process of developing a coarse-grained processing route for rapidly accessing English words.
Citations: 0
Phase Spectrum Recovery for Enhancing Low-Quality Speech Captured by Laser Microphones
2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP) Pub Date : 2021-01-24 DOI: 10.1109/ISCSLP49672.2021.9362112
Chang Liu, Yang Ai, Zhenhua Ling
Abstract: This paper proposes a phase spectrum recovery method for enhancing low-quality speech captured by laser microphones, which is degraded by non-additive distortions during signal acquisition. Our preliminary study shows that common speech enhancement methods based on amplitude spectrum estimation cannot achieve satisfactory performance on this task. Therefore, this paper designs a speech enhancement model comprising an amplitude spectrum estimator (ASE) and a phase spectrum estimator (PSE). The ASE adopts autoregressive LSTMs and a multi-target learning framework to predict clean amplitude spectra from noisy ones. The PSE first uses a waveform-based model to enhance noisy speech in the time domain and then extracts phase spectra from the enhanced waveforms. The outputs of the two estimators are then combined to reconstruct the final enhanced speech waveforms. Experimental results demonstrate that the proposed method achieves a higher PESQ score than both the method using only the ASE and waveform-based speech enhancement methods, including UNet and TCNN.
Citations: 1
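The final combination step described in the abstract above (amplitude from the ASE, phase taken from the PSE-enhanced waveform) reduces, for one frame, to X[k] = A[k]·exp(jφ[k]) followed by an inverse transform. The sketch below is a simplified assumption, not the authors' pipeline: it works on a single frame with a naive O(N²) DFT, no analysis window, and no overlap-add, and `reconstruct_frame` is a hypothetical helper whose inputs stand in for the two estimators' outputs.

```python
import cmath
import math
from typing import List


def dft(frame: List[float]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform of one real-valued frame."""
    n_points = len(frame)
    return [
        sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_points) for n in range(n_points))
        for k in range(n_points)
    ]


def idft(spectrum: List[complex]) -> List[complex]:
    """Inverse of dft(): x[n] = (1/N) * sum_k X[k] * exp(+2*pi*j*k*n/N)."""
    n_points = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / n_points) for k in range(n_points))
        / n_points
        for n in range(n_points)
    ]


def combine_amplitude_and_phase(amplitude: List[float], phase: List[float]) -> List[complex]:
    """X[k] = A[k] * exp(j * phi[k]): amplitude from one estimator, phase from another."""
    if len(amplitude) != len(phase):
        raise ValueError("amplitude and phase must have the same number of bins")
    return [a * cmath.exp(1j * p) for a, p in zip(amplitude, phase)]


def reconstruct_frame(ase_amplitude: List[float], pse_waveform: List[float]) -> List[float]:
    """Pair the ASE amplitude with the phase of the PSE-enhanced frame; return the real signal."""
    phase = [cmath.phase(x) for x in dft(pse_waveform)]
    spectrum = combine_amplitude_and_phase(ase_amplitude, phase)
    return [x.real for x in idft(spectrum)]
```

When the amplitude passed in is exactly the magnitude of the PSE frame's own spectrum, the frame is recovered unchanged, which is a convenient sanity check; a real system would use an STFT library with windowing and overlap-add instead.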
Acoustical Characteristics of the Cantonese Vowels and Tones Produced by Hearing Impaired Speakers
2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP) Pub Date : 2021-01-24 DOI: 10.1109/ISCSLP49672.2021.9362118
Wai-Sum Lee, Irene Ching-Yin Tsoi
Abstract: This paper investigates the acoustical characteristics of Cantonese vowels and tones produced by three male and three female hearing-impaired (HI) speakers with cochlear implants or hearing aids. Vowels and tones produced by two speakers with normal hearing (NH) are also analysed for comparison. Results show that all six HI speakers differentiate the durations of (a) long, (b) medium-long, and (c) short vowels in Cantonese. Although the HI speakers differ from the NH speakers in absolute vowel durations, the ratios (a):(b) and (b):(c) for the HI speakers are comparable to those for the NH speakers. The vowel loop data show that the HI speakers with cochlear implants (CI) perform almost as well as the NH speaker and outperform the HI speakers with hearing aids (HA) in producing Cantonese vowels. The HA speakers produce smaller vowel loops than the NH speakers. The F0 curves show that the HA speakers outperform the CI speakers in producing Cantonese tones. There thus appears to be a correlation between the type of HI speaker and production performance for vowels or tones.
Citations: 0
Automatic Speaker-level Pronunciation Assessment of L2 Speech Using Posterior Probabilities from Multiple Utterances
2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP) Pub Date : 2021-01-24 DOI: 10.1109/ISCSLP49672.2021.9362121
Guolei Jiang, Chunhong Liao, Kun Li, Pengfei Liu, Linying Jiang, H. Meng
Abstract: Evaluating the level of accentedness is important for second language education, both for qualifying language teachers and for offering advice and feedback to learners. Previous methods evaluated a speaker's accentedness from a limited number of utterances by that speaker, which leads to biased or unstable results because sparse data cannot fully cover speaker-specific pronunciation errors. To improve evaluation stability, we investigate speaker-level features and speaker-level neural networks trained on multiple utterances. Experimental results demonstrate that speaker-level features and speaker-level models provide high accent classification accuracy, comparable to human annotations. The proposed approach also improves the stability of the evaluation results.
Citations: 0
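The abstract above contrasts utterance-level evaluation with speaker-level features built from posterior probabilities over multiple utterances, but does not define those features. The sketch below shows one plausible form, stated as an assumption: a goodness-of-pronunciation-style mean log posterior per phone, pooled over every aligned frame of every utterance from the same speaker. `Utterance` and `speaker_level_features` are hypothetical names, and the posteriors are assumed to come from some acoustic model and forced alignment not shown here.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# One utterance: aligned frames as (canonical phone, acoustic-model posterior of that phone).
Utterance = List[Tuple[str, float]]


def speaker_level_features(utterances: List[Utterance], floor: float = 1e-10) -> Dict[str, float]:
    """Mean log posterior per phone, pooled over all frames of all utterances of one speaker."""
    log_sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for utterance in utterances:
        for phone, posterior in utterance:
            log_sums[phone] += math.log(max(posterior, floor))  # floor avoids log(0)
            counts[phone] += 1
    return {phone: log_sums[phone] / counts[phone] for phone in log_sums}
```

Pooling frames across utterances (rather than averaging per-utterance scores) lets phones that appear rarely in any single utterance still receive a score backed by all available data, which matches the abstract's motivation that sparse data cannot fully cover speaker-specific errors.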
Low-complexity Post-processing Method for Speech Enhancement
2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP) Pub Date : 2021-01-24 DOI: 10.1109/ISCSLP49672.2021.9362096
Feng Bao, Yuepeng Li, Shidong Shang
Abstract: In this paper, we propose a low-complexity post-processing method for speech enhancement. This real-time post-processing method considers two gains, obtained respectively from the conventional log-spectral Minimum Mean-Square Error (LogMMSE) algorithm and from a neural-network-based speech enhancement algorithm. The two gains are combined by an adaptive factor so as to share the advantages of both kinds of enhancement algorithm. The harmonic structure of the speech signal is further recovered by applying a harmonic gain computed from the signal spectra and the adaptive factor. Experimental results show that the proposed post-processing method achieves better performance in terms of speech quality.
Citations: 0
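The gain combination described in the abstract above can be sketched, under stated assumptions, as a per-bin convex combination G = α·G_nn + (1 − α)·G_LogMMSE applied to the noisy spectrum. The abstract does not say how the adaptive factor α is computed or how the harmonic gain is derived, so this sketch simply takes α as an input in [0, 1] and omits the harmonic gain; `combine_gains` and `apply_gain` are hypothetical names.

```python
from typing import List


def combine_gains(g_logmmse: List[float], g_nn: List[float], alpha: float) -> List[float]:
    """Per-bin convex combination: G = alpha * G_nn + (1 - alpha) * G_logmmse."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(g_logmmse) != len(g_nn):
        raise ValueError("gain vectors must have the same number of bins")
    return [alpha * gn + (1.0 - alpha) * gl for gl, gn in zip(g_logmmse, g_nn)]


def apply_gain(noisy_spectrum: List[complex], gain: List[float]) -> List[complex]:
    """Scale each noisy STFT bin by its real-valued gain; the noisy phase is kept."""
    return [g * x for g, x in zip(gain, noisy_spectrum)]
```

Keeping the combination convex means the result always lies between the two input gains, so it cannot suppress or amplify a bin more than either algorithm alone would; this is one reason a bounded α is a natural choice for a low-complexity post-processor.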