{"title":"Dialogue scenario classification based on social factors","authors":"Yuning Liu, Di Zhou, M. Unoki, J. Dang, Ai-jun Li","doi":"10.1109/ISCSLP57327.2022.10037880","DOIUrl":"https://doi.org/10.1109/ISCSLP57327.2022.10037880","url":null,"abstract":"The tendency of interlocutors to become more similar to each other in the way they speak, this behavior is known in the literature as entrainment, accommodation, or adaptation. Previous studies indicated that entrainment can be treated as a social factor in human-human conversations. However, previous research suggests that this phenomenon has many subtleties. One of these cues is that entrainment on an acoustic feature might be associated with disentrainment on another in conversation, which means we have to consider these features together. Therefore, we proposed a linear dimensionality-reduction method that combines acoustic features to calculate three entrainment metrics: proximity, convergence, and synchrony. The three entrainment metrics are referred to as social factors hereafter. Our results show these social factors play an important role in a classification task. We also found that these social factors perform a better classification accuracy than combining each individual acoustic feature’s entrainment. The proposed social factors can help the human-machine interface to have the ability to adapt to the different scenarios in dialogue.","PeriodicalId":246698,"journal":{"name":"2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127102890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Rare Words Recognition through Homophone Extension and Unified Writing for Low-resource Cantonese Speech Recognition","authors":"Ho-Lam Chung, Junan Li, Pengfei Liu1, Wai-Kim Leung, Xixin Wu, H. Meng","doi":"10.1109/ISCSLP57327.2022.10038220","DOIUrl":"https://doi.org/10.1109/ISCSLP57327.2022.10038220","url":null,"abstract":"Homophone characters are common in tonal syllable-based languages, such as Mandarin and Cantonese. The data-intensive end-to-end Automatic Speech Recognition (ASR) systems are more likely to mis-recognize homophone characters and rare words under low-resource settings. For the problem of low-resource Cantonese speech recognition, this paper presents a novel homophone extension method to integrate human knowledge of the homophone lexicon into the beam search decoding process with language model re-scoring. Besides, we propose an automatic unified writing method to merge the variants of Cantonese characters and standardize speech annotation guidelines, which enables more efficient utilization of labeled utterances by providing more samples for the merged characters. We empirically show that both homophone extension and unified writing improve the recognition performance significantly on both in-domain and out-of-domain test sets, with an absolute Character Error Rate (CER) decrease of around 5% and 18%.","PeriodicalId":246698,"journal":{"name":"2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127375714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Aphasia Detection for Cantonese-Speaking and Mandarin-Speaking Patients Using Pre-Trained Language Models","authors":"Ying Qin, Tan Lee, A. Kong, Feng Lin","doi":"10.1109/ISCSLP57327.2022.10037929","DOIUrl":"https://doi.org/10.1109/ISCSLP57327.2022.10037929","url":null,"abstract":"Automatic analysis of aphasic speech based on speech technology has been extensively investigated in recent years, but there has been a few studies on Chinese languages. In this paper, we focus on automatic aphasia detection for Cantonese-and Mandarin-speaking patients using state-of-the-art pre-trained language models that support both traditional and simplified Chinese. Given speech transcriptions of subjects, pre-trained language models are used in two ways: 1) pre-trained language model derived embeddings followed by a classifier; 2) pre-trained language model fine-tuned for aphasia detection task. Both approaches are demonstrated to outperform baseline models using acoustic features and static word embeddings. The best accuracy is obtained with fine-tuned BERT models, achieving 0.98 and 0.94 for Cantonese-speaking and Mandarin-speaking subjects respectively. We also investigate the feasibility of applying the cross-lingual pre-trained language model fine-tuned by aphasia detection task for Cantonese-speaking subjects to Mandarin-speaking subjects with limited data. The promising results will hopefully make it possible to perform detection on those low-resource pathological speech which is difficult to implement a specific detection system.","PeriodicalId":246698,"journal":{"name":"2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123804108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lightweight End-To-End Deep Learning Model For Music Source Separation","authors":"Yao-Ting Wang, Yi-Xing Lin, Kai-Wen Liang, Tzu-Chiang Tai, Jia-Ching Wang","doi":"10.1109/ISCSLP57327.2022.10037912","DOIUrl":"https://doi.org/10.1109/ISCSLP57327.2022.10037912","url":null,"abstract":"In this work, we propose a lightweight end-to-end music source separation deep learning model. Deep learning models for audio source separation based on time-domain have been proposed for end-to-end processing. However, the proposed models are complex and difficult to use when the computing resources of the device are limited. Additionally, long delays may be expected since long-term inputs are required to obtain adequate results for separation, making the models unsuitable for applications that require low latency. In the proposed model, Atrous Spatial Pyramid Pooling is used to reduce the number of parameters, and the receptive field preserving decoder is utilized to enhance the result of separation while the input context length is limited. The experimental results show that the proposed method obtains better results than previous methods while using 10% or fewer parameters.","PeriodicalId":246698,"journal":{"name":"2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125382475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Spoken Language Teaching Tech: Combining Multi-attention and AdaIN for One-shot Cross Language Voice Conversion","authors":"Dengfeng Ke, Wenhan Yao, Ruixin Hu, Liangjie Huang, Qi Luo, Wentao Shu","doi":"10.1109/ISCSLP57327.2022.10038137","DOIUrl":"https://doi.org/10.1109/ISCSLP57327.2022.10038137","url":null,"abstract":"Computer aided pronunciation training(CAPT) plays an important role in oral language teaching. The main methods of traditional computer-assisted oral teaching include mispronunciation detection and pronunciation scoring and assessment.However, these two techniques only give negative feedback information such as scores or error categories. In this case,it is difficult for learners to refine their pronunciation through these two indicators without the guidance of correct speech.To tackle this problem, we proposed a cross language voice conversion(VC) framework that can generate speech with template speech content and learners’ own timbre,which can guide the learner’s pronunciation.To improve VC effect,we apply AdaIN in the fore-end and after the Value matrix in multi-head attention once respectively,called attention-AdaIN,which can improve the style transfer and sequence generation ability.We used attention-AdaIN to construct VC framework based on VAE.Experiments conducted on the AISHELL-3 and VCTK corpus showed that this new aprroach improved the baseline VAE-VC.","PeriodicalId":246698,"journal":{"name":"2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116821830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prosodic Encoding of Mandarin Chinese Intonation by Uygur Speakers in Declarative and Interrogative Sentences","authors":"Tong Li, Hui Feng, Yuan Jia","doi":"10.1109/ISCSLP57327.2022.10037847","DOIUrl":"https://doi.org/10.1109/ISCSLP57327.2022.10037847","url":null,"abstract":"As a major cause of non-native accent for L2 learners, L2 intonation plays an important role in the acquisition of L2 suprasegment. Few studies have been on the prosodic encoding of Chinese intonation by Uygur learners with Mandarin Chinese as a second language (CSL). With L2 Intonation Learning theory (LILt) as the theoretical framework, this study investigates the prosodic encoding of Mandarin intonation by Uygur CSL learners and compares with Beijing Mandarin speakers. Twelve speakers were invited to produce six pairs of Mandarin declarative and interrogative intonations in different tone sequences. It is found that for Uygur CSL learners, the pitch is falling in Mandarin declarative intonation and rising in interrogative intonation, which is similar to Mandarin speakers. However, the bottom lines in two intonations both drop slower. The interactions of L1 and L2 result in the narrowing trend of tonal pitch ranges (TPRs) in declarative intonation assimilated to L1 and the expanding trend in interrogative intonation assimilated to L2.","PeriodicalId":246698,"journal":{"name":"2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129082766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Perception and production of Mandarin vowels by teenagers–blind and sighted","authors":"Moyu Chen, Jing Qi, Xiyu Wu","doi":"10.1109/ISCSLP57327.2022.10037586","DOIUrl":"https://doi.org/10.1109/ISCSLP57327.2022.10037586","url":null,"abstract":"This study aims to explore the perception and production of Mandarin vowels by the blind and sighted. Twenty blind and twenty sighted teenagers participated in the identification and pronunciation experiment of five Mandarin vowels. We calculated the variability and contrast parameters in the first and the second formant space to investigate the articulatory and perceptual abilities of subjects. The Coefficient of Variation (CV) used to quantify vowel variability reflected the precision of perception and production. The result demonstrated that blind people had more variable perceptions, but there was no difference in production. Vowel contrast was represented by the Average Vowel Space (AVS) of /a/, /i/and/u/, reflecting the distinguishability of vowels. The AVSs of the blind were significantly smaller in perception than the sighted, but larger in production. We found that the difference in perception mainly comes from the second formant. Also, the specific vowel absence observed in blind participants in the perceptual experiment suggested that the lack of vision is likely to bring about differences in vowel perception cues. Unfortunately, we did not find any correlations between perception and production in terms of precision and distinction. Improved experiments are required to explore the relationship between perception and production.","PeriodicalId":246698,"journal":{"name":"2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130611355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sequence Distribution Matching for Unsupervised Domain Adaptation in ASR","authors":"Qingxu Li, Hanjing Zhu, Liuping Luo, Gaofeng Cheng, Pengyuan Zhang, Jiasong Sun, Yonghong Yan","doi":"10.1109/ISCSLP57327.2022.10037857","DOIUrl":"https://doi.org/10.1109/ISCSLP57327.2022.10037857","url":null,"abstract":"Unsupervised domain adaptation (UDA) aims to improve the cross-domain model performance without labeled target domain data. Distribution matching is a widely used UDA approach for automatic speech recognition (ASR), which learns domain-invariant while class-discriminative representations. Most previous approaches to distribution matching simply treat all frames in a sequence as independent features and match them between domains. Although intuitive and effective, the neglect of the sequential property could be sub-optimal for ASR. In this work, we propose to explicitly capture and match the sequence-level statistics with sequence pooling, leading to a sequence distribution matching approach. We examined the effectiveness of the sequence pooling on the basis of the maximum mean discrepancy (MMD) based and domain adversarial training (DAT) based distribution matching approaches. Experimental results demonstrated that the sequence pooling methods effectively boost the performance of distribution matching, especially for the MMD-based approach. By combining sequence pooling features and original features, MMD-based and DAT-based approaches relatively reduce WER by 12.08% and 14.72% over the source domain model.","PeriodicalId":246698,"journal":{"name":"2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132421477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Study on Mandarin Chinese “Bu” Tone Sandhi Followed by English Words","authors":"Kaige Gao, Xiyu Wu","doi":"10.1109/ISCSLP57327.2022.10037923","DOIUrl":"https://doi.org/10.1109/ISCSLP57327.2022.10037923","url":null,"abstract":"This paper focuses on tone sandhi of the Chinese word “Bu” followed by English words. In Mandarin Chinese, the tone of the word “Bu” is tone 4 (the high-falling tone), and its tone sandhi rule is that it becomes tone 2 (the mid-rising tone) when followed by another tone 4 syllable. However, few researchers have paid attention to the tone sandhi rules of “Bu” when followed by English words. In this study, the tone sandhi rule of “Bu” when followed by English words is explored by a perceptual experiment and a pitch contour analysis. Results show that the Mandarin Chinese “Bu” tone sandhi also applies when “Bu” is followed by an English word. When followed by a high-falling monosyllabic English word, “Bu” will become tone 2. Furthermore, the pitch contour of English words embedded in Mandarin sentences is predictable according to their syllabic structure and lexical stress. This paper further proves that native Chinese speakers perceive English words inserted into Chinese speech as tonal-like language. The tonal patterns of English words summarized in this study can provide theoretical support for improving the naturalness of mixed-language speech synthesis.","PeriodicalId":246698,"journal":{"name":"2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132546435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reconstruction of speech spectrogram based on non-invasive EEG signal","authors":"Di Zhou, M. Unoki, Gaoyan Zhang, J. Dang","doi":"10.1109/ISCSLP57327.2022.10038234","DOIUrl":"https://doi.org/10.1109/ISCSLP57327.2022.10038234","url":null,"abstract":"Decoding neural activity into speech could enable natural conversations for people who are unable to communicate as a result of neurological diseases. Studies have proven that speech could be directly recognized or synthesized from intracranial recordings. However, intracranial electrocorticography is invasive, thus not comfortable for patients. By the acoustic representation of speech in the high-level brain cortex, we successfully reconstructed a speech spectrogram from non-invasive electroencephalography (EEG), which has similar accuracy to previous intracranial recording. As well as the reported superior temporal gyrus, premotor cortex, and inferior frontal gyrus, we also found speech representations in several other cortices such as an entorhinal, fusiform, and temporal pole. The intelligibility of the recovered speech in this study was not high enough, however, our findings show a possibility to reconstruct speech from non-invasive EEG in the future.","PeriodicalId":246698,"journal":{"name":"2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133757502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}