2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP): Latest Publications

CUEMPATHY: A Counseling Speech Dataset for Psychotherapy Research
2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP) Pub Date: 2022-12-11 DOI: 10.1109/ISCSLP57327.2022.10038072
Dehua Tao, Harold Chui, Sarah Luk, Tan Lee
Abstract: Psychotherapy or counseling is typically conducted through spoken conversation between a therapist and a client. Analyzing the speech characteristics of psychotherapeutic interactions can help understand the factors associated with effective psychotherapy. This paper introduces CUEMPATHY, a large-scale speech dataset collected from actual counseling sessions. The dataset consists of 156 counseling sessions involving 39 therapist-client dyads. The speech data collection, subjective rating (one observer rating and two client ratings), and transcription processes are described. An automatic speech and text processing system is developed to locate the time stamps of speaker turns in each session. Examining the relationships among the three subjective ratings suggests that observer and client ratings have no significant correlation, while the client-rated measures are significantly correlated. The intensity similarity between the therapist and the client, measured by the averaged absolute difference of speaker-turn-level intensities, is associated with psychotherapy outcomes. Recent studies on the acoustic and linguistic characteristics of CUEMPATHY are introduced.
Citations: 3
Deep Multi-task Cascaded Acoustic Echo Cancellation and Noise Suppression
2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP) Pub Date: 2022-12-11 DOI: 10.1109/ISCSLP57327.2022.10037852
Junjie Li, Meng Ge, Longbiao Wang, J. Dang
Abstract: With the growing need for online communication, removing acoustic echo and background noise during voice and video calls has become a major problem. Recent studies show that deep-learning-based algorithms can be applied successfully to acoustic echo cancellation. These algorithms usually use a single mask to remove acoustic echo and noise at the same time. Since the patterns of acoustic echo and noise differ, we propose a multi-task cascaded framework with multiple masks, named DMC-AEC, to ease the difficulty of removing these two kinds of interference. DMC-AEC consists of three cascaded blocks, each containing one mask. The first block takes the microphone and far-end signals and learns the auxiliary task of estimating the echo. The second block uses the estimated echo together with the far-end and microphone signals to cancel the acoustic echo. The third block takes the output of the second block and further suppresses noise. DMC-AEC is trained on a synthetic dataset from the ICASSP AEC Challenge.
Citations: 1
Style-Label-Free: Cross-Speaker Style Transfer by Quantized VAE and Speaker-wise Normalization in Speech Synthesis
2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP) Pub Date: 2022-12-11 DOI: 10.1109/ISCSLP57327.2022.10038135
Chunyu Qiang, Peng Yang, Hao Che, Xiaorui Wang, Zhongyuan Wang
Abstract: Cross-speaker style transfer in speech synthesis aims at transferring a style from a source speaker to synthesized speech in a target speaker's timbre. Most previous approaches rely on data with style labels, but manually annotated labels are expensive and not always reliable. To address this problem, we propose Style-Label-Free, a cross-speaker style transfer method that can transfer style from a source speaker to a target speaker without style labels. First, a reference encoder based on a quantized variational autoencoder (Q-VAE) and a style bottleneck is designed to extract discrete style representations. Second, a speaker-wise batch normalization layer is proposed to reduce source-speaker leakage. To improve the style extraction ability of the reference encoder, a style-invariant and contrastive data augmentation method is proposed. Experimental results show that the method outperforms the baseline. We provide a website with audio samples.
Citations: 5
Effects of Aspiration on Tone Production and Perception in Standard Chinese
2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP) Pub Date: 2022-12-11 DOI: 10.1109/ISCSLP57327.2022.10038091
Chong Cao, Ai-jun Li
Abstract: Numerous studies have reported that onset fundamental frequency (onset f0) is affected by the voicing characteristics of the preceding consonant in speech production. For instance, onset f0 following voiceless stops is usually higher than onset f0 following voiced stops. In Standard Chinese, syllable-initial stop consonants can be classified into two groups according to the aspiration contrast: voiceless aspirated and voiceless unaspirated. The aspiration contrast is distinctive and plays an important role in distinguishing lexical meanings. Using acoustic analysis of f0 realization and a categorical perception paradigm, this study investigates the effect of consonant aspiration on the production and perception of lexical tones in Standard Chinese. Production results showed that onset f0 following aspirated consonants was higher than that following unaspirated consonants. Moreover, the magnitude of the difference varied with lexical tone: tone 1 and tone 4 showed larger onset f0 differences than tone 2 and tone 3. Perception tests showed that the aspiration contrast enhanced the perceptual salience between high and low tones. Specifically, compared with unaspirated syllables, tones carried by aspirated syllables tended to be perceived as lower tones.
Citations: 0
Mix-Guided VC: Any-to-many Voice Conversion by Combining ASR and TTS Bottleneck Features
2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP) Pub Date: 2022-12-11 DOI: 10.1109/ISCSLP57327.2022.10038075
Zeqing Zhao, Sifan Ma, Yan Jia, Jingyu Hou, Lin Yang, Junjie Wang
Abstract: Because parallel data are difficult to obtain, many recent works focus on non-parallel voice conversion (VC). Bottleneck features (BNFs) from automatic speech recognition (ASR) and text-to-speech (TTS) models play an important role in feature disentanglement for VC. In this work, we propose Mix-Guided VC, a non-parallel any-to-many voice conversion model that combines ASR-BNFs and TTS-BNFs. We demonstrate that ASR-BNFs and TTS-BNFs are complementary: ASR-BNFs are more robust, especially in any-to-many tasks, but leak the source speaker's timbre; TTS-BNFs are closely correlated with the text but lack robustness. Experiments show that the proposed model achieves the best balance of speech quality, timbre similarity, and robustness compared with baseline models. Furthermore, all modules of the proposed model can be trained jointly, and no additional pre-training data are needed.
Citations: 1
The X-Lance Speaker Diarization System for the Conversational Short-phrase Speaker Diarization Challenge 2022
2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP) Pub Date: 2022-12-11 DOI: 10.1109/ISCSLP57327.2022.10037955
Tao Liu, Xu Xiang, Zhengyang Chen, Bing Han, Kai Yu, Y. Qian
Abstract: This paper describes the X-Lance speaker diarization system submitted to the Conversational Short-phrase Speaker Diarization Challenge. The system outputs the combined results of four modules: self-attention-based VAD, uniform segmentation, an ECAPA-TDNN-based embedding extractor, and spectral clustering. We evaluated the system on the Conversational Short-phrase Speaker Diarization (CSSD) dataset, which is based on MagicData-RAMC and contains many conversational short-phrase segments. Unlike other diarization challenges, this challenge proposes a metric called the Conversational Diarization Error Rate (CDER), which focuses on evaluating short segments. In this paper, we analyze this metric and conduct extensive experiments. Our system achieves a CDER of 13.2% on the CSSD_dev set and 8.0% on the unseen CSSD_eval set.
Citations: 0
RAT: RNN-Attention Transformer for Speech Enhancement
2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP) Pub Date: 2022-12-11 DOI: 10.1109/ISCSLP57327.2022.10037952
Tailong Zhang, Shulin He, Hao Li, Xueliang Zhang
Abstract: Benefiting from the global modeling capability of the self-attention mechanism, Transformer-based models have seen increasing use in natural language processing and automatic speech recognition. The Transformer's ultra-long view overcomes the catastrophic forgetting of recurrent neural networks (RNNs). However, unlike natural language processing and speech recognition, which focus on global information, speech enhancement depends more on local information, so the original Transformer is not optimally suited to it. In this paper, we propose an improved Transformer model called the RNN-Attention Transformer (RAT), which applies multi-head self-attention (MHSA) along the temporal dimension. The input sequence is split into chunks, and different models are applied within and across chunks. Since RNNs model local information better than self-attention, RNNs and self-attention are used to model intra-chunk and inter-chunk information, respectively. Experiments show that RAT significantly reduces parameters and improves performance compared with the baseline.
Citations: 0
Multi-Task Joint Learning for Embedding Aware Audio-Visual Speech Enhancement
2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP) Pub Date: 2022-12-11 DOI: 10.1109/ISCSLP57327.2022.10038268
Chenxi Wang, Hang Chen, Jun Du, Baocai Yin, Jia Pan
Abstract: In this paper, we propose a multi-task joint learning scheme that improves embedding-aware audio-visual speech enhancement by adopting both the phone and the articulation place as classification targets when training the embedding extractor and the enhancement network. First, a multimodal embedding is extracted from noisy speech and lip frames, supervised jointly at the articulation-place and phone label levels. Next, we train the embedding extractor and the enhancement network jointly, with learning objectives that include the ideal ratio mask and the phone and place posteriors. Experiments on the TCD-TIMIT corpus corrupted by simulated additive noises show that the proposed multimodal embedding at the multi-scale class level is more effective than previous embeddings at the place or phone level alone, and that the multi-task joint learning framework further improves speech quality and intelligibility.
Citations: 0
Medical Difficult Airway Detection using Speech Technology
2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP) Pub Date: 2022-12-11 DOI: 10.1109/ISCSLP57327.2022.10037911
Zhi-Kai Zhou, Shuang Cao, Zhengyang Chen, Bei Liu, Ming Xia, Hong Jiang, Y. Qian
Abstract: Detecting a difficult airway is an important step for patients undergoing surgery under general anesthesia, as inappropriate management of a difficult airway is associated with morbidity and mortality. However, conventional clinical evaluation of the difficult airway has several limitations. In this paper, we explore how speech technology can be used to recognize the difficult airway, applying a deep speaker recognition model to its prediction. Experiments are carried out on a carefully designed dataset recorded from 1189 speakers in a hospital. The speaker embedding is taken as the input of a support vector machine (SVM) that makes the final decision. The proposed models outperform traditional clinical examination methods by a large margin.
Citations: 0
A Mandarin Prosodic Boundary Prediction Model Based on Multi-Source Semi-Supervision
2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP) Pub Date: 2022-12-11 DOI: 10.1109/ISCSLP57327.2022.10037813
Peiyang Shi, Zengqiang Shang, Pengyuan Zhang
Abstract: High-quality prosodic boundary prediction plays an important role in enhancing speech naturalness and intelligibility in Mandarin text-to-speech. However, traditional methods usually require a large number of token-level labels, which can hardly be obtained in low-resource scenarios. To solve this problem, we propose a multi-source semi-supervised model that uses an HMM to assist BERT-based prosody prediction. The model implements an alternating training mechanism combining BERT-Prosody and the HMM: BERT learns from denoised HMM labels and provides updated character embeddings and weak labels for the HMM, forming a training cycle. Experimental results show that, compared with baseline methods, our model raises the F1 score by 1.01% at the prosodic word level and 8.25% at the prosodic phrase level, approaching the performance of supervised models.
Citations: 0