{"title":"Effects of Mandarin Tones on Acoustic Cue Weighting Patterns for Prominence","authors":"Wei Zhang, Meghan Clayards, Jinsong Zhang","doi":"10.1109/ISCSLP49672.2021.9362105","DOIUrl":"https://doi.org/10.1109/ISCSLP49672.2021.9362105","url":null,"abstract":"As a crucial perceived trait of speech, prominence is associated with various communicative functions. The acoustic cues for prominence have been explored extensively, but how the cue weighting pattern varies in different contexts needs a closer examination. This paper investigates how Mandarin tones affect the cue weighting pattern of prominence. On the basis of an annotated speech corpus, mixed-effect logistic regression (MELR) models were fitted for prominence of Mandarin syllables in each lexical tone, by taking fundamental frequency (F0), intensity, duration and formant features as independent variables, and the presence/absence of prominence as the dependent variable. Results showed the varying cue weighting patterns in different tones: (1) The first formant F1 contributes only to level tones T1/T3, and its contribution is much smaller than prosodic features; (2) For H-onset tones F0 features contribute more than intensity, while for L-onset tones F0 features contribute less than intensity; (3) For dynamic tones T2/T4, not the maximum but the minimum of F0 has contribution; (4) Duration has the largest contribution to T4, which is intrinsically shorter than other three tones","PeriodicalId":279828,"journal":{"name":"2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-24","publicationTypes":"Journal 
Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116089766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Detection of Word-Level Reading Errors in Non-native English Speech Based on ASR Output","authors":"Ying Qin, Yao Qian, Anastassia Loukina, P. Lange, A. Misra, Keelan Evanini, Tan Lee","doi":"10.1109/ISCSLP49672.2021.9362102","DOIUrl":"https://doi.org/10.1109/ISCSLP49672.2021.9362102","url":null,"abstract":"Automated reading error detection has attracted a lot of interest in the area of computer-assisted language learning and auto-mated reading tutors. This paper presents preliminary experimental results on automatic detection of word-level reading errors in non-native speech. A state-of-the-art large vocabulary automatic speech recognition (ASR) system is developed to transcribe non-native speech, with performance comparable to humans in transcribing non-native read speech data. With this ASR system, we investigate the feasibility of detecting substitution, insertion and deletion errors from ASR decoding results on non-native read speech. Experimental results show that the performance of detecting substitution and insertion errors are on the low side. Several possible reasons for causing such results are discussed in this paper. Common types of reading errors occurring in non-native read speech and those that are difficult to be detected are analyzed for future investigation.","PeriodicalId":279828,"journal":{"name":"2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128194142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Method for Improving Generative Adversarial Networks in Speech Enhancement","authors":"Fan Yang, Junfeng Li, Yonghong Yan","doi":"10.1109/ISCSLP49672.2021.9362057","DOIUrl":"https://doi.org/10.1109/ISCSLP49672.2021.9362057","url":null,"abstract":"Recent advances in deep learning-based speech enhancement techniques have shown promising prospects over most traditional methods. Generative adversarial networks (GANs), as a recent breakthrough in deep learning, can effectively remove additive noise embedded in speech, improving the perceptual quality [1]. In the existing methods of using GANs to achieve speech enhancement, the discriminator often regards the clean speech signal as real data and the enhanced speech signal as fake data; however, this approach may cause feedback from the discriminator to fail to provide sufficient effective information for the generator to correct its output waveform. In this paper, we propose a new method to use GANs for speech enhancement. This method, by constructing a new learning target for the discriminator, allows the generator to obtain more valuable feed-back, generating more realistic speech signals. In addition, we introduce a new objective, which requires the generator to generate data that matches the statistics of the real data. 
Systematic evaluations and comparisons show that the proposed method yields better performance compared with state-of-art method-s, and achieves better generalization under challenging unseen noise and signal-to-noise ratio (SNR) environments.","PeriodicalId":279828,"journal":{"name":"2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121692215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prosody and Dialogue Act: A Perceptual Study on Chinese Interrogatives","authors":"G. Huang, Ai-jun Li, Sichen Zhang, Liang Zhang","doi":"10.1109/ISCSLP49672.2021.9362061","DOIUrl":"https://doi.org/10.1109/ISCSLP49672.2021.9362061","url":null,"abstract":"Prosody conveys dialogue acts and intentions in speech interaction. This study aims at investigating the interplay between prosody and dialogue acts (pragmatic functions) for Chinese dialogues. To this end, a perceptual experiment was carried out on interrogative intonations with varied prosodic features and contexts associated with 3 dialogue acts including request for affirmation, backchannel, and elaboration. The results demonstrated that (1) the dialogue acts of the context affects the perception of the interrogatives with the same prosodic features; (2) when there is a mismatch between the actual prosody and the context, the context limits or reduces the perception gradient of interrogatives; (3) the perception of interrogative information mainly depends on prosody, while context make contributions to the perception of interrogative-declarative information by modulating listeners’ interpretation of dialogue acts performed by certain prosodic features.","PeriodicalId":279828,"journal":{"name":"2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128348661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Capsule Network based End-to-end System for Detection of Replay Attacks","authors":"Meidan Ouyang, Rohan Kumar Das, Jichen Yang, Haizhou Li","doi":"10.1109/ISCSLP49672.2021.9362111","DOIUrl":"https://doi.org/10.1109/ISCSLP49672.2021.9362111","url":null,"abstract":"Automatic speaker verification systems are prone to various spoofing attacks. The convolutional neural networks are found to be effective for detection of spoofing attacks. However, they lack spatial information and relationship of low-level features with the pooling layer. On the other hand, capsule networks use vectors to record spatial information and the probability of presence simultaneously. They are known to be effective for detection of forged images and videos. In this work, we study capsule networks for replay attack detection. We consider different input features to capsule network and study on recent ASVspoof 2019 physical access corpus. The studies suggest the proposed capsule network based system performs effectively and the performance is comparable to state-of-the-art single systems for replay attack detection.","PeriodicalId":279828,"journal":{"name":"2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP)","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134152112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Attention-based End-to-end ASR by Incorporating an N-gram Neural Network","authors":"Junyi Ao, Tom Ko","doi":"10.1109/ISCSLP49672.2021.9362055","DOIUrl":"https://doi.org/10.1109/ISCSLP49672.2021.9362055","url":null,"abstract":"In attention-based end-to-end ASR, the intrinsic LM is modeled by an RNN and it forms the major part of the decoder. Comparing with external LMs, the intrinsic LM is considered as modest as it is only trained with the transcription associated with the speech data. Although it is a common practise to interpolate the scores of the end-to-end model and the external LM, the need of an external model hurts the novelty of end-to-end. Therefore, researchers are investigating different ways of improving the intrinsic LM of the end-to-end model. By observing the fact that N-gram LMs and RNN LMs can complement each other, we would like to investigate the effect of implementing an N-gram neural network inside the end-to-end model. In this paper, we examine two implementations of N-gram neural network in the context of attention-based end-to-end ASR. We find that both implementations improve the baseline and CBOW (Continuous Bag-of-Words) performs slightly better. We further propose a way to minimize the size of the N-gram component by utilizing the coda information of the modeling units. 
Experiments on LibriSpeech dataset show that our proposed method achieves obvious improvement with only a slight increase in model parameters.","PeriodicalId":279828,"journal":{"name":"2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP)","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132091610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GAN-Based Inter-Channel Amplitude Ratio Decoding in Multi-Channel Speech Coding","authors":"Jinru Zhu, C. Bao","doi":"10.1109/ISCSLP49672.2021.9362089","DOIUrl":"https://doi.org/10.1109/ISCSLP49672.2021.9362089","url":null,"abstract":"In this paper, a multi-channel speech coding method based on down-mixing and inter-channel amplitude ratio (ICAR) decoding based on generative adversarial network (GAN) is proposed. Firstly, spatial parameter inter-channel time difference (ICTD) is extracted. In the short-time Fourier transform (STFT) domain, the amplitude of the down-mixed mono signal is obtained by adding and averaging the amplitude of the multi-channel speech signals, the phase of the down-mixed mono signal is replaced by the phase of the reference channel, the STFT of the down-mixed mono signal is obtained. Then, the inverse STFT is used to obtain the down-mixed mono signal. The amplitude ratio between multichannel speech signals and down-mixed signal (ICAR) is extracted. The down-mixed mono signal is coded by Speex codec, and ICTD is quantized by a uniform scalar quantizer. The ICAR needn’t to be encoded. The ICAR is decoded from a well-trained GAN at the decoder based on the decoded mono signal. Finally, the decoded multi-channel speech signals are recovered by using the decoded down-mixed mono signal, decoded ICTD and the decoded ICAR. 
The experimental results show that the proposed multi-channel speech coding method can recover multi-channel speech signals with spatial information.","PeriodicalId":279828,"journal":{"name":"2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP)","volume":"165 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132947821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Scale Model for Mandarin Tone Recognition","authors":"Linkai Peng, Wang Dai, Dengfeng Ke, Jinsong Zhang","doi":"10.1109/ISCSLP49672.2021.9362063","DOIUrl":"https://doi.org/10.1109/ISCSLP49672.2021.9362063","url":null,"abstract":"Tone plays an important role in tonal languages such as Mandarin and tone classification is an essential component of speech evaluation of Mandarin Chinese. Previous methods for tone classification rarely take into account that different tones possess different scales along both time and frequency axis. Meanwhile, tone contours are subject to many sorts of variation and therefore information from multiple scales can help models to determine the unclear boundary of tones in continuous speech. In this work, we propose a Multi-Scale model which can gather information at multiple resolutions to better capture the characteristics of tone variations effected by complex phonetic and linguistic rules. The experimental results showed that our method achieves competitive results on the Chinese National Hi-Tech Project 863 corpus with TER of 10.5%.","PeriodicalId":279828,"journal":{"name":"2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114099170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Impact of Mismatched Spectral Amplitude Levels on Vowel Identification in Simulated Electric-acoustic Hearing","authors":"Changjie Pan, Fei Chen","doi":"10.1109/ISCSLP49672.2021.9362088","DOIUrl":"https://doi.org/10.1109/ISCSLP49672.2021.9362088","url":null,"abstract":"The benefits of combined electric-acoustic stimulation (EAS) in terms of better speech recognition have been well documented in the literature for patients fitted with a hearing aid and a cochlear implant, providing them low-frequency and high-frequency speech information, respectively. This work assessed the effect of mismatched spectral amplitude levels on vowel identification in simulated EAS hearing. The spectral amplitude levels of four synthetic vowels (i.e., /iy/, /eh/, /oo/ and /ah/) were modified to amplify either low-frequency (≤600 Hz) or high-frequency (>600 Hz) portion, and the EAS-processed stimuli were presented to normal-hearing listeners to identify. Results showed declined vowel identification scores in response to acoustic or electric spectral amplitude amplification, and the specific loudness pattern computed from Moore et al.’s model was found to effectively account for the variance of vowel identification scores.","PeriodicalId":279828,"journal":{"name":"2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125467359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rapid Word Learning of Children with Cochlear Implants: Phonological Structure and Mutual Exclusivity","authors":"Yu-Chen Hung, Tzu-Hui Lin","doi":"10.1109/ISCSLP49672.2021.9362091","DOIUrl":"https://doi.org/10.1109/ISCSLP49672.2021.9362091","url":null,"abstract":"To understand how degraded speech signals with reduced spectral-temporal information impact processes of word learning in children with CIs, the present study utilized a modified version of intermodal preferential looking paradigm to investigate their use of mutual exclusivity and its interaction with phonological structure. Sixteen Mandarin-speaking children with CIs aged from 34.3 and 48.1 months (M = 40.7) were recruited to examine their ability to fast-map two novel words with reduplicated-syllable and disyllable to its corresponding reference, respectively. Familiar objects were paired with novel ones to elicit the possible use of mutual exclusivity. Overall, no significant preferential looking towards target was found for the novel-novel object pairs. The finding indicates that, regardless of the phonological structure, it is challenging for children with CIs to fast map a new word. However, a significantly higher looking proportion was obtained for the disyllabic novel object when it is paired with a familiar one, providing evidence on the use of mutual exclusivity. Furthermore, the absent preferential effect for the novel object with reduplicated-syllable over the familiar one suggests that the phonological structure seems to modulate the effect of mutual exclusivity. 
Implications for phonological structure and mutual exclusivity are discussed.","PeriodicalId":279828,"journal":{"name":"2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130358399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}