Latest Articles in Speech Communication

Multi-modal co-learning for silent speech recognition based on ultrasound tongue images
IF 2.4 · CAS Zone 3 · Computer Science
Speech Communication Pub Date: 2024-09-12 DOI: 10.1016/j.specom.2024.103140
Minghao Guo, Jianguo Wei, Ruiteng Zhang, Yu Zhao, Qiang Fang
{"title":"Multi-modal co-learning for silent speech recognition based on ultrasound tongue images","authors":"Minghao Guo ,&nbsp;Jianguo Wei ,&nbsp;Ruiteng Zhang ,&nbsp;Yu Zhao ,&nbsp;Qiang Fang","doi":"10.1016/j.specom.2024.103140","DOIUrl":"10.1016/j.specom.2024.103140","url":null,"abstract":"<div><p>Silent speech recognition (SSR) is an essential task in human–computer interaction, aiming to recognize speech from non-acoustic modalities. A key challenge in SSR is inherent input ambiguity due to partial speech information absence in non-acoustic signals. This ambiguity leads to homophones-words with similar inputs yet different pronunciations. Current approaches address this issue either by utilizing richer additional inputs or training extra models for cross-modal embedding compensation. In this paper, we propose an effective multi-modal co-learning framework promoting the discriminative ability of silent speech representations via multi-stage training. We first construct the backbone of SSR using ultrasound tongue imaging (UTI) as the main modality and then introduce two auxiliary modalities: lip video and audio signals. Utilizing modality dropout, the model learns shared/specific features from all available streams creating a same semantic space for better generalization of the UTI representation. Given cross-modal unbalanced optimization, we highlight the importance of hyperparameter settings and modulation strategies in enabling modality-specific co-learning for SSR. Experimental results show that the modality-agnostic models with single UTI input outperform state-of-the-art modality-specific models. Confusion analysis based on phonemes/articulatory features confirms that co-learned UTI representations contain valuable information for distinguishing homophenes. Additionally, our model can perform well on two unseen testing sets, achieving cross-modal generalization for the uni-modal SSR task.</p></div>","PeriodicalId":49485,"journal":{"name":"Speech Communication","volume":"165 ","pages":"Article 103140"},"PeriodicalIF":2.4,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142239519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
CLESSR-VC: Contrastive learning enhanced self-supervised representations for one-shot voice conversion
IF 2.4 · CAS Zone 3 · Computer Science
Speech Communication Pub Date: 2024-09-10 DOI: 10.1016/j.specom.2024.103139
Yuhang Xue, Ning Chen, Yixin Luo, Hongqing Zhu, Zhiying Zhu
{"title":"CLESSR-VC: Contrastive learning enhanced self-supervised representations for one-shot voice conversion","authors":"Yuhang Xue,&nbsp;Ning Chen,&nbsp;Yixin Luo,&nbsp;Hongqing Zhu,&nbsp;Zhiying Zhu","doi":"10.1016/j.specom.2024.103139","DOIUrl":"10.1016/j.specom.2024.103139","url":null,"abstract":"<div><p>One-shot voice conversion (VC) has attracted more and more attention due to its broad prospects for practical application. In this task, the representation ability of speech features and the model’s generalization are the focus of attention. This paper proposes a model called CLESSR-VC, which enhances pre-trained self-supervised learning (SSL) representations through contrastive learning for one-shot VC. First, SSL features from the 23rd and 9th layers of the pre-trained WavLM are adopted to extract content embedding and SSL speaker embedding, respectively, to ensure the model’s generalization. Then, the conventional acoustic feature mel-spectrograms and contrastive learning are introduced to enhance the representation ability of speech features. Specifically, contrastive learning combined with the pitch-shift augmentation method is applied to disentangle content information from SSL features accurately. Mel-spectrograms are adopted to extract mel speaker embedding. The AM-Softmax and cross-architecture contrastive learning are applied between SSL and mel speaker embeddings to obtain the fused speaker embedding that helps improve speech quality and speaker similarity. Both objective and subjective evaluation results on the VCTK corpus confirm that the proposed VC model has outstanding performance and few trainable parameters.</p></div>","PeriodicalId":49485,"journal":{"name":"Speech Communication","volume":"165 ","pages":"Article 103139"},"PeriodicalIF":2.4,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142173318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
CSLNSpeech: Solving the extended speech separation problem with the help of Chinese sign language
IF 2.4 · CAS Zone 3 · Computer Science
Speech Communication Pub Date: 2024-09-02 DOI: 10.1016/j.specom.2024.103131
Jiasong Wu, Xuan Li, Taotao Li, Fanman Meng, Youyong Kong, Guanyu Yang, Lotfi Senhadji, Huazhong Shu
{"title":"CSLNSpeech: Solving the extended speech separation problem with the help of Chinese sign language","authors":"Jiasong Wu ,&nbsp;Xuan Li ,&nbsp;Taotao Li ,&nbsp;Fanman Meng ,&nbsp;Youyong Kong ,&nbsp;Guanyu Yang ,&nbsp;Lotfi Senhadji ,&nbsp;Huazhong Shu","doi":"10.1016/j.specom.2024.103131","DOIUrl":"10.1016/j.specom.2024.103131","url":null,"abstract":"<div><p>Previous audio-visual speech separation methods synchronize the speaker's facial movement and speech in the video to self-supervise the speech separation. In this paper, we propose a model to solve the speech separation problem assisted by both face and sign language, which we call the extended speech separation problem. We design a general deep learning network to learn the combination of three modalities, audio, face, and sign language information, to solve the speech separation problem better. We introduce a large-scale dataset named the Chinese Sign Language News Speech (CSLNSpeech) dataset to train the model, in which three modalities coexist: audio, face, and sign language. Experimental results show that the proposed model performs better and is more robust than the usual audio-visual system. In addition, the sign language modality can also be used alone to supervise speech separation tasks, and introducing sign language helps hearing-impaired people learn and communicate. Last, our model is a general speech separation framework and can achieve very competitive separation performance on two open-source audio-visual datasets. The code is available at https://github.com/iveveive/SLNSpeech</p></div>","PeriodicalId":49485,"journal":{"name":"Speech Communication","volume":"165 ","pages":"Article 103131"},"PeriodicalIF":2.4,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142173317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Comparing neural network architectures for non-intrusive speech quality prediction
IF 2.4 · CAS Zone 3 · Computer Science
Speech Communication Pub Date: 2024-08-30 DOI: 10.1016/j.specom.2024.103123
Leif Førland Schill, Tobias Piechowiak, Clément Laroche, Pejman Mowlaee
{"title":"Comparing neural network architectures for non-intrusive speech quality prediction","authors":"Leif Førland Schill ,&nbsp;Tobias Piechowiak ,&nbsp;Clément Laroche ,&nbsp;Pejman Mowlaee","doi":"10.1016/j.specom.2024.103123","DOIUrl":"10.1016/j.specom.2024.103123","url":null,"abstract":"<div><p>Non-intrusive speech quality predictors evaluate speech quality without the use of a reference signal, making them useful in many practical applications. Recently, neural networks have shown the best performance for this task. Two such models in the literature are the convolutional neural network based DNSMOS and the bi-directional long short-term memory based Quality-Net, which were originally trained to predict subjective targets and intrusive PESQ scores, respectively. In this paper, these two architectures are trained on a single dataset, and used to predict the intrusive ViSQOL score. The evaluation is done on a number of test sets with a variety of mismatch conditions, including unseen speech and noise corpora, and common voice over IP distortions. The experiments show that the models achieve similar predictive ability on the training distribution, and overall good generalization to new noise and speech corpora. Unseen distortions are identified as an area where both models generalize poorly, especially DNSMOS. Our results also suggest that a pervasiveness of ambient noise in the training set can cause problems when generalizing to certain types of noise. Finally, we detail how the ViSQOL score can have undesirable dependencies on the reference pressure level and the voice activity level.</p></div>","PeriodicalId":49485,"journal":{"name":"Speech Communication","volume":"165 ","pages":"Article 103123"},"PeriodicalIF":2.4,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167639324000943/pdfft?md5=5812564c5b5fd37eb77c86b9c56fb655&pid=1-s2.0-S0167639324000943-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142151012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Accurate synthesis of dysarthric speech for ASR data augmentation
IF 2.4 · CAS Zone 3 · Computer Science
Speech Communication Pub Date: 2024-08-10 DOI: 10.1016/j.specom.2024.103112
Mohammad Soleymanpour, Michael T. Johnson, Rahim Soleymanpour, Jeffrey Berry
{"title":"Accurate synthesis of dysarthric Speech for ASR data augmentation","authors":"Mohammad Soleymanpour ,&nbsp;Michael T. Johnson ,&nbsp;Rahim Soleymanpour ,&nbsp;Jeffrey Berry","doi":"10.1016/j.specom.2024.103112","DOIUrl":"10.1016/j.specom.2024.103112","url":null,"abstract":"<div><p>Dysarthria is a motor speech disorder often characterized by reduced speech intelligibility through slow, uncoordinated control of speech production muscles. Automatic Speech Recognition (ASR) systems can help dysarthric talkers communicate more effectively. However, robust dysarthria-specific ASR requires a significant amount of training speech, which is not readily available for dysarthric talkers.</p><p>This paper presents a new dysarthric speech synthesis method for the purpose of ASR training data augmentation. Differences in prosodic and acoustic characteristics of dysarthric spontaneous speech at varying severity levels are important components for dysarthric speech modeling, synthesis, and augmentation. For dysarthric speech synthesis, a modified neural multi-talker TTS is implemented by adding a dysarthria severity level coefficient and a pause insertion model to synthesize dysarthric speech for varying severity levels.</p><p>To evaluate the effectiveness for synthesis of training data for ASR, dysarthria-specific speech recognition was used. Results show that a DNN<img>HMM model trained on additional synthetic dysarthric speech achieves relative Word Error Rate (WER) improvement of 12.2 % compared to the baseline, and that the addition of the severity level and pause insertion controls decrease WER by 6.5 %, showing the effectiveness of adding these parameters. Overall results on the TORGO database demonstrate that using dysarthric synthetic speech to increase the amount of dysarthric-patterned speech for training has significant impact on the dysarthric ASR systems. In addition, we have conducted a subjective evaluation to evaluate the dysarthricness and similarity of synthesized speech. Our subjective evaluation shows that the perceived dysarthricness of synthesized speech is similar to that of true dysarthric speech, especially for higher levels of dysarthria. Audio samples are available at https://mohammadelc.github.io/SpeechGroupUKY/</p></div>","PeriodicalId":49485,"journal":{"name":"Speech Communication","volume":"164 ","pages":"Article 103112"},"PeriodicalIF":2.4,"publicationDate":"2024-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142096643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
CFAD: A Chinese dataset for fake audio detection
IF 2.4 · CAS Zone 3 · Computer Science
Speech Communication Pub Date: 2024-08-08 DOI: 10.1016/j.specom.2024.103122
Haoxin Ma, Jiangyan Yi, Chenglong Wang, Xinrui Yan, Jianhua Tao, Tao Wang, Shiming Wang, Ruibo Fu
{"title":"CFAD: A Chinese dataset for fake audio detection","authors":"Haoxin Ma ,&nbsp;Jiangyan Yi ,&nbsp;Chenglong Wang ,&nbsp;Xinrui Yan ,&nbsp;Jianhua Tao ,&nbsp;Tao Wang ,&nbsp;Shiming Wang ,&nbsp;Ruibo Fu","doi":"10.1016/j.specom.2024.103122","DOIUrl":"10.1016/j.specom.2024.103122","url":null,"abstract":"<div><p>Fake audio detection is a growing concern and some relevant datasets have been designed for research. However, there is no standard public Chinese dataset under complex conditions. In this paper, we aim to fill in the gap and design a Chinese fake audio detection dataset (CFAD) for studying more generalized detection methods. Twelve mainstream speech-generation techniques are used to generate fake audio. To simulate the real-life scenarios, three noise datasets are selected for noise adding at five different signal-to-noise ratios, and six codecs are considered for audio transcoding (format conversion). CFAD dataset can be used not only for fake audio detection but also for detecting the algorithms of fake utterances for audio forensics. Baseline results are presented with analysis. The results that show fake audio detection methods with generalization remain challenging. The CFAD dataset is publicly available.<span><span><sup>1</sup></span></span></p></div>","PeriodicalId":49485,"journal":{"name":"Speech Communication","volume":"164 ","pages":"Article 103122"},"PeriodicalIF":2.4,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141991278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Comparison and analysis of new curriculum criteria for end-to-end ASR
IF 2.4 · CAS Zone 3 · Computer Science
Speech Communication Pub Date: 2024-07-31 DOI: 10.1016/j.specom.2024.103113
Georgios Karakasidis, Mikko Kurimo, Peter Bell, Tamás Grósz
{"title":"Comparison and analysis of new curriculum criteria for end-to-end ASR","authors":"Georgios Karakasidis ,&nbsp;Mikko Kurimo ,&nbsp;Peter Bell ,&nbsp;Tamás Grósz","doi":"10.1016/j.specom.2024.103113","DOIUrl":"10.1016/j.specom.2024.103113","url":null,"abstract":"<div><p>Traditionally, teaching a human and a Machine Learning (ML) model is quite different, but organized and structured learning has the ability to enable faster and better understanding of the underlying concepts. For example, when humans learn to speak, they first learn how to utter basic phones and then slowly move towards more complex structures such as words and sentences. Motivated by this observation, researchers have started to adapt this approach for training ML models. Since the main concept, the gradual increase in difficulty, resembles the notion of the curriculum in education, the methodology became known as Curriculum Learning (CL). In this work, we design and test new CL approaches to train Automatic Speech Recognition systems, specifically focusing on the so-called end-to-end models. These models consist of a single, large-scale neural network that performs the recognition task, in contrast to the traditional way of having several specialized components focusing on different subtasks (e.g., acoustic and language modeling). We demonstrate that end-to-end models can achieve better performances if they are provided with an organized training set consisting of examples that exhibit an increasing level of difficulty. To impose structure on the training set and to define the notion of an easy example, we explored multiple solutions that use either external, static scoring methods or incorporate feedback from the model itself. In addition, we examined the effect of pacing functions that control how much data is presented to the network during each training epoch. Our proposed curriculum learning strategies were tested on the task of speech recognition on two data sets, one containing spontaneous Finnish speech where volunteers were asked to speak about a given topic, and one containing planned English speech. Empirical results showed that a good curriculum strategy can yield performance improvements and speed-up convergence. After a given number of epochs, our best strategy achieved a 5.6% and 3.4% decrease in terms of test set word error rate for the Finnish and English data sets, respectively.</p></div>","PeriodicalId":49485,"journal":{"name":"Speech Communication","volume":"163 ","pages":"Article 103113"},"PeriodicalIF":2.4,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167639324000840/pdfft?md5=60eaa8c29b9e0afde3f299e6bfeb1d10&pid=1-s2.0-S0167639324000840-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141943674","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Tone-syllable synchrony in Mandarin: New evidence and implications
IF 2.4 · CAS Zone 3 · Computer Science
Speech Communication Pub Date: 2024-07-31 DOI: 10.1016/j.specom.2024.103121
Weiyi Kang, Yi Xu
{"title":"Tone-syllable synchrony in Mandarin: New evidence and implications","authors":"Weiyi Kang,&nbsp;Yi Xu","doi":"10.1016/j.specom.2024.103121","DOIUrl":"10.1016/j.specom.2024.103121","url":null,"abstract":"<div><p>Recent research has shown evidence based on a minimal contrast paradigm that consonants and vowels are articulatorily synchronized at the onset of the syllable. What remains less clear is the laryngeal dimension of the syllable, for which evidence of tone synchrony with the consonant-vowel syllable has been circumstantial. The present study assesses the precise tone-vowel alignment in Mandarin Chinese by applying the minimal contrast paradigm. The vowel onset is determined by detecting divergence points of F2 trajectories between a pair of disyllabic sequences with two contrasting vowels, and the onsets of tones are determined by detecting divergence points of <em>f</em><sub>0</sub> trajectories in contrasting disyllabic tone pairs, using generalized additive mixed models (GAMMs). The alignment of the divergence-determined vowel and tone onsets is then evaluated with linear mixed effect models (LMEMs) and their synchrony is validated with Bayes factors. The results indicate that tone and vowel onsets are fully synchronized. There is therefore evidence for strict alignment of consonant, vowel and tone as hypothesized in the synchronization model of the syllable. Also, with the newly established tone onset, the previously reported ‘anticipatory raising’ effect of tone now appears to occur <em>within</em> rather than <em>before</em> the articulatory syllable. Implications of these findings will be discussed.</p></div>","PeriodicalId":49485,"journal":{"name":"Speech Communication","volume":"163 ","pages":"Article 103121"},"PeriodicalIF":2.4,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S016763932400092X/pdfft?md5=d240d5edd58b402ead4372ec1ec2baa9&pid=1-s2.0-S016763932400092X-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141943676","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Arabic Automatic Speech Recognition: Challenges and Progress
IF 2.4 · CAS Zone 3 · Computer Science
Speech Communication Pub Date: 2024-07-31 DOI: 10.1016/j.specom.2024.103110
Fatma Zahra Besdouri, Inès Zribi, Lamia Hadrich Belguith
{"title":"Arabic Automatic Speech Recognition: Challenges and Progress","authors":"Fatma Zahra Besdouri ,&nbsp;Inès Zribi ,&nbsp;Lamia Hadrich Belguith","doi":"10.1016/j.specom.2024.103110","DOIUrl":"10.1016/j.specom.2024.103110","url":null,"abstract":"<div><p>This paper provides a structured examination of Arabic Automatic Speech Recognition (ASR), focusing on the complexity posed by the language’s diverse forms and dialectal variations. We first explore the Arabic language forms, delimiting the challenges encountered with Dialectal Arabic, including issues such as code-switching and non-standardized orthography and, thus, the scarcity of large annotated datasets. Subsequently, we delve into the landscape of Arabic resources, distinguishing between Modern Standard Arabic (MSA) and Dialectal Arabic (DA) Speech Resources and highlighting the disparities in available data between these two categories. Finally, we analyze both traditional and modern approaches in Arabic ASR, assessing their effectiveness in addressing the unique challenges inherent to the language. Through this comprehensive examination, we aim to provide insights into the current state and future directions of Arabic ASR research and development.</p></div>","PeriodicalId":49485,"journal":{"name":"Speech Communication","volume":"163 ","pages":"Article 103110"},"PeriodicalIF":2.4,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141943679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Yanbian Korean speakers tend to merge /e/ and /ɛ/ when exposed to Seoul Korean
IF 2.4 · CAS Zone 3 · Computer Science
Speech Communication Pub Date: 2024-07-30 DOI: 10.1016/j.specom.2024.103111
Xiaohua Yu, Sunghye Cho, Yong-cheol Lee
{"title":"Yanbian Korean speakers tend to merge /e/ and /ɛ/ when exposed to Seoul Korean","authors":"Xiaohua Yu ,&nbsp;Sunghye Cho ,&nbsp;Yong-cheol Lee","doi":"10.1016/j.specom.2024.103111","DOIUrl":"10.1016/j.specom.2024.103111","url":null,"abstract":"<div><p>This study examined the vowel merger between the two vowels /e/ and /ɛ/ in Yanbian Korean. This sound change has already spread to Seoul Korean, particularly among speakers born after the 1970s. The aim of this study was to determine whether close exposure to Seoul Korean speakers leads to the neutralization of the distinction between the two vowels /e/ and /ɛ/. We recruited 20 Yanbian Korean speakers and asked them about their frequency of exposure to Seoul Korean. The exposure level of each participant was also recorded using a Likert scale. The results revealed that speakers with limited in-person interactions with Seoul Korean speakers exhibited distinct vowel productions within the vowel space. In contrast, those with frequent in-person interactions with Seoul Korean speakers tended to neutralize the two vowels, displaying considerably overlapping patterns in the vowel space. The relationship between the level of exposure to Seoul Korean and speakers’ vowel production was statistically confirmed by a linear regression analysis. Based on the results of this study, we speculate that the sound change in Yanbian Korean may become more widespread as Yanbian Korean speakers are increasingly exposed to Seoul Korean.</p></div>","PeriodicalId":49485,"journal":{"name":"Speech Communication","volume":"164 ","pages":"Article 103111"},"PeriodicalIF":2.4,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142049979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0