Phonetics and Speech Sciences: Latest Articles

End-to-end non-autoregressive fast text-to-speech
Phonetics and Speech Sciences | Pub Date: 2021-12-01 | DOI: 10.13064/ksss.2021.13.4.047
Wiback Kim, Hosung Nam
Citations: 0
A comparative study of prosodic features according to the syntactic diversities between children with reading disability and nondisabled children
Phonetics and Speech Sciences | Pub Date: 2021-12-01 | DOI: 10.13064/ksss.2021.13.4.055
Sung-Sun Park, Cheol-jae Seong
Citations: 0
The perception and production of Korean vowels by Egyptian learners
Phonetics and Speech Sciences | Pub Date: 2021-12-01 | DOI: 10.13064/ksss.2021.13.4.023
S. Benjamin, Ho-Young Lee
Abstract: This study examines how Egyptian learners of Korean perceive and categorize Korean vowels, how native Koreans perceive the Korean vowels these learners produce, and how the learners' categorization of Korean vowels affects their perception and production of those vowels. In Experiment 1, 53 Egyptian learners listened to Korean test words pronounced by Koreans and chose the word they had heard from among four confusable words. In Experiment 2, 117 sound files (13 test words × 9 Egyptian learners) recorded by Egyptian learners were presented to Koreans, who selected the word they had heard from among four confusable words. The results show that "new" Korean vowels, which have no counterpart category in Egyptian Arabic, readily formed new categories; they were therefore well identified in perception and relatively well pronounced, although some of them were poorly produced. By contrast, Egyptian learners distinguished "similar" Korean vowels poorly in perception, yet native Koreans identified the learners' pronunciations of these vowels relatively well.
Based on these results, we argue that the Speech Learning Model (SLM) and the Perceptual Assimilation Model (PAM) explain L2 speech perception well but are insufficient to explain L2 speech production, and therefore need to be revised and extended to cover L2 speech production.
Citations: 1
A longitudinal analysis on interruption in preschool children who stutter during interactions with their mothers
Phonetics and Speech Sciences | Pub Date: 2021-12-01 | DOI: 10.13064/ksss.2021.13.4.075
Hyo-Jung Kwak, Si-Hyeon Hwang, Pu Song, H. Sim, Soo-Bok Lee
Abstract: The purpose of this study was to investigate longitudinally the interruption behavior of children who stutter (CWS), children who do not stutter (CWNS), and their mothers during mother-child interactions, and the relationship between this behavior and the children's disfluency. Participants were 2- to 5-year-old CWS (2 boys and 4 girls), an age-matched group of CWNS (3 boys and 3 girls), and their mothers. The frequencies of normal disfluency (ND) and abnormal disfluency (AD) in the children, and the frequency of interruption and simultalk duration in both children and mothers, were measured twice over one year: at the initial visit and 12 months later. No significant difference in interruption frequency or simultalk duration was observed between the two mother groups or between the two child groups at either the initial visit or 12 months later. However, interruption frequency increased significantly over the year in the CWS group. At the initial visit, a significant group difference was found in the mothers' interruption frequency, but not in their simultalk duration. In the CWS-mother group, no factor was related to the children's disfluency at either the initial visit or 12 months later.
These findings suggest that interruption is not merely a negative behavior, and that reducing interruption should be considered in child-parent interaction therapy for CWS.
Citations: 0
The f0 distribution of Korean speakers in a spontaneous speech corpus
Phonetics and Speech Sciences | Pub Date: 2021-09-01 | DOI: 10.13064/ksss.2021.13.3.031
Byunggon Yang
Abstract: The fundamental frequency, or f0, is an important acoustic measure in the prosody of human speech. The current study examined the f0 distribution of a spontaneous speech corpus in order to provide normative data for Korean speakers. The corpus consists of 40 speakers talking freely about their daily activities and personal views. Praat scripts were written to collect f0 values, and most obvious errors were corrected manually by viewing and listening to the f0 contour on a narrow-band spectrogram. Statistical analyses of the f0 distribution were conducted in R. The results showed that the f0 values of all the Korean speakers were right-skewed, with a sharply peaked distribution. Excluding statistical outliers, the speakers produced spontaneous speech within a frequency range of 274 Hz (from 65 Hz to 339 Hz). The mode of the total f0 data was 102 Hz. The female group's f0 range, which showed a bimodal distribution, appeared wider than that of the male group. Regression analyses based on age and f0 values yielded negligible R-squared values. Because an individual speaker's mode could be predicted from the median, either the median or the mode could serve as a good reference for that speaker's f0 range. Finally, an analysis of the continuous f0 points of intonational phrases revealed that the initial and final segments of the phrases yielded several f0 measurement errors. From these results, we conclude that examining a spontaneous speech corpus can provide linguists with useful measures for generalizing the acoustic properties of f0 variability in a language, for individual speakers or for groups.
Further studies on the use of statistical measures to secure reliable f0 values for individual speakers would be desirable.
Citations: 0
Perception of Japanese word-initial stops by native listeners
Phonetics and Speech Sciences | Pub Date: 2021-09-01 | DOI: 10.13064/ksss.2021.13.3.053
Hi-Gyung Byun
Abstract: The voicing contrast in Japanese word-initial stops is known to be realized primarily as differences in voice onset time (VOT). However, recent studies have reported that, among the younger generation nationwide, voiced stops are more often produced with a positive VOT than with a negative VOT. Post-stop F0 is also known to be associated with the stop contrast, but the degree to which F0 is used differs from region to region. This study explores whether differences in post-stop F0 function as a perceptual cue to the stop contrast along with VOT. Fifty-five college students, native listeners from four different regions, participated in two or three perception tests. The results show that VOT is the primary cue to the voiced-voiceless distinction in word-initial stops, and that the effect of post-stop F0 on the stop contrast is marginal. Post-stop F0 is involved in perception only when VOT is ambiguous, in that a sound with high F0 is more often perceived as a voiceless stop, but not vice versa.
The results of this study indicate that the acoustic parameters associated with the stop contrast are not the same in production and perception, and suggest that non-acoustic factors such as context may also be involved in the stop contrast.
Citations: 1
Relationship between executive function and cue weighting in Korean stop perception across different dialects and ages
Phonetics and Speech Sciences | Pub Date: 2021-09-01 | DOI: 10.13064/ksss.2021.13.3.021
Eun Jong Kong, Hyunjung Lee
Abstract: The present study investigated how cognitive resources relate to speech perception by examining Korean speakers' executive function (EF) capacity and its association with voice onset time (VOT) and f0 sensitivity in identifying Korean stop laryngeal categories (/t'/ vs. /t/ vs. /tʰ/). Previously, Kong et al. (under revision) reported that Korean listeners (N = 154) in Seoul and Changwon (Gyeongsang) showed differing group patterns of dialect-specific cue weighting across educational institutions (college, high school, and elementary school). We follow up that study by relating these listeners' EF control (working memory, mental flexibility, and inhibition) to their speech perception patterns, to examine whether better cognitive ability supports control of attention to multiple acoustic dimensions. Partial correlation analyses revealed that better EF in Korean listeners was associated with greater sensitivity to available acoustic details and with greater suppression of irrelevant acoustic information across subgroups, although only a small set of EF components turned out to be relevant. Unlike the Seoul participants' f0 use, the Gyeongsang listeners' f0 use was not correlated with any EF task score, reflecting dialect-specific cue primacy in which f0 serves as a secondary cue.
The findings confirm the link between speech perception and general cognitive ability, providing experimental evidence from Korean listeners.
Citations: 0
Effects of speech motor practice and linguistic complexity on articulation rate in adults who stutter
Phonetics and Speech Sciences | Pub Date: 2021-09-01 | DOI: 10.13064/ksss.2021.13.3.091
HeeCheong Chon, T. Loucks
Abstract: This study aimed to investigate speech motor control in adults who stutter (AWS) by testing whether articulation rate changes with practice and linguistic complexity. Eleven AWS and 11 adults who do not stutter (AWNS) repeated four sentences differing in length and syntactic complexity: simple-short (SS), simple-long (SL), complex-long (CL), and faulty-long (FL). The overall articulation rate of each sentence was measured and compared between groups. Practice effects were evaluated by comparing the articulation rates of the first three, middle four, and last three productions. Overall, the AWS had significantly slower articulation rates than the AWNS across the four sentences. The longer sentences showed significantly slower articulation rates than the baseline sentence (SS). In both groups, the articulation rates of the middle four and last three productions of each sentence were significantly faster than those of the first three. The articulation rates of the SS, SL, and CL sentences showed a consistent practice effect. The slower articulation rates of the AWS are consistent with a speech motor limitation. Because there was no interaction with linguistic complexity or practice, a slower articulation rate may be a general feature of the speech of AWS.
Both AWS and AWNS showed practice effects in the form of faster articulation rates, which may reflect a degree of adaptation to the stimuli.
Citations: 0
Text-to-speech with linear spectrogram prediction for quality and speed improvement
Phonetics and Speech Sciences | Pub Date: 2021-09-01 | DOI: 10.13064/ksss.2021.13.3.071
Hyebin Yoon, Hosung Nam
Abstract: Most neural-network-based speech synthesis models use neural vocoders to convert mel-scaled spectrograms into high-quality, human-like voices. However, neural vocoders combined with mel-scaled spectrogram prediction models demand considerable memory and time during training, and their inference is slow when no GPU is available. Linear spectrogram prediction models avoid this problem because they do not use neural vocoders, but they suffer from low voice quality. As a solution, this paper proposes a linear spectrogram prediction model based on Tacotron 2 and Transformer that produces high-quality speech without a neural vocoder. Experiments suggest that this model can serve as the foundation of a high-quality text-to-speech model with fast inference speed.
Citations: 0
Korean speakers hyperarticulate vowels in polite speech
Phonetics and Speech Sciences | Pub Date: 2021-09-01 | DOI: 10.13064/ksss.2021.13.3.015
Eunhae Oh, Bodo Winter, K. Idemaru
Abstract: In line with recent attention to the multimodal expression of politeness, the present study examined the association between polite speech and acoustic features by analyzing vowels produced in casual and polite speech contexts in Korean. Fourteen adult native speakers of Seoul Korean produced utterances in two social conditions designed to elicit polite (professor) and casual (friend) speech. Vowel duration and the first (F1) and second (F2) formants of seven sentence- and phrase-initial monophthongs were measured. The results showed that polite speech shares acoustic similarities with vowel production in clear speech: speakers showed greater vowel space expansion in polite than in casual speech, in an effort to enhance perceptual intelligibility. In particular, female speakers hyperarticulated (front) vowels in polite speech, independent of speech rate. The implications for the acoustic encoding of social stance in polite speech are further discussed.
Citations: 3