Spectral weights for localization and speech-in-speech recognition with spatial separation of talkers on the horizontal plane

IF 2.1 · Zone 2 (Physics and Astrophysics) · JCR Q2 (Acoustics)
Emily Buss, Richard Freyman
Journal of the Acoustical Society of America, vol. 158, no. 1, pp. 186-200
Published: 2025-07-01 · Journal Article · DOI: 10.1121/10.0037072 (https://doi.org/10.1121/10.0037072)
Citations: 0

Abstract

Spectral weights for localization and speech-in-speech recognition with spatial separation of talkers on the horizontal plane.

Some previous research has suggested that sound source localization may not rely on the same cues that support the segregation of speech produced by talkers separated in space. The present experiments evaluated spectral weights for the spatial cues underlying these two tasks by filtering stimuli into 1-octave-wide bands and dispersing them on the horizontal plane. Target stimuli were 100-ms bursts of speech-shaped noise or words produced by 24 male and female talkers, and maskers (when present) were sequences of words. For localization in quiet, weights differed depending on the midpoint and band dispersion range, but they were similar for speech and noise stimuli. For bands dispersed between -15° and +15°, weights peaked at 500 and 1000 Hz. Introducing a speech masker changed the magnitude of weights for localization, but not the relative weight by frequency. For speech-in-speech recognition, sequences of masker words produced predominantly informational masking, such that participants had to rely on spatial cues to segregate the target. As for localization, recognition appeared to rely predominantly on spatial cues in the 500- and 1000-Hz bands. Trial-by-trial data suggest that correct word recognition relied on differences in perceived location of target and masker speech for some but not for all participants.
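
The abstract does not say how the spectral weights were computed. One common approach in perceptual weighting studies, assumed here purely for illustration, is to regress each trial's localization response on the azimuths of the individual bands and then normalize the coefficients so they sum to one. The sketch below runs that regression on simulated data. The band center frequencies, the "true" weights, the noise level, and the function name `estimate_spectral_weights` are all hypothetical and are not taken from the paper.

```python
import numpy as np


def estimate_spectral_weights(band_azimuths, responses):
    """Estimate relative per-band weights by ordinary least squares.

    Assumed model (not from the paper):
        response ~= sum_k w_k * azimuth_k + bias
    band_azimuths: array of shape (n_trials, n_bands), degrees
    responses:     array of shape (n_trials,), degrees
    Returns weights normalized to sum to 1.
    """
    # Append a column of ones so the fit includes a constant bias term.
    design = np.column_stack([band_azimuths, np.ones(len(band_azimuths))])
    coef, *_ = np.linalg.lstsq(design, responses, rcond=None)
    raw = coef[:-1]          # drop the bias coefficient
    return raw / raw.sum()   # relative weight by frequency band


# Simulated listener: all values below are hypothetical.
rng = np.random.default_rng(0)
centers_hz = [125, 250, 500, 1000, 2000, 4000, 8000]
true_w = np.array([0.05, 0.10, 0.30, 0.30, 0.15, 0.07, 0.03])

n_trials = 2000
# Each band is placed independently between -15 and +15 degrees on every trial.
az = rng.uniform(-15.0, 15.0, size=(n_trials, len(centers_hz)))
# The response is a weighted sum of band azimuths plus response noise.
resp = az @ true_w + rng.normal(0.0, 2.0, size=n_trials)

w = estimate_spectral_weights(az, resp)
for fc, wk in zip(centers_hz, w):
    print(f"{fc:>5} Hz: {wk:.3f}")
```

Normalizing the coefficients separates the relative weight by frequency from the overall magnitude, which is the distinction the abstract draws when it reports that a speech masker changed the magnitude of the weights but not their relative pattern.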

Source journal
CiteScore: 4.60
Self-citation rate: 16.70%
Articles published: 1433
Review time: 4.7 months
Journal description: Since 1929, The Journal of the Acoustical Society of America has been the leading source of theoretical and experimental research results in the broad interdisciplinary study of sound. Subject coverage includes: linear and nonlinear acoustics; aeroacoustics, underwater sound and acoustical oceanography; ultrasonics and quantum acoustics; architectural and structural acoustics and vibration; speech, music and noise; psychology and physiology of hearing; engineering acoustics, transduction; bioacoustics, animal bioacoustics.