Spectral weights for localization and speech-in-speech recognition with spatial separation of talkers on the horizontal plane

IF 2.1 | Zone 2 (Physics and Astrophysics) | JCR Q2 (Acoustics)

Journal of the Acoustical Society of America, 158(1), 186-200. Pub Date: 2025-07-01. DOI: 10.1121/10.0037072

Emily Buss, Richard Freyman

Citations: 0

Abstract

Spectral weights for localization and speech-in-speech recognition with spatial separation of talkers on the horizontal plane.

Some previous research has suggested that sound source localization may not rely on the same cues that support the segregation of speech produced by talkers separated in space. The present experiments evaluated spectral weights for the spatial cues underlying these two tasks by filtering stimuli into 1-octave-wide bands and dispersing them on the horizontal plane. Target stimuli were 100-ms bursts of speech-shaped noise or words produced by 24 male and female talkers, and maskers (when present) were sequences of words. For localization in quiet, weights differed depending on the midpoint and band dispersion range, but they were similar for speech and noise stimuli. For bands dispersed between -15° and +15°, weights peaked at 500 and 1000 Hz. Introducing a speech masker changed the magnitude of weights for localization, but not the relative weight by frequency. For speech-in-speech recognition, sequences of masker words produced predominantly informational masking, such that participants had to rely on spatial cues to segregate the target. As for localization, recognition appeared to rely predominantly on spatial cues in the 500- and 1000-Hz bands. Trial-by-trial data suggest that correct word recognition relied on differences in perceived location of target and masker speech for some but not for all participants.
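As a rough illustration of what a spectral weight means in this kind of design, the sketch below simulates a listener whose pointing response is a weighted sum of the azimuths of independently dispersed bands, then recovers the weights by least-squares regression and normalizes them to sum to 1. This is not the authors' analysis: the five-band set, the noise level, the "true" weights, and the function names `simulate_trials`, `solve_linear`, and `estimate_weights` are all assumptions made for the example. Only the ±15° dispersion range and the 500- and 1000-Hz bands come from the abstract.

```python
import random

# Hypothetical octave-band centre frequencies (Hz). The abstract names only
# the 500- and 1000-Hz bands, so the full set here is an assumption.
BANDS_HZ = [250, 500, 1000, 2000, 4000]


def simulate_trials(true_weights, n_trials, spread_deg=15.0, noise_sd=2.0, seed=0):
    """Simulate trials in which each band is placed at its own random azimuth."""
    rng = random.Random(seed)
    positions, responses = [], []
    for _ in range(n_trials):
        az = [rng.uniform(-spread_deg, spread_deg) for _ in true_weights]
        # Assumed listener model: weighted sum of band azimuths plus noise.
        resp = sum(w * a for w, a in zip(true_weights, az)) + rng.gauss(0.0, noise_sd)
        positions.append(az)
        responses.append(resp)
    return positions, responses


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def estimate_weights(positions, responses):
    """Regress responses on band azimuths (no intercept), normalise to sum 1."""
    k = len(positions[0])
    xtx = [[sum(p[i] * p[j] for p in positions) for j in range(k)] for i in range(k)]
    xty = [sum(p[i] * r for p, r in zip(positions, responses)) for i in range(k)]
    coefs = solve_linear(xtx, xty)
    total = sum(coefs)
    return [c / total for c in coefs]


if __name__ == "__main__":
    # Made-up weights that peak at 500 and 1000 Hz, for illustration only.
    pos, resp = simulate_trials([0.1, 0.35, 0.35, 0.1, 0.1], n_trials=2000)
    for hz, w in zip(BANDS_HZ, estimate_weights(pos, resp)):
        print(f"{hz:>5} Hz: {w:.3f}")
```

Under this assumed model, a band with a larger weight pulls the response more strongly toward its own azimuth, which is one way to read the statement that weights "peaked at 500 and 1000 Hz".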

Source journal

Journal of the Acoustical Society of America (Physics: Acoustics)

CiteScore: 4.60
Self-citation rate: 16.70%
Articles published: 1433
Review time: 4.7 months

About the journal: Since 1929 The Journal of the Acoustical Society of America has been the leading source of theoretical and experimental research results in the broad interdisciplinary study of sound. Subject coverage includes: linear and nonlinear acoustics; aeroacoustics, underwater sound and acoustical oceanography; ultrasonics and quantum acoustics; architectural and structural acoustics and vibration; speech, music and noise; psychology and physiology of hearing; engineering acoustics, transduction; bioacoustics, animal bioacoustics.