Joint population coding and temporal coherence link an attended talker's voice and location features in naturalistic multi-talker scenes

Kiki van der Heijden, Prachi Patel, Stephan Bickel, Jose L Herrero, Ashesh D Mehta, Nima Mesgarani

Journal of Neuroscience (Society for Neuroscience), published 2025-10-09. DOI: 10.1523/JNEUROSCI.0754-25.2025
Abstract
Listeners effortlessly extract multidimensional auditory objects, such as a localized talker, from complex acoustic scenes. However, the neural mechanisms that enable simultaneous encoding and linking of distinct sound features, such as a talker's voice and location, are not fully understood. Using invasive intracranial recordings in seven neurosurgical patients (4 male, 3 female), we investigated how the human auditory cortex processes and integrates these features during naturalistic multi-talker scenes and how attentional mechanisms modulate such feature integration. We found that cortical sites exhibit a continuum of feature sensitivity, ranging from single-feature sensitive sites (responsive primarily to voice spectral features or to location features) to dual-feature sensitive sites (responsive to both features). At the population level, neural response patterns from both single- and dual-feature sensitive sites jointly encoded the attended talker's voice and location. Notably, single-feature sensitive sites encoded their primary feature with greater precision but also represented coarse information about the secondary feature. Sites selectively tracking a single, attended speech stream concurrently encoded both voice and location features, demonstrating a link between selective attention and feature integration. Additionally, attention selectively enhanced temporal coherence between voice- and location-sensitive sites, suggesting that temporal synchronization serves as a mechanism for linking these features. Our findings highlight two complementary neural mechanisms, joint population coding and temporal coherence, that enable the integration of voice and location features in the auditory cortex. These results provide new insights into the distributed, multidimensional nature of auditory object formation during active listening in complex environments.

Significance statement

In everyday life, listeners effortlessly extract individual sound sources from complex acoustic scenes that contain multiple sound sources. Yet, how the brain links the different features of a particular sound source to each other, such as a talker's voice characteristics and location, is poorly understood. Here, we show that two neural mechanisms contribute to encoding and integrating voice and location features in multi-talker sound scenes: (1) some neuronal sites are sensitive to both voice and location, and their activity patterns encode these features jointly; (2) the responses of neuronal sites that each process only one sound feature (location or voice) align temporally to form a stream that is segregated from the other talker.
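The temporal-coherence idea above can be illustrated with a toy simulation; this is a minimal sketch and not the paper's analysis pipeline. All signals and parameters here are hypothetical: two simulated cortical sites (one "voice-sensitive", one "location-sensitive") share a common slow envelope when they track the same attended talker, so their responses correlate strongly, whereas a site tracking a different (unattended) stream is phase-shifted and correlates weakly.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)

random.seed(0)
t = [i / 100.0 for i in range(500)]  # 5 s sampled at 100 Hz (hypothetical)

# Shared 4 Hz envelope stands in for the attended talker's speech rhythm.
env = [math.sin(2 * math.pi * 4 * ti) for ti in t]

# Two sites tracking the SAME attended stream: shared envelope plus noise.
voice_site = [e + 0.3 * random.gauss(0, 1) for e in env]
loc_site_attended = [e + 0.3 * random.gauss(0, 1) for e in env]

# A site tracking the OTHER stream: phase-shifted envelope plus noise.
loc_site_unattended = [math.sin(2 * math.pi * 4 * ti + 1.5)
                       + 0.3 * random.gauss(0, 1) for ti in t]

r_att = pearson(voice_site, loc_site_attended)      # high: temporally coherent
r_unatt = pearson(voice_site, loc_site_unattended)  # low: desynchronized
```

In this toy setup `r_att` comes out far larger than `r_unatt`, mirroring the reported finding that attention enhances coherence specifically between sites encoding features of the same talker.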
About the journal
JNeurosci (ISSN 0270-6474) is an official journal of the Society for Neuroscience. It is published weekly by the Society, fifty weeks a year, one volume per year. JNeurosci publishes papers on a broad range of topics of general interest to those working on the nervous system. Authors now have an Open Choice option for their published articles.