Joint population coding and temporal coherence link an attended talker's voice and location features in naturalistic multi-talker scenes

Kiki van der Heijden, Prachi Patel, Stephan Bickel, Jose L Herrero, Ashesh D Mehta, Nima Mesgarani

Journal of Neuroscience (Society for Neuroscience), published 2025-10-09. DOI: 10.1523/JNEUROSCI.0754-25.2025
Abstract
Listeners effortlessly extract multidimensional auditory objects, such as a localized talker, from complex acoustic scenes. However, the neural mechanisms that enable simultaneous encoding and linking of distinct sound features, such as a talker's voice and location, are not fully understood. Using invasive intracranial recordings in seven neurosurgical patients (4 male, 3 female), we investigated how the human auditory cortex processes and integrates these features during naturalistic multi-talker scenes and how attentional mechanisms modulate such feature integration. We found that cortical sites exhibit a continuum of feature sensitivity, ranging from single-feature sensitive sites (responsive primarily to voice spectral features or to location features) to dual-feature sensitive sites (responsive to both features). At the population level, neural response patterns from both single- and dual-feature sensitive sites jointly encoded the attended talker's voice and location. Notably, single-feature sensitive sites encoded their primary feature with greater precision but also represented coarse information about the secondary feature. Sites selectively tracking a single, attended speech stream concurrently encoded both voice and location features, demonstrating a link between selective attention and feature integration. Additionally, attention selectively enhanced temporal coherence between voice- and location-sensitive sites, suggesting that temporal synchronization serves as a mechanism for linking these features. Our findings highlight two complementary neural mechanisms, joint population coding and temporal coherence, that enable the integration of voice and location features in the auditory cortex. These results provide new insights into the distributed, multidimensional nature of auditory object formation during active listening in complex environments.

Significance statement

In everyday life, listeners effortlessly extract individual sound sources from complex acoustic scenes that contain multiple sound sources. Yet, how the brain links the different features of a particular sound source to each other, such as a talker's voice characteristics and location, is poorly understood. Here, we show that two neural mechanisms contribute to encoding and integrating voice and location features in multi-talker sound scenes: (1) some neuronal sites are sensitive to both voice and location, and their activity patterns encode these features jointly; (2) the responses of neuronal sites that each process only one sound feature (location or voice) align temporally to form a stream that is segregated from the other talker.
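The temporal-coherence idea above can be illustrated with a toy simulation; this is a minimal sketch and not the paper's analysis pipeline. All signals and parameters here are hypothetical: two simulated cortical sites (one "voice-sensitive", one "location-sensitive") share a common slow envelope when they track the same attended talker, so their responses correlate strongly, whereas a site tracking a different (unattended) stream is phase-shifted and correlates weakly.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)

random.seed(0)
t = [i / 100.0 for i in range(500)]  # 5 s sampled at 100 Hz (hypothetical)

# Shared 4 Hz envelope stands in for the attended talker's speech rhythm.
env = [math.sin(2 * math.pi * 4 * ti) for ti in t]

# Two sites tracking the SAME attended stream: shared envelope plus noise.
voice_site = [e + 0.3 * random.gauss(0, 1) for e in env]
loc_site_attended = [e + 0.3 * random.gauss(0, 1) for e in env]

# A site tracking the OTHER stream: phase-shifted envelope plus noise.
loc_site_unattended = [math.sin(2 * math.pi * 4 * ti + 1.5)
                       + 0.3 * random.gauss(0, 1) for ti in t]

r_att = pearson(voice_site, loc_site_attended)      # high: temporally coherent
r_unatt = pearson(voice_site, loc_site_unattended)  # low: desynchronized
```

In this toy setup `r_att` comes out far larger than `r_unatt`, mirroring the reported finding that attention enhances coherence specifically between sites encoding features of the same talker.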
About the journal
JNeurosci (ISSN 0270-6474) is an official journal of the Society for Neuroscience. It is published weekly by the Society, fifty weeks a year, one volume per year. JNeurosci publishes papers on a broad range of topics of general interest to those working on the nervous system. Authors now have an Open Choice option for their published articles.