Phonological Representations of Auditory and Visual Speech in the Occipito-temporal Cortex and Beyond
Alice Van Audenhaege, Stefania Mattioni, Filippo Cerpelloni, Remi Gau, Arnaud Szmalec, Olivier Collignon
Journal of Neuroscience, published 2025-06-25. DOI: 10.1523/JNEUROSCI.1415-24.2025
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12199548/pdf/
Citations: 0
Abstract
Speech is a multisensory signal that can be extracted from the voice and the lips. Previous studies suggested that occipital and temporal regions encode both auditory and visual speech features, but their location and nature remain unclear. We characterized brain activity using fMRI in 24 participants (13 males, 11 females) to functionally and individually define bilateral fusiform face areas (FFA), the left word-selective ventral occipito-temporal cortex (word-VOTC), an audiovisual speech region in the left superior temporal sulcus (lSTS), and control regions in bilateral scene-selective parahippocampal place areas (PPA). In these regions, we performed multivariate pattern classification of corresponding phonemes (speech sounds) and visemes (lip movements). We observed that the word-VOTC and the lSTS represent phonological information from both visual and auditory input. The multisensory nature of phonological representations appeared selective to the word-VOTC: we found viseme but not phoneme representation in the adjacent FFA, while the PPA did not encode phonology in either modality. Interestingly, cross-modal decoding revealed aligned phonological representations across the senses in the lSTS, but not in the word-VOTC. A whole-brain cross-modal searchlight analysis additionally revealed aligned audiovisual phonological representations in the bilateral posterior STS (pSTS) and in the left somato-motor cortex overlapping with oro-facial articulators. Altogether, our results demonstrate that auditory and visual phonology are represented in the word-VOTC, extending its functional coding beyond orthography. The geometries of auditory and visual representations do not align in the word-VOTC as they do in the STS and left somato-motor cortex, suggesting distinct representations across a distributed multisensory phonological network.
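The cross-modal decoding logic named in the abstract can be illustrated with a minimal sketch: train a classifier on response patterns from one modality and test it on patterns from the other, so that above-chance accuracy indicates a representational format shared across the senses. The snippet below uses scikit-learn on synthetic data; the array shapes, the simulated label-dependent signal, and the linear SVM are illustrative assumptions, not the authors' actual pipeline (which operated on fMRI voxel patterns from individually defined regions of interest).

import numpy as np
from sklearn.svm import LinearSVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_trials, n_voxels, n_phonemes = 80, 200, 4

# Simulated voxel patterns: one set for auditory trials (phonemes) and one
# for visual trials (visemes), sharing the same class labels and a weak
# common signal (hypothetical stand-in for real single-trial fMRI estimates).
labels = rng.integers(0, n_phonemes, size=n_trials)
auditory_patterns = rng.standard_normal((n_trials, n_voxels)) + labels[:, None] * 0.1
visual_patterns = rng.standard_normal((n_trials, n_voxels)) + labels[:, None] * 0.1

# Cross-modal decoding: fit on auditory patterns, evaluate on visual patterns.
# Accuracy above chance (1 / n_phonemes) would suggest aligned representations.
clf = make_pipeline(StandardScaler(), LinearSVC())
clf.fit(auditory_patterns, labels)
accuracy = clf.score(visual_patterns, labels)
print(f"Cross-modal decoding accuracy: {accuracy:.2f} (chance = {1 / n_phonemes:.2f})")

In this framing, within-modality classification (train and test on the same modality) establishes that a region carries phonological information at all, while the cross-modal transfer test probes whether the auditory and visual representational geometries are aligned, the distinction the abstract draws between the word-VOTC and the lSTS.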
About the journal:
JNeurosci (ISSN 0270-6474) is an official journal of the Society for Neuroscience. It is published weekly by the Society, fifty weeks a year, in one annual volume. JNeurosci publishes papers on a broad range of topics of general interest to those working on the nervous system. Authors now have an Open Choice option for their published articles.