Representations of smells: The next frontier for language models?

Murathan Kurfalı, Pawel Herman, Stephen Pierzchajlo, Jonas Olofsson, Thomas Hörberg

Cognition, Volume 264, Article 106243. Published 2025-07-16. DOI: 10.1016/j.cognition.2025.106243

Whereas human cognition develops through perceptually driven interactions with the environment, language models (LMs) are "disembodied learners," which might limit their usefulness as model systems. We evaluate the ability of LMs to recover sensory information from natural language, addressing a significant gap in the cognitive science literature. Our investigation focuses on the sense of smell (olfaction), because it is severely underrepresented in natural language and thus poses a unique challenge for linguistic and cognitive modeling. By systematically evaluating three generations of LMs, including static word embedding models (Word2Vec, FastText), encoder-based models (BERT), and decoder-based large LMs (LLMs; GPT-4o and Llama 3.1, among others), under nearly 200 training configurations, we investigate their proficiency in acquiring information from textual data to approximate human odor perception. As benchmarks for LM performance, we use three diverse experimental odor datasets: odor similarity ratings, imagined similarities of odor pairings from word labels, and odor-to-label ratings. The results show that LMs can accurately represent olfactory information, and they describe the conditions under which this is achieved. Static, simpler models perform best at capturing odor-perceptual similarities under certain training configurations, while GPT-4o excels at simulating olfactory-semantic relationships, as suggested by its superior performance on datasets where the collected odor similarities are derived from word-based assessments. Our findings show that natural language encodes latent information about human olfactory perception that is retrievable, to varying degrees, through text-based LMs. Our research suggests that LMs can be useful tools for investigating the long-debated relation between symbolic representations and perceptual experience in cognitive science.
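The benchmarking logic described above, comparing model-derived similarities of odor words against human similarity ratings, can be sketched as follows. This is an illustrative example only, not the authors' code: the tiny 3-dimensional "embeddings" and the human ratings are invented, and the rank correlation is a simplified Spearman (no tie handling).

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def ranks(xs):
    # Rank positions of each value (0 = smallest); ties not handled.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = float(rank)
    return r

def spearman(xs, ys):
    # Spearman correlation = Pearson correlation of the ranks.
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = math.sqrt(sum((a - mx) ** 2 for a in rx) *
                    sum((b - my) ** 2 for b in ry))
    return num / den

# Hypothetical low-dimensional "embeddings" for four odor words.
emb = {
    "lemon":   [0.9, 0.1, 0.0],
    "orange":  [0.8, 0.2, 0.1],
    "smoke":   [0.0, 0.9, 0.4],
    "leather": [0.1, 0.8, 0.5],
}
pairs = [("lemon", "orange"), ("lemon", "smoke"),
         ("orange", "leather"), ("smoke", "leather")]
human = [0.9, 0.1, 0.2, 0.8]  # imagined human pairwise similarity ratings
model = [cosine(emb[a], emb[b]) for a, b in pairs]

# How well does the embedding space recover the human similarity ordering?
rho = spearman(model, human)
```

With real data, `emb` would hold vectors extracted from an LM and `human` the experimental ratings; a high rank correlation indicates that the model's text-derived geometry mirrors perceived odor similarity.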
About the journal:
Cognition is an international journal that publishes theoretical and experimental papers on the study of the mind. It covers a wide variety of subjects concerning all the different aspects of cognition, ranging from biological and experimental studies to formal analysis. Contributions from the fields of psychology, neuroscience, linguistics, computer science, mathematics, ethology and philosophy are welcome in this journal provided that they have some bearing on the functioning of the mind. In addition, the journal serves as a forum for discussion of social and political aspects of cognitive science.