Representations of smells: The next frontier for language models?

Murathan Kurfalı, Pawel Herman, Stephen Pierzchajlo, Jonas Olofsson, Thomas Hörberg

Cognition, Volume 264, Article 106243. Published 2025-07-16. DOI: 10.1016/j.cognition.2025.106243

Whereas human cognition develops through perceptually driven interactions with the environment, language models (LMs) are "disembodied learners," which might limit their usefulness as model systems. We evaluate the ability of LMs to recover sensory information from natural language, addressing a significant gap in the cognitive science literature. Our investigation focuses on the sense of smell (olfaction), because it is severely underrepresented in natural language and thus poses a unique challenge for linguistic and cognitive modeling. By systematically evaluating three generations of LMs, including static word embedding models (Word2Vec, FastText), encoder-based models (BERT), and decoder-based large LMs (LLMs; GPT-4o and Llama 3.1, among others), under nearly 200 training configurations, we investigate their proficiency in acquiring information from textual data to approximate human odor perception. As benchmarks for LM performance, we use three diverse experimental odor datasets: odor similarity ratings, imagined similarities of odor pairings from word labels, and odor-to-label ratings. The results show that LMs can accurately represent olfactory information, and they describe the conditions under which this is achieved. Static, simpler models perform best at capturing odor-perceptual similarities under certain training configurations, while GPT-4o excels at simulating olfactory-semantic relationships, as suggested by its superior performance on datasets where the collected odor similarities are derived from word-based assessments. Our findings show that natural language encodes latent information about human olfactory perception that is retrievable, to varying degrees, through text-based LMs. Our research suggests that LMs can be useful tools for investigating the long-debated relation between symbolic representations and perceptual experience in cognitive science.
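The benchmarking logic described above, comparing model-derived similarities of odor words against human similarity ratings, can be sketched as follows. This is an illustrative example only, not the authors' code: the tiny 3-dimensional "embeddings" and the human ratings are invented, and the rank correlation is a simplified Spearman (no tie handling).

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def ranks(xs):
    # Rank positions of each value (0 = smallest); ties not handled.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = float(rank)
    return r

def spearman(xs, ys):
    # Spearman correlation = Pearson correlation of the ranks.
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = math.sqrt(sum((a - mx) ** 2 for a in rx) *
                    sum((b - my) ** 2 for b in ry))
    return num / den

# Hypothetical low-dimensional "embeddings" for four odor words.
emb = {
    "lemon":   [0.9, 0.1, 0.0],
    "orange":  [0.8, 0.2, 0.1],
    "smoke":   [0.0, 0.9, 0.4],
    "leather": [0.1, 0.8, 0.5],
}
pairs = [("lemon", "orange"), ("lemon", "smoke"),
         ("orange", "leather"), ("smoke", "leather")]
human = [0.9, 0.1, 0.2, 0.8]  # imagined human pairwise similarity ratings
model = [cosine(emb[a], emb[b]) for a, b in pairs]

# How well does the embedding space recover the human similarity ordering?
rho = spearman(model, human)
```

With real data, `emb` would hold vectors extracted from an LM and `human` the experimental ratings; a high rank correlation indicates that the model's text-derived geometry mirrors perceived odor similarity.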
About the journal:
Cognition is an international journal that publishes theoretical and experimental papers on the study of the mind. It covers a wide variety of subjects concerning all the different aspects of cognition, ranging from biological and experimental studies to formal analysis. Contributions from the fields of psychology, neuroscience, linguistics, computer science, mathematics, ethology and philosophy are welcome in this journal provided that they have some bearing on the functioning of the mind. In addition, the journal serves as a forum for discussion of social and political aspects of cognitive science.