Learning the Meanings of Function Words From Grounded Language Using a Visual Question Answering Model

Eva Portelance, Michael C. Frank, Dan Jurafsky
*Cognitive Science* · DOI: 10.1111/cogs.13448 · Published 2024-05-14
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/cogs.13448
Citations: 0

Abstract


Interpreting a seemingly simple function word like “or,” “behind,” or “more” can require logical, numerical, and relational reasoning. How are such words learned by children? Prior acquisition theories have often relied on positing a foundation of innate knowledge. Yet recent neural-network-based visual question answering models apparently can learn to use function words as part of answering questions about complex visual scenes. In this paper, we study what these models learn about function words, in the hope of better understanding how the meanings of these words can be learned by both models and children. We show that recurrent models trained on visually grounded language learn gradient semantics for function words requiring spatial and numerical reasoning. Furthermore, we find that these models can learn the meanings of logical connectives and and or without any prior knowledge of logical reasoning as well as early evidence that they are sensitive to alternative expressions when interpreting language. Finally, we show that word learning difficulty is dependent on the frequency of models' input. Our findings offer proof-of-concept evidence that it is possible to learn the nuanced interpretations of function words in a visually grounded context by using non-symbolic general statistical learning algorithms, without any prior knowledge of linguistic meaning.
