Meaning beyond lexicality: Capturing Pseudoword Definitions with Language Models

IF 5.3, Zone 2 (Computer Science)

Computational Linguistics, Pub Date: 2024-07-30, DOI: 10.1162/coli_a_00527

Andrea Gregor de Varda, Daniele Gatti, Marco Marelli, Fritz Günther


Citations: 0

Abstract


Meaning beyond lexicality: Capturing Pseudoword Definitions with Language Models

Pseudowords such as “knackets” or “spechy” – letter strings that are consistent with the orthotactical rules of a language but do not appear in its lexicon – are traditionally considered to be meaningless, and employed as such in empirical studies. However, recent studies that show specific semantic patterns associated with these words as well as semantic effects on human pseudoword processing have cast doubt on this view. While these studies suggest that pseudowords have meanings, they provide only extremely limited insight as to whether humans are able to ascribe explicit and declarative semantic content to unfamiliar word forms. In the present study, we employed an exploratory-confirmatory study design to examine this question. In a first exploratory study, we started from a pre-existing dataset of words and pseudowords alongside human-generated definitions for these items. Employing 18 different language models, we showed that the definitions actually produced for (pseudo)words were closer to their respective (pseudo)words than the definitions for the other items. Based on these initial results, we conducted a second, pre-registered, high-powered confirmatory study collecting a new, controlled set of (pseudo)word interpretations. This second study confirmed the results of the first one. Taken together, these findings support the idea that meaning construction is supported by a flexible form-to-meaning mapping system based on statistical regularities in the language environment that can accommodate novel lexical entries as soon as they are encountered.
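The comparison described in the abstract (is a definition closer to the item it was written for than to the other items?) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: `trigram_embed` is a toy character-trigram stand-in for a language model, the two definitions in `toy_items` are invented for the example and are not taken from the study's data, and all function names are hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import Callable, Dict, List, Tuple

Vector = Dict[str, float]


def trigram_embed(text: str) -> Vector:
    """Toy stand-in for a language model: counts of character trigrams."""
    padded = f"  {text.lower()}  "
    counts = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
    return {gram: float(n) for gram, n in counts.items()}


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(key, 0.0) for key, w in u.items())
    norm = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def matched_vs_mismatched(
    items: Dict[str, str],
    embed: Callable[[str], Vector] = trigram_embed,
) -> List[Tuple[str, float, float]]:
    """For each item, return (item, similarity to its own definition,
    mean similarity to the definitions written for the other items)."""
    if len(items) < 2:
        raise ValueError("need at least two items to form mismatched pairs")
    definition_vectors = {word: embed(text) for word, text in items.items()}
    rows = []
    for word in items:
        word_vector = embed(word)
        own = cosine(word_vector, definition_vectors[word])
        others = [
            cosine(word_vector, vec)
            for other, vec in definition_vectors.items()
            if other != word
        ]
        rows.append((word, own, sum(others) / len(others)))
    return rows


# Invented definitions, for illustration only (not from the study's dataset).
toy_items = {
    "knackets": "small snacks sold in packets",
    "spechy": "speaking in a sketchy, patchy way",
}

for word, own, mismatched in matched_vs_mismatched(toy_items):
    print(f"{word}: own={own:.3f} mismatched={mismatched:.3f}")
```

Because `embed` is passed in as a parameter, the trigram function could be swapped for sentence or word embeddings from an actual language model without changing the comparison logic. In this toy data the matched similarity is higher than the mismatched one, but only because the invented definitions share letter sequences with their items; it says nothing about the study's results.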


Source journal

Computational Linguistics (Computer Science: Artificial Intelligence)

Self-citation rate

0.00%


About the journal: Computational Linguistics is the longest-running publication devoted exclusively to the computational and mathematical properties of language and the design and analysis of natural language processing systems. This highly regarded quarterly offers university and industry linguists, computational linguists, artificial intelligence and machine learning investigators, cognitive scientists, speech specialists, and philosophers the latest information about the computational aspects of all the facets of research on language.