Moving beyond word frequency based on tally counting: AI-generated familiarity estimates of words and phrases are an interesting additional index of language knowledge.

IF 4.6 2区 心理学 Q1 PSYCHOLOGY, EXPERIMENTAL
Marc Brysbaert, Gonzalo Martínez, Pedro Reviriego
{"title":"Moving beyond word frequency based on tally counting: AI-generated familiarity estimates of words and phrases are an interesting additional index of language knowledge.","authors":"Marc Brysbaert, Gonzalo Martínez, Pedro Reviriego","doi":"10.3758/s13428-024-02561-7","DOIUrl":null,"url":null,"abstract":"<p><p>This study investigates the potential of large language models (LLMs) to estimate the familiarity of words and multi-word expressions (MWEs). We validated LLM estimates for isolated words using existing human familiarity ratings and found strong correlations. LLM familiarity estimates performed even better in predicting lexical decision and naming performance in megastudies than the best available word frequency measures. We then applied LLM estimates to MWEs, also finding their effectiveness in measuring familiarity for these expressions. We have created a list of more than 400,000 English words and MWEs with LLM-generated familiarity estimates, which we hope will be a valuable resource for researchers. There is also a cleaned-up list of nearly 150,000 entries, excluding lesser-known stimuli, to streamline stimulus selection. Our findings highlight the advantages of LLM-based familiarity estimates, including their better performance than traditional word frequency measures (particularly for predicting word recognition accuracy), their ability to generalize to MWEs, availability for large lists of words, and ease of obtaining new estimates for all types of stimuli.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":"57 1","pages":"28"},"PeriodicalIF":4.6000,"publicationDate":"2024-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Behavior Research Methods","FirstCategoryId":"102","ListUrlMain":"https://doi.org/10.3758/s13428-024-02561-7","RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"PSYCHOLOGY, EXPERIMENTAL","Score":null,"Total":0}
引用次数: 0

Abstract

This study investigates the potential of large language models (LLMs) to estimate the familiarity of words and multi-word expressions (MWEs). We validated LLM estimates for isolated words using existing human familiarity ratings and found strong correlations. LLM familiarity estimates performed even better in predicting lexical decision and naming performance in megastudies than the best available word frequency measures. We then applied LLM estimates to MWEs, also finding their effectiveness in measuring familiarity for these expressions. We have created a list of more than 400,000 English words and MWEs with LLM-generated familiarity estimates, which we hope will be a valuable resource for researchers. There is also a cleaned-up list of nearly 150,000 entries, excluding lesser-known stimuli, to streamline stimulus selection. Our findings highlight the advantages of LLM-based familiarity estimates, including their better performance than traditional word frequency measures (particularly for predicting word recognition accuracy), their ability to generalize to MWEs, availability for large lists of words, and ease of obtaining new estimates for all types of stimuli.

超越基于计数的词频:人工智能生成的单词和短语熟悉度估计是语言知识的一个有趣的附加指标。
本研究探讨了大型语言模型(llm)在估计单词和多词表达(MWEs)熟悉度方面的潜力。我们使用现有的人类熟悉度评级验证了LLM对孤立单词的估计,并发现了很强的相关性。在大型研究中,LLM熟悉度估计在预测词汇决策和命名性能方面比最好的可用词频测量方法表现得更好。然后,我们将LLM估计应用于MWEs,也发现它们在测量这些表达的熟悉度方面的有效性。我们已经创建了一个包含超过40万个英语单词和MWEs的列表,并使用llm生成的熟悉度估算,我们希望这将成为研究人员的宝贵资源。此外,为了简化刺激因素的选择,还清理了近15万个条目,不包括不太知名的刺激因素。我们的研究结果突出了基于llm的熟悉度估计的优势,包括它们比传统的词频测量(特别是预测单词识别准确性)更好的性能,它们推广到MWEs的能力,大单词列表的可用性,以及易于获得所有类型刺激的新估计。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
CiteScore
10.30
自引率
9.30%
发文量
266
期刊介绍: Behavior Research Methods publishes articles concerned with the methods, techniques, and instrumentation of research in experimental psychology. The journal focuses particularly on the use of computer technology in psychological research. An annual special issue is devoted to this field.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信