A systematic evaluation of Dutch large language models' surprisal estimates in sentence, paragraph and book reading.

IF 3.9 · CAS Tier 2 (Psychology) · Q1 PSYCHOLOGY, EXPERIMENTAL
Sam Boeve, Louisa Bogaerts
Journal: Behavior Research Methods, 57(9), 266
DOI: 10.3758/s13428-025-02774-4 (https://doi.org/10.3758/s13428-025-02774-4)
Published: 2025-08-18
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12361287/pdf/
Citations: 0

Abstract


Studies using computational estimates of word predictability from neural language models have garnered strong evidence in favour of surprisal theory. Upon encountering a word, readers experience a processing difficulty that is a linear function of that word's surprisal. Evidence for this effect has been established in the English language or using multilingual models to estimate surprisal across languages. At the same time, many language-specific models of unknown psychometric quality are made openly available. Here, we provide a systematic evaluation of the surprisal estimates of a collection of large language models, specifically designed for Dutch, examining how well they account for reading times in corpora of sentence, paragraph and book reading. We compare their performance to multilingual models and an N-gram model. While models' predictive power for reading times varied considerably across corpora, GPT-2-based models demonstrated superior overall performance. We show that Dutch large language models exhibit the same inverse scaling trend observed for English, with the surprisal estimates of smaller models showing a better fit to reading times than those of the largest models. We also replicate the linear effect of surprisal on reading times for Dutch. Both effects, however, depended on the corpus used for evaluation. Overall, these results offer a psychometric leaderboard of Dutch large language models and challenge the notion of a one-size-fits-all language model for psycholinguistic research. The surprisal estimates derived from all neural language models across the three corpora, along with the code to extract the surprisal, are made publicly available ( https://osf.io/wr4qf/ ).
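The surprisal the abstract refers to is the negative log probability of a word given its preceding context, s(w) = -log2 P(w | context); surprisal theory holds that reading time increases linearly with this quantity. A minimal Python sketch of the computation, using a toy lookup table of conditional probabilities for illustration only (the paper instead derives them from Dutch GPT-2-style and multilingual models):

```python
import math

def surprisal(prob):
    """Surprisal in bits: -log2 of the word's conditional probability."""
    return -math.log2(prob)

# Toy conditional probabilities P(word | previous word), invented for
# illustration; real estimates come from a neural language model's softmax.
bigram_probs = {
    ("de", "hond"): 0.10,       # predictable continuation -> low surprisal
    ("de", "axolotl"): 0.0005,  # unpredictable continuation -> high surprisal
}

for (prev, word), p in bigram_probs.items():
    print(f"P({word} | {prev}) = {p}: surprisal = {surprisal(p):.2f} bits")
```

Under the linear-effect result the paper replicates for Dutch, the ~11-bit word above should slow readers proportionally more than the ~3.3-bit word.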

Source journal: Behavior Research Methods
CiteScore: 10.30 · Self-citation rate: 9.30% · Articles published: 266
Journal description: Behavior Research Methods publishes articles concerned with the methods, techniques, and instrumentation of research in experimental psychology. The journal focuses particularly on the use of computer technology in psychological research. An annual special issue is devoted to this field.