Authors: Sam Boeve, Louisa Bogaerts
Journal: Behavior Research Methods, 57(9), 266 (Q1, Psychology, Experimental; IF 3.9)
DOI: 10.3758/s13428-025-02774-4
Published: 2025-08-18 (Journal Article)
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12361287/pdf/
A systematic evaluation of Dutch large language models' surprisal estimates in sentence, paragraph and book reading.
Studies using computational estimates of word predictability from neural language models have garnered strong evidence in favour of surprisal theory. Upon encountering a word, readers experience a processing difficulty that is a linear function of that word's surprisal. Evidence for this effect has been established either in English or across languages using multilingual models to estimate surprisal. At the same time, many language-specific models of unknown psychometric quality are made openly available. Here, we provide a systematic evaluation of the surprisal estimates of a collection of large language models specifically designed for Dutch, examining how well they account for reading times in corpora of sentence, paragraph and book reading. We compare their performance to multilingual models and an N-gram model. While the models' predictive power for reading times varied considerably across corpora, GPT-2-based models demonstrated superior overall performance. We show that Dutch large language models exhibit the same inverse scaling trend observed for English, with the surprisal estimates of smaller models showing a better fit to reading times than those of the largest models. We also replicate the linear effect of surprisal on reading times for Dutch. Both effects, however, depended on the corpus used for evaluation. Overall, these results offer a psychometric leaderboard of Dutch large language models and challenge the notion of a one-size-fits-all language model for psycholinguistic research. The surprisal estimates derived from all neural language models across the three corpora, along with the code to extract the surprisal, are made publicly available (https://osf.io/wr4qf/).
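To make the central quantity concrete: a word's surprisal is its negative log probability given the preceding context, s(w_t) = -log P(w_t | context). The sketch below is not the authors' code (which is available at the OSF link above); it is a minimal illustration using an add-one-smoothed bigram model, the simplest instance of the N-gram baseline family the paper compares against. The toy Dutch corpus and function names are invented for illustration.

```python
import math
from collections import Counter

def train_bigram(tokens):
    """Count unigrams and bigrams for a simple add-one-smoothed bigram model."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab_size = len(unigrams)
    return unigrams, bigrams, vocab_size

def surprisal(prev, word, unigrams, bigrams, vocab_size):
    """Surprisal in bits: -log2 P(word | prev), with Laplace (add-one) smoothing."""
    p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
    return -math.log2(p)

# Toy Dutch corpus (hypothetical example data).
tokens = "de kat zit op de mat de kat slaapt op de mat".split()
uni, bi, v = train_bigram(tokens)

# A frequently seen continuation yields lower surprisal than an unseen one.
print(surprisal("de", "kat", uni, bi, v))   # attested bigram: low surprisal
print(surprisal("de", "zit", uni, bi, v))   # unattested bigram: higher surprisal
```

Under surprisal theory, these per-word values would then enter a regression as a linear predictor of reading times; the paper's contribution is evaluating how well such estimates, from N-gram up to Dutch GPT-2-style models, fit reading-time corpora.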
Journal description:
Behavior Research Methods publishes articles concerned with the methods, techniques, and instrumentation of research in experimental psychology. The journal focuses particularly on the use of computer technology in psychological research. An annual special issue is devoted to this field.