{"title":"Memory for prediction: A Transformer-based theory of sentence processing","authors":"Soo Hyun Ryu , Richard L. Lewis","doi":"10.1016/j.jml.2025.104670","DOIUrl":"10.1016/j.jml.2025.104670","url":null,"abstract":"<div><div>We demonstrate that Transformer-based neural network language models provide a new foundation for mechanistic theories of sentence processing that seamlessly integrate expectation-based and memory-based accounts. First, we show that the attention mechanism in GPT2-small operates as a kind of cue-based retrieval architecture that is subject to similarity-based interference. Second, we show that it provides accounts of classic memory effects in parsing, including contrasts involving relative clauses and center-embedding. Third, we show that a simple word-by-word entropy metric computed over the internal attention patterns provides an index of memory interference that explains variance in eye-tracking and self-paced reading time measures (independent of surprisal and other predictors) in two natural story reading time corpora. Because the cues and representations are learned, there is no need for the theorist to postulate representational features and cues. 
Transformers provide practical modeling tools for exploring the effects of memory and experience, given the increasing availability of both pre-trained models and software for training new models, and the ease with which surprisal and attention entropy metrics may be computed.</div></div>","PeriodicalId":16493,"journal":{"name":"Journal of memory and language","volume":"145 ","pages":"Article 104670"},"PeriodicalIF":2.9,"publicationDate":"2025-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144703412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reframing linguistic bootstrapping as joint inference using visually-grounded grammar induction models","authors":"Eva Portelance , Siva Reddy , Timothy J. O’Donnell","doi":"10.1016/j.jml.2025.104672","DOIUrl":"10.1016/j.jml.2025.104672","url":null,"abstract":"<div><div>Semantic and syntactic bootstrapping posit that children use their prior knowledge of one linguistic domain, say syntactic relations, to help later acquire another, such as the meanings of new words. Empirical results supporting both theories may tempt us to believe that these are distinct, independent learning strategies. Here, we argue for a unified approach, where instead they are both contingent on a more general learning strategy for language acquisition: joint learning. Using a series of neural visually-grounded grammar induction models, we demonstrate that both syntactic and semantic bootstrapping effects are strongest when syntax and semantics are learnt simultaneously via joint learning. This more general learning strategy results in better grammar induction, realistic lexical category learning, and better interpretations of novel sentence and verb meanings. Joint learning makes language acquisition <em>easier</em> for learners by mutually constraining the hypothesis spaces for both syntax and semantics. 
Studying the dynamics of joint inference over many input sources and modalities represents an important new direction for language modeling and learning research in both cognitive sciences and AI, as it may help us explain how language can be acquired in more constrained learning settings.</div></div>","PeriodicalId":16493,"journal":{"name":"Journal of memory and language","volume":"145 ","pages":"Article 104672"},"PeriodicalIF":2.9,"publicationDate":"2025-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144694551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Type and token frequency jointly drive learning of morphology","authors":"Gaja Jarosz , Cerys Hughes , Andrew Lamont , Brandon Prickett , Maggie Baird , Seoyoung Kim , Max Nelson","doi":"10.1016/j.jml.2025.104666","DOIUrl":"10.1016/j.jml.2025.104666","url":null,"abstract":"<div><div>We examine the joint roles of type frequency and token frequency in three artificial language learning experiments involving lexicalized plural allomorphy. The primary role of type frequency in productivity is well-established, but debates about the precise relationship between type frequency and productivity continue. The effect of token frequency on productivity is even more controversial: some lines of research suggest token frequency and productivity are inversely related, other results indicate they are positively related, and yet others argue token frequency plays no role in productivity. We address both of these questions. Our learning framework makes it possible to examine the effects of these variables on generalization to novel forms and to examine how sensitivity to these factors affects the time-course of learning. The first two experiments differentiate predictions for generalization of three distinct hypotheses about the role of type frequency, while the third experiment investigates the independent role of token frequency. We find that both type and token frequency independently and positively contribute to learning rates and generalization across the three experiments. We also apply two computational learning theories – implementing two prominent theoretical linguistic frameworks – to the learning of the lexically-conditioned allomorphy patterns in our experiments. 
Despite their differences, we show that the incremental learning dynamics of both models correctly predict the general trends in generalization rates, learning curves, and the influence of token frequency observed across the experimental conditions.</div></div>","PeriodicalId":16493,"journal":{"name":"Journal of memory and language","volume":"144 ","pages":"Article 104666"},"PeriodicalIF":3.0,"publicationDate":"2025-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144723277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning filler-gap dependencies with neural language models: Testing island sensitivity in Norwegian and English","authors":"Anastasia Kobzeva , Suhas Arehalli , Tal Linzen , Dave Kush","doi":"10.1016/j.jml.2025.104663","DOIUrl":"10.1016/j.jml.2025.104663","url":null,"abstract":"<div><div>Human linguistic input is often claimed to be impoverished with respect to linguistic evidence for complex structural generalizations that children induce. The field of language acquisition is currently debating the ability of various learning algorithms to accurately derive target generalizations from the input. A growing body of research explores whether Neural Language Models (NLMs) can induce human-like generalizations about filler-gap dependencies (FGDs) in English, including island constraints on their distribution. Based on positive results for select test cases, some authors have argued that the relevant generalizations can be learned without domain-specific learning biases (Wilcox et al., 2023), though other researchers dispute this conclusion (Lan et al., 2024b; Howitt et al., 2024). Previous work has focused solely on English, but broader claims about filler-gap dependency learnability can only be made based on multiple languages and dependency types. To address this gap, we compare the ability of NLMs to learn restrictions on FGDs in English and Norwegian. Our results are mixed: they show that although these models acquire some sophisticated generalizations about filler-gap dependencies in the two languages, their generalizations still diverge from those of humans. When tested in structurally complex environments, the models sometimes adopt narrower generalizations than humans do or overgeneralize beyond their input in non-human-like ways. 
We conclude that current evidence does not support the claim that FGDs and island constraints on them can be learned without domain-specific biases.</div></div>","PeriodicalId":16493,"journal":{"name":"Journal of memory and language","volume":"144 ","pages":"Article 104663"},"PeriodicalIF":2.9,"publicationDate":"2025-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144665627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring the animacy effect in focal prospective memory tasks: When animates don’t stand out","authors":"Sara B. Félix , Marie Poirier , Josefa N.S. Pandeirada","doi":"10.1016/j.jml.2025.104673","DOIUrl":"10.1016/j.jml.2025.104673","url":null,"abstract":"<div><div>The animacy effect refers to a memory advantage for animates/living beings as compared to inanimates/nonliving things. So far, the animacy effect has been investigated mostly in retrospective memory. Given that memory serves a future-oriented function, and considering the adaptive significance of animacy, it has been proposed that animacy should also confer an advantage in prospective memory (i.e., memory for intentions/actions to be performed in the future). Recent research reported an animacy effect in nonfocal event-based prospective memory tasks. The present work explored this effect in focal prospective memory. In a series of five studies, conducted in different countries and languages, we employed various ongoing tasks. Across all studies, no differences in prospective memory performance between animates and inanimates were found. This result also held in a higher-powered sign test that pooled all participants (<em>N</em> = 408 young adults). Also, no differences between animates and inanimates were obtained in the baseline and filler trials. These results are discussed in light of the mechanisms that have been proposed to explain the effect in retrospective memory tasks, namely attention-prioritization and richness of encoding. 
Overall, our results are partially explained by the attention-prioritization account of the animacy effect and also provide support for the Multiprocess Framework.</div></div>","PeriodicalId":16493,"journal":{"name":"Journal of memory and language","volume":"144 ","pages":"Article 104673"},"PeriodicalIF":2.9,"publicationDate":"2025-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144656880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The development of the adaptive use of different forms of rehearsal in verbal serial recall tasks. A multi-method study","authors":"Sebastian Poloczek , Christopher Jarrold","doi":"10.1016/j.jml.2025.104674","DOIUrl":"10.1016/j.jml.2025.104674","url":null,"abstract":"<div><div>Verbal rehearsal is a key feature of certain working memory models that have previously assumed that children develop adult-like rehearsal around the age of 7. However, a broader literature indicates that younger children are capable of rehearsal. The present study, consisting of two experiments with 191 primary school children in total, combined methods that are rarely used to study rehearsal in serial recall. Self-paced presentation times were obtained as a behavioural indicator of strategy use. On half of the trials, children additionally reported their strategies via think-aloud (Expt. 1) or immediate trial-by-trial self-reports (Expt. 1 & 2). Results from the three methods employed in Experiment 1 with 10- to 11-year-olds converged on the conclusion that multiple strategies were used across trials. Listening, single rehearsal, and cumulative rehearsal were common strategies that were validly reported with no or only small effects of reactivity of strategy reporting. Experiment 2 revealed that children between the ages of 6 and 11 years employed a range of strategies across trials. Listening without rehearsal was common and cumulative rehearsal rare among the younger children, but cumulative rehearsal and strategy adaptivity to list length gradually increased with age. Importantly, self-reports were corroborated by self-presentation times even in younger children. We conclude that rehearsal development does not follow a stage-like progression. 
Rather, the data support an overlapping waves model as several strategies coexist, the likelihood of using a strategy changes gradually, and adaptivity of strategy choices still improves among older children.</div></div>","PeriodicalId":16493,"journal":{"name":"Journal of memory and language","volume":"144 ","pages":"Article 104674"},"PeriodicalIF":2.9,"publicationDate":"2025-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144656879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Similarity-based interference in the processing of classifier-noun dependencies in Mandarin Chinese","authors":"Hailin Hao , Zuzanna Fuchs , Shravan Vasishth","doi":"10.1016/j.jml.2025.104669","DOIUrl":"10.1016/j.jml.2025.104669","url":null,"abstract":"<div><div>During the processing of linguistic dependencies, the presence of a non-dependent word—referred to as a distractor—can sometimes complicate the identification of the correct subject. This phenomenon, known as similarity-based interference, provides a valuable testing ground for competing theories of sentence processing and has garnered significant interest in the field of psycholinguistics. One prominent theory, cue-based retrieval, suggests that the parser initiates a search for the relevant linguistic dependent at the retrieval site (e.g., the verb) based on a set of retrieval cues. In this work, we explore the use of lexicon-specific cues set by classifiers in the retrieval of noun dependents in Mandarin Chinese to provide evidence for the cue-based retrieval mechanism. A further open question is whether the distractor must intervene between the co-dependents (so-called retroactive interference) or whether the distractor can appear to the left of the dependent elements (so-called proactive interference). Previous work has suggested that proactive interference is weaker than retroactive interference, i.e., that the distractor has to intervene between the co-dependents to influence the dependency completion process. Using self-paced reading and A-Maze tasks, and Bayes Factors for hypothesis testing, we found robust evidence for a predicted interference effect in retroactive configurations, but no interference in proactive configurations. 
We discuss the theoretical implications of the current work for theories of retrieval and sentence processing in general.</div></div>","PeriodicalId":16493,"journal":{"name":"Journal of memory and language","volume":"144 ","pages":"Article 104669"},"PeriodicalIF":2.9,"publicationDate":"2025-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144633035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Initially encoding attended but outdated information into working memory: behavioral and neural evidence","authors":"Chunqian Xiao , Yihong Luo , Yuan Gao , Jiejie Liao , Mengxia Yu , Lei Mo","doi":"10.1016/j.jml.2025.104668","DOIUrl":"10.1016/j.jml.2025.104668","url":null,"abstract":"<div><div>Attention has traditionally been regarded as a gateway to working memory, largely determining whether information enters it. Recent work suggests that the brain actively inhibits attended but outdated information to prevent it from entering working memory. However, it remains unknown whether this information is blocked directly by attention before entering working memory, or after being encoded into working memory, given that such information has already been attended to and processed. This study explored this question by manipulating stimulus onset asynchronies (SOAs) in three experiments, including behavioral and electroencephalography (EEG) measures, and examining memory traces of attended but outdated information at different time points. Behavioral evidence demonstrated the stability of the memory trace of the attended but outdated information only when SOA was short. This finding was observed across different features and paradigms. Time-frequency analysis indicated that the brain inhibited attention to information matching the attended but outdated information in the early stage, with behavioral performance predicted by alpha modulation of the right hemisphere. These results suggest that attended but outdated information is initially encoded into working memory, even though it does not need to be remembered. 
These findings enhance our understanding of the impact of attention on working memory.</div></div>","PeriodicalId":16493,"journal":{"name":"Journal of memory and language","volume":"144 ","pages":"Article 104668"},"PeriodicalIF":2.9,"publicationDate":"2025-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144366673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Judgments of learning enhance elaborative rather than relational processing: Implications from phonologically related and phonological-semantic mediated pairs","authors":"Minyu Chang , C.J. Brainerd","doi":"10.1016/j.jml.2025.104667","DOIUrl":"10.1016/j.jml.2025.104667","url":null,"abstract":"<div><div>Judgment of learning (JOL) reactivity refers to the finding that making JOLs directly affects subsequent memory performance. One of the most replicated findings in this line of research is that JOLs enhance memory for related word pairs. However, so far, only semantic relatedness has been studied, and many existing theories of JOL reactivity, such as the cue-strengthening hypothesis and the enhanced relational processing account, are heavily dependent on findings generated with semantically related word pairs. The current study used phonologically related pairs instead of semantically related pairs. Specifically, we used rhyme pairs (e.g., <em>fall-tall</em>) in Experiments 1A and 1B and homophone pairs (e.g., <em>coarse-course</em>) in Experiments 2A, 2B, and 2C. A consistent pattern emerged: unlike for semantically related pairs, JOLs did not produce reactivity for phonologically related pairs on associative recall tests. This supports the hypothesis that JOL reactivity reflects an enhancement in elaborative processing, which presumably can involve either item-specific or relational processing that focuses on deeper, semantic content, rather than relational processing that focuses on relational content of any nature. In Experiment 3, we found positive JOL reactivity for phonological-semantic mediated pairs (e.g., <em>coarse-class</em>, where <em>coarse</em> is phonologically related to an unpresented mediator <em>course</em> that is semantically related to <em>class</em>). This, in contrast to the null reactivity for pure phonological relatedness, again supports the necessity of semantic processing in positive JOL reactivity. 
We discuss how the elaborative processing account offers an opportunity to reconcile existing theoretical explanations and help build a general framework for JOL reactivity.</div></div>","PeriodicalId":16493,"journal":{"name":"Journal of memory and language","volume":"144 ","pages":"Article 104667"},"PeriodicalIF":2.9,"publicationDate":"2025-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144329594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Working memory and attentional control abilities predict individual differences in visual long-term memory tasks","authors":"Chong Zhao , Edward K. Vogel","doi":"10.1016/j.jml.2025.104665","DOIUrl":"10.1016/j.jml.2025.104665","url":null,"abstract":"<div><div>Working memory predicts cognitive abilities like fluid intelligence (gF) and source memory. This suggests these abilities depend on working memory and attentional control. Previous research shows that when attentional resources are occupied by a secondary task, source memory performance is more impaired than recognition memory performance, implying that working memory abilities exert less influence on recognition memory than on source memory. Here, we directly tested whether working memory and attentional control differences predict visual recognition memory performance across four experiments (n = 841 in total). Surprisingly, we found that working memory and attentional control nearly always predicted recognition memory performance as robustly as source memory performance (Studies 1, 3 and 4), except when rapid presentation rates exceeded the temporal limits of attention during encoding (Study 2). Additionally, source memory and recognition memory, regardless of encoding presentation rates across experiments, remained highly correlated across individuals. 
Together, our findings suggest that working memory and attentional control resources contribute to performance on both recognition and source memory tests of visual long-term memory.</div></div>","PeriodicalId":16493,"journal":{"name":"Journal of memory and language","volume":"144 ","pages":""},"PeriodicalIF":2.9,"publicationDate":"2025-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144291428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}