{"title":"Large-scale benchmark yields no evidence that language model surprisal explains syntactic disambiguation difficulty","authors":"Kuan-Jung Huang , Suhas Arehalli , Mari Kugemoto , Christian Muxica , Grusha Prasad , Brian Dillon , Tal Linzen","doi":"10.1016/j.jml.2024.104510","DOIUrl":"https://doi.org/10.1016/j.jml.2024.104510","url":null,"abstract":"<div><p>Prediction has been proposed as an overarching principle that explains human information processing in language and beyond. To what degree can processing difficulty in syntactically complex sentences – one of the major concerns of psycholinguistics – be explained by predictability, as estimated using computational language models, and operationalized as surprisal (negative log probability)? A precise, quantitative test of this question requires a much larger scale data collection effort than has been done in the past. We present the Syntactic Ambiguity Processing Benchmark, a dataset of self-paced reading times from 2000 participants, who read a diverse set of complex English sentences. This dataset makes it possible to measure processing difficulty associated with individual syntactic constructions, and even individual sentences, precisely enough to rigorously test the predictions of computational models of language comprehension. By estimating the function that relates surprisal to reading times from filler items included in the experiment, we find that the predictions of language models with two different architectures sharply diverge from the empirical reading time data, dramatically underpredicting processing difficulty, failing to predict relative difficulty among different syntactically ambiguous constructions, and only partially explaining item-wise variability. 
These findings suggest that next-word prediction is most likely insufficient on its own to explain human syntactic processing.</p></div>","PeriodicalId":16493,"journal":{"name":"Journal of memory and language","volume":"137 ","pages":"Article 104510"},"PeriodicalIF":4.3,"publicationDate":"2024-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139992406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Eye-movements during reading and noisy-channel inference making","authors":"Michael G. Cutter , Kevin B. Paterson , Ruth Filik","doi":"10.1016/j.jml.2024.104513","DOIUrl":"https://doi.org/10.1016/j.jml.2024.104513","url":null,"abstract":"<div><p>This novel experiment investigates the relationship between readers’ eye movements and their use of “noisy channel” inferences when reading implausible sentences, and how this might be affected by cognitive aging. Young (18–26 years) and older (65–87 years) adult participants read sentences which were either plausible or implausible. Crucially, readers could assign a plausible interpretation to the implausible sentences by inferring that a preposition (i.e., <em>to</em>) had been unintentionally omitted or included. Our results reveal that readers’ fixation locations within such sentences are associated with the likelihood of them inferring the presence or absence of this critical preposition to reach a plausible interpretation. Moreover, our older adults were more likely to make these noisy-channel inferences than the younger adults, potentially because their poorer visual processing and greater linguistic experience promote such inference-making. 
We propose that the present findings provide novel experimental evidence for a perceptual contribution to noisy-channel inference-making during reading.</p></div>","PeriodicalId":16493,"journal":{"name":"Journal of memory and language","volume":"137 ","pages":"Article 104513"},"PeriodicalIF":4.3,"publicationDate":"2024-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0749596X24000160/pdfft?md5=8e879ca0730868cb6c949b346b5f1163&pid=1-s2.0-S0749596X24000160-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139992405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Acoustic correlates of stress in speech perception","authors":"Petroula Mousikou , Patrycja Strycharczuk , Kathleen Rastle","doi":"10.1016/j.jml.2024.104509","DOIUrl":"https://doi.org/10.1016/j.jml.2024.104509","url":null,"abstract":"<div><p>Stress is an important property of English spoken words. Research conducted over the past 70 years has sought to determine how acoustic cues, including duration, pitch, and intensity, influence stress perception; however, the evidence remains conflicting. In the present study, we used a large dataset of 10 speakers’ productions of disyllabic nonwords to investigate how listeners make use of these cues to perceive stress. Over 100 listeners made stress judgments on nearly one thousand items each, yielding a total of nearly 75,000 analysable responses. Results of average performance showed that stress judgments were influenced by all three cues, both individually and in combination. However, the relative importance of any one cue depended on the value of the other cues, particularly in the frequent situations in which cues offered conflicting stress information. Results of individual performance showed that listeners often use the same acoustic information regarding stress in different ways, but that speakers also sometimes offer different information about stress. 
Our mega-study approach to investigating word-stress perception eclipses previous studies in terms of its power, and offers new insights into our understanding of how listeners perceive stress.</p></div>","PeriodicalId":16493,"journal":{"name":"Journal of memory and language","volume":"136 ","pages":"Article 104509"},"PeriodicalIF":4.3,"publicationDate":"2024-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0749596X24000123/pdfft?md5=8713f8067c77509f41ed89d143c3c685&pid=1-s2.0-S0749596X24000123-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139908344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"What could have been said? Alternatives and variability in pragmatic inferences","authors":"Eszter Ronai , Ming Xiang","doi":"10.1016/j.jml.2024.104507","DOIUrl":"https://doi.org/10.1016/j.jml.2024.104507","url":null,"abstract":"<div><p>A recent influential experimental finding in pragmatics is that of <em>scalar diversity</em>: that different lexical items vary robustly in how likely they are to lead to scalar inference. For instance, hearers are much more likely to strengthen the meaning of <em>some</em> to <em>some but not all</em> than to infer <em>good but not excellent</em> from <em>good</em>. In this paper, we address the question of what underlies scalar diversity and identify two sources of uncertainty: uncertainty associated with the identity of relevant alternatives, and uncertainty associated with the step of excluding those alternatives. In our experiments, we make use of the Question Under Discussion to eliminate the former, and of the focus particle <em>only</em> to eliminate the latter kind of uncertainty. Our findings show that both manipulations make inference calculation more likely, but only when they are combined is scalar diversity reduced to a minimum. In order to quantitatively characterize the observed (reduction in) variation, this paper adopts the information theoretic measure of relative entropy.</p></div>","PeriodicalId":16493,"journal":{"name":"Journal of memory and language","volume":"136 ","pages":"Article 104507"},"PeriodicalIF":4.3,"publicationDate":"2024-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139908343","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parafoveal processing of Chinese four-character idioms and phrases in reading: Evidence for multi-constituent unit hypothesis","authors":"Chuanli Zang , Shuangshuang Wang , Xuejun Bai , Guoli Yan , Simon P. Liversedge","doi":"10.1016/j.jml.2024.104508","DOIUrl":"https://doi.org/10.1016/j.jml.2024.104508","url":null,"abstract":"<div><p>The perceptual span in Chinese reading extends one character to the left and three to the right of the point of fixation. Thus, four-character idioms and phrases often extend rightward beyond these limits during reading. We investigated whether such idioms, frequent phrases and equi-biased strings are processed parafoveally as Multi-Constituent Units (MCUs). Using the boundary paradigm in Experiments 1 and 2, we separately manipulated preview (identities or pseudocharacters) of the first two and the last two characters of idioms and frequently used phrases. In Experiment 3, we examined processing of strings judged to be a single lexical unit, equi-biased ambiguous strings and matched unambiguous multi-word strings. Experiments 1 and 2 produced greater preview benefit for the final two characters when the first two characters were presented after identity rather than pseudocharacter previews. In Experiment 3, preview effects were largest for single units, reduced for equi-biased strings and smallest for multi-word strings. 
Together the results demonstrate that four-character idioms and frequently used phrases are processed as MCUs.</p></div>","PeriodicalId":16493,"journal":{"name":"Journal of memory and language","volume":"136 ","pages":"Article 104508"},"PeriodicalIF":4.3,"publicationDate":"2024-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0749596X24000111/pdfft?md5=efb5bea0a9ac4e74d87ea3858b07a341&pid=1-s2.0-S0749596X24000111-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139748345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reprint of: Human memory: A proposed system and its control processes","authors":"R.C. Atkinson, R.M. Shiffrin","doi":"10.1016/j.jml.2023.104479","DOIUrl":"https://doi.org/10.1016/j.jml.2023.104479","url":null,"abstract":"","PeriodicalId":16493,"journal":{"name":"Journal of memory and language","volume":"136 ","pages":"Article 104479"},"PeriodicalIF":4.3,"publicationDate":"2024-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139726624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Do changed learning goals explain why metamemory judgments reactively affect memory?","authors":"Baike Li , David R. Shanks , Wenbo Zhao , Xiao Hu , Liang Luo , Chunliang Yang","doi":"10.1016/j.jml.2024.104506","DOIUrl":"10.1016/j.jml.2024.104506","url":null,"abstract":"<div><p><span>Measurement of mental processes is the bedrock of cognitive psychology, but the interpretation of such measurements is profoundly undermined by evidence that many mental processes are changed by (are reactive to) the act of being observed and measured. The current article is concerned with one particular type of reactivity, namely changes in memory performance when individuals are asked to concurrently monitor their learning via judgments of learning (JOLs). One explanation for memory reactivity is that the requirement to engage in metamemory monitoring changes learners’ goals, shifting them towards greater prioritization of mastering easy items and de-prioritization of memorizing difficult ones. This hypothesis is tested in 5 experiments (2 of which were pre-registered), which varied item difficulty by contrasting related (e.g., </span><em>computer</em> – <em>keyboard</em>) and unrelated (e.g., <em>book</em> – <em>shoe</em>) word pairs. While the experiments find robust evidence that recall is affected by the requirement to make immediate JOLs (reactivity), two key predictions of the goal-change account are not supported. The observed findings suggest that a change in the learner’s goal is not the main mechanism underlying JOL reactivity. 
Alternative explanations for why memory is reactive to metamemory judgments are discussed.</p></div>","PeriodicalId":16493,"journal":{"name":"Journal of memory and language","volume":"136 ","pages":"Article 104506"},"PeriodicalIF":4.3,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139664282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How reliable are standard reading time analyses? Hierarchical bootstrap reveals substantial power over-optimism and scale-dependent Type I error inflation","authors":"Zachary J. Burchill , T. Florian Jaeger","doi":"10.1016/j.jml.2023.104494","DOIUrl":"10.1016/j.jml.2023.104494","url":null,"abstract":"<div><p>We investigate the statistical power and Type I error rate of the two most common approaches to reading time (RT) analyses: assuming normality of residuals and homogeneity of variance in raw or log-transformed RTs. We first show that the assumptions of such analyses—such as <em>t</em><span>-tests, ANOVAs, and linear mixed-effects models—are neither consistently met by raw RTs, nor by log-transformed RTs (or any other common power transforms, incl. inverse-transformed RTs). Only a non-power transform (log-shift) provides a decent fit for all data sets and data preparation steps we consider. We then compare the statistical power and Type I error rate for linear mixed-effects models over raw or log-transformed RTs. Previous studies on this matter relied on parametrically generated data. We show why this is problematic, and introduce as an alternative a hierarchical bootstrap approach over naturally distributed reading times. This approach yields substantially different—and arguably more informative—results than the parametric simulation approaches we compare it to. Our results suggest that it is time to heed the advice others have provided for reading research: for any but the simplest designs, we find both the rate of spurious significances and the rate of undetected true effects can </span><em>strongly</em> depend on the scale (e.g., raw or log-RTs) in which effects are assumed to be linear. Researchers should thus clearly motivate the choice of analysis based on theoretical grounds, assess the robustness of findings under different analysis approaches, and discuss potential mismatches between analyses. 
The R scripts and libraries shared in the accompanying OSF repo allow researchers to assess the reliability of their analyses via hierarchical bootstrap over their own data.</p></div>","PeriodicalId":16493,"journal":{"name":"Journal of memory and language","volume":"136 ","pages":"Article 104494"},"PeriodicalIF":4.3,"publicationDate":"2024-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139587570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Storage interference in working memory cannot be removed by attention","authors":"Ruoyu Lu, Zeyu Li, Chenyu Yan, Tengfei Wang, Zhi Li","doi":"10.1016/j.jml.2024.104498","DOIUrl":"https://doi.org/10.1016/j.jml.2024.104498","url":null,"abstract":"<div><p>In the present study, we examined the hypothesis that the storage interference in working memory can be removed by attention. A dual-task paradigm was employed in Experiments 1 and 2, in which participants performed a color memory task and an RSVP letter detection task concurrently. The cognitive load of the RSVP letter detection task and the storage interference caused by the RSVP letter detection task were manipulated independently. That is, the produced storage-interference difference between the low and high interference conditions was comparable between the low and high cognitive load conditions, whereas the available attentional resources were different under the two cognitive load conditions. Since there were more attentional resources in the low load condition, the removal hypothesis predicts that differences in recall performance between the low and high interference conditions should be larger in high load than in low load, i.e., there would be an interaction between load and interference. However, the results of the two experiments did not show such an interaction. In Experiment 3, we manipulated the time available for the removal mechanism to work while inducing both the storage interference and processing interference. The results showed no sign of interference removal. 
Thus, the present results provided solid evidence to challenge the removal hypothesis.</p></div>","PeriodicalId":16493,"journal":{"name":"Journal of memory and language","volume":"136 ","pages":"Article 104498"},"PeriodicalIF":4.3,"publicationDate":"2024-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139433697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"What latent variable underlies confidence in lineup rejections?","authors":"Anne S. Yilmaz, John T. Wixted","doi":"10.1016/j.jml.2023.104493","DOIUrl":"10.1016/j.jml.2023.104493","url":null,"abstract":"<div><p>When a face is positively identified from a multi-person photo lineup, it is presumably the face that generates the strongest memory signal. In addition, confidence in a positive identification is presumably determined by the strength of the memory signal associated with that face. However, when no face generates a strong enough memory signal to be identified, the entire set of faces in the lineup is collectively rejected. What latent variable underlies confidence in a lineup rejection? One possibility is that the face that generates the strongest memory signal still determines confidence (i.e., the weaker that memory signal is, the more confidently the lineup is rejected). Another possibility is that confidence in a lineup rejection is determined by the average strength of the memory signals generated by the faces in the lineup (i.e., the weaker that average memory signal is, the more confidently the lineup is rejected). The reliance on an average signal has been proposed as a possible explanation for why the confidence-accuracy relationship for lineup rejections tends to be weak. Here, we modified two existing signal-detection-based lineup models (the Independent Observations model and the Ensemble model) and fit them to multiple lineup datasets to investigate which decision variable underlies confidence in lineup rejections. Both models agree that confidence in a lineup rejection is based on the strongest memory signal in the lineup, not on the average signal. 
These model fits also revealed for the first time that the memory signals in a lineup are correlated, as they theoretically should be.</p></div>","PeriodicalId":16493,"journal":{"name":"Journal of memory and language","volume":"135 ","pages":"Article 104493"},"PeriodicalIF":4.3,"publicationDate":"2023-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139067688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}