{"title":"EMTeC: A corpus of eye movements on machine-generated texts.","authors":"Lena S Bolliger, Patrick Haller, Isabelle C R Cretton, David R Reich, Tannon Kew, Lena A Jäger","doi":"10.3758/s13428-025-02677-4","DOIUrl":"10.3758/s13428-025-02677-4","url":null,"abstract":"<p><p>The Eye movements on Machine-generated Texts Corpus (EMTeC) is a naturalistic eye-movements-while-reading corpus of 107 native English speakers reading machine-generated texts. The texts are generated by three large language models using five different decoding strategies, and they fall into six different text-type categories. EMTeC entails the eye movement data at all stages of pre-processing, i.e., the raw coordinate data sampled at 2000 Hz, the fixation sequences, and the reading measures. It further provides both the original and a corrected version of the fixation sequences, accounting for vertical calibration drift. Moreover, the corpus includes the language models' internals that underlie the generation of the stimulus texts: the transition scores, the attention scores, and the hidden states. The stimuli are annotated for a range of linguistic features both at text and at word level. We anticipate EMTeC to be utilized for a variety of use cases such as, but not restricted to, the investigation of reading behavior on machine-generated text and the impact of different decoding strategies; reading behavior on different text types; the development of new pre-processing, data filtering, and drift correction algorithms; the cognitive interpretability and enhancement of language models; and the assessment of the predictive power of surprisal and entropy for human reading times. The data at all stages of pre-processing, the model internals, and the code to reproduce the stimulus generation, data pre-processing, and analyses can be accessed via https://github.com/DiLi-Lab/EMTeC/ .</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":"57 7","pages":"189"},"PeriodicalIF":4.6,"publicationDate":"2025-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12134054/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144214784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Investigating the effect of transformer encoder architecture to improve the reliability of classroom observation ratings on high-inference discourse.","authors":"Jinnie Shin, Wallace N Pinto, Bowen Wang","doi":"10.3758/s13428-025-02711-5","DOIUrl":"https://doi.org/10.3758/s13428-025-02711-5","url":null,"abstract":"<p><p>This study investigates the effect of transformer encoder architecture on the classification accuracy of high-inference discourse elements in classroom settings. Recognizing the importance of capturing nuanced interactions between students and teachers, our study explores the performance of different transformer models, focusing particularly on the bi-encoder architecture of S-BERT. We evaluated various embedding strategies, along with different pooling methods, to optimize the bi-encoder model's classification accuracy for discourse elements such as High Uptake and Focusing Question. We compared S-BERT's performance with traditional cross-encoding transformer models such as BERT and RoBERTa. Our results demonstrate that S-BERT, particularly with a batch size of 8, learning rate of 2e-5, and specific embedding strategies, significantly outperforms other baseline models, achieving F1 scores up to 0.826 for High Uptake and 0.908 for Focusing Question. Our findings highlighted the importance of customized vectorization strategies, encompassing individual and interaction features (dot-product and absolute distance), and underscores the need to carefully select pooling methods to enhance performance. Our findings offer valuable insights into the design of transformer models for classroom discourse analysis, contributing to the advancement of NLP methods in educational research.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":"57 7","pages":"186"},"PeriodicalIF":4.6,"publicationDate":"2025-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144214787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How efficient is translation in language testing? Deriving valid student vocabulary tests in Spanish (StuVoc1-Esp and StuVoc2-Esp) from established English tests.","authors":"Beatriz Bermúdez-Margaretto, Marc Brysbaert","doi":"10.3758/s13428-025-02708-0","DOIUrl":"10.3758/s13428-025-02708-0","url":null,"abstract":"<p><p>This study examined the efficiency of item translation in a challenging language-testing situation. We created a Spanish translation of recently developed English vocabulary tests to assess word knowledge in Spanish-speaking students and highly educated adults, a group for whom it is a challenge to find words that some people know and others do not. The English tests were multiple-choice tests based on meaning recognition and consisted of a total of 150 items. From these, we were able to create two Spanish tests with 37 questions each. We constructed and validated the tests in two separate studies, including another established vocabulary test (Lextale-Esp, based on form recognition), general knowledge tests, and a test for reading comprehension. Two online studies with 161 and 196 participants confirmed that both vocabulary tests have reliability above .75 (.86 when combined) and correlate more highly with general knowledge and reading comprehension than Lextale-Esp. This shows that test translation is an efficient way to find useful items for language tests in different languages. All materials (including the general knowledge tests and the reading comprehension test) are freely available for research purposes.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":"57 7","pages":"183"},"PeriodicalIF":4.6,"publicationDate":"2025-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12125115/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144186427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing the quality and reproducibility of research: Preferred Evaluation of Cognitive and Neuropsychological Studies - The PECANS statement for human studies.","authors":"C Costa, R Pezzetta, E Toffalini, M Grassi, G Cona, C Miniussi, P J Bauer, S Borgomaneri, M Brysbaert, C D Chambers, N Edelstyn, A Eerland, S J Gilbert, M A Nitsche, R A Poldrack, A Puce, K R Ridderinkhof, T Y Swaab, C Umiltà, M Wiener, C Scarpazza","doi":"10.3758/s13428-025-02705-3","DOIUrl":"10.3758/s13428-025-02705-3","url":null,"abstract":"<p><p>Are scientific papers providing all essential details necessary to ensure the replicability of study protocols? Are authors effectively conveying study design, data analysis, and the process of drawing inferences from their results? These represent only a fraction of the pressing questions that cognitive psychology and neuropsychology face in addressing the \"crisis of confidence.\" This crisis has highlighted numerous shortcomings in the journey from research to publication. To address these shortcomings, we introduce PECANS (Preferred Evaluation of Cognitive And Neuropsychological Studies), a comprehensive checklist tool designed to guide the planning, execution, evaluation, and reporting of experimental research. PECANS emerged from a rigorous consensus-building process through the Delphi method. We convened a panel of international experts specialized in cognitive psychology and neuropsychology research practices. Through two rounds of iterative voting and a proof-of-concept phase, PECANS evolved into its final form. The PECANS checklist is intended to serve various stakeholders in the fields of cognitive sciences and neuropsychology, including: (i) researchers seeking to ensure and enhance reproducibility and rigor in their research; (ii) journal editors and reviewers assessing the quality of reports; (iii) ethics committees and funding agencies; (iv) students approaching methodology and scientific writing. PECANS is a versatile tool intended not only to improve the quality and transparency of individual research projects but also to foster a broader culture of rigorous scientific inquiry across the academic and research community.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":"57 7","pages":"182"},"PeriodicalIF":4.6,"publicationDate":"2025-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12125112/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144186426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Validation of the Russian version of the realistic moral vignettes for studies of moral judgments.","authors":"Zorina Rakhmankulova, Rustam Asgarov, Eliana Monahhova, Semyon Mening, Isak B Blank, Vasily Klucharev","doi":"10.3758/s13428-025-02709-z","DOIUrl":"https://doi.org/10.3758/s13428-025-02709-z","url":null,"abstract":"<p><p>Moral judgments and behavior are shaped by individual experiences and cultural environments. In two online studies, we used a standard set of moral vignettes to examine the generalizability of factor structure of moral judgments originally identified in American samples (Knutson et al. Social Cognitive and Affective Neuroscience, 5, 378-384, 2010; Kruepke et al. Behavior Research Methods, 50, 922-936, 2018) by testing two independent samples of the Russian population (Study 1, N = 247; Study 2, N = 223). In Study 1, the exploratory factor analysis revealed three components that accounted for most of the variance: norm violation, social affect, and intention. In Study 2, the factor structure of the identified moral components was validated by confirmatory factor analysis. Latent profile analysis revealed five distinct profiles of moral scenarios: Peccadillo, Illegal-Antisocial, Controversial Act, Prosocial, and a novel profile specific to our Russian samples - Social Conflict - as compared to the previous study of the American population. These findings suggest fundamental similarities in moral judgment processes across cultures while also highlighting culture-specific patterns in moral scenario categorization. This study also provides researchers with a battery of real-life experience-derived vignettes that can be used in cross-cultural studies of moral judgment.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":"57 7","pages":"184"},"PeriodicalIF":4.6,"publicationDate":"2025-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144186428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cognition ratings for 8826 english words.","authors":"Hannah T Corenblum, Penny M Pexman","doi":"10.3758/s13428-025-02663-w","DOIUrl":"https://doi.org/10.3758/s13428-025-02663-w","url":null,"abstract":"<p><p>Many abstract words refer to internal cognitive events or states, such as thinking or believing, or to cognitive products, such as theories, ideas, or whims (Binder et al., Cognitive Neuropsychology, 33, 130-174, 2016). Mental state information is proposed to be an important component in the grounding of abstract meaning (Kiefer et al., 2022, Muraki et al., 2022), such that our inner cognitive experiences form a foundational aspect of semantic representation. We tested this proposal by first collecting cognition ratings for over 8000 English words. Then, we used the norms generated from our ratings to examine the unique variance explained by cognition ratings in performance on lexical-semantic tasks. We found a significant effect of cognition, such that there was a facilitative relationship between cognition ratings and behavioral responses, even when controlling for other key lexical and semantic variables. Specifically, words rated as more cognitive in nature elicited faster and more accurate task responses, especially for words with more abstract meanings. This study highlights a novel behavioral effect that is consistent with a multidimensional account of semantic representation.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":"57 7","pages":"181"},"PeriodicalIF":4.6,"publicationDate":"2025-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144180679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Maximin criterion for item selection in computerized adaptive testing.","authors":"Jyun-Hong Chen, Hsiu-Yi Chao","doi":"10.3758/s13428-025-02673-8","DOIUrl":"https://doi.org/10.3758/s13428-025-02673-8","url":null,"abstract":"<p><p>In computerized adaptive testing (CAT), information-based item selection rules (ISRs), such as maximum Fisher information (MFI), often excessively rely on discriminating items, leading to unbalanced utilization of the item pool. To address this challenge, the present study introduced the MaxiMin Information (MMI) criterion, which is grounded in decision theory. MMI calculates each item's minimum information (I<sub>min</sub>) within the current confidence interval (CI) of the trait level, selecting the item with the maximum I<sub>min</sub> to be administered. For examinees with broader CIs (less precise trait estimates), MMI leans toward administering less discriminating items, which tend to yield larger I<sub>min</sub>. Conversely, for narrower CIs, MMI aligns more closely with MFI by favoring items with higher discrimination. This indicates that MMI's item selection is tailored to each examinee based on his or her provisional trait estimate and its estimation precision. Five simulation studies were conducted to assess MMI's performance in CAT under various conditions. Results demonstrate that although MMI is comparable with other ISRs in terms of trait estimation precision, it excels in balancing item pool utilization. By fine-tuning confidence levels, MMI not only efficiently schedules the use of discriminating items toward the test's later stages to enhance test efficiency but also effectively adapts to different testing scenarios. From these findings, we generally recommend applying MMI with a confidence level of 95% to optimize item pool utilization without compromising trait estimation accuracy. With its evident advantages, MMI holds promise for practical applications, especially for high-stakes tests requiring utmost test efficiency and security.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":"57 7","pages":"180"},"PeriodicalIF":4.6,"publicationDate":"2025-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144172511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Talk to your data: Introducing text embedding similarity analysis (TESA) in psychological research.","authors":"Juul Vossen, Evy Kuijpers, Joeri Hofmans","doi":"10.3758/s13428-025-02698-z","DOIUrl":"https://doi.org/10.3758/s13428-025-02698-z","url":null,"abstract":"<p><p>While qualitative research plays a vital role in understanding complex phenomena, it lends itself poorly to testing formal hypotheses due to its inability to fit statistical models to text data. Approaches that are traditionally used to quantify text data (e.g., content analysis) are generally time-consuming, prone to researcher bias, and neglect a substantial amount of potentially important semantic context. Although novel approaches have been proposed, these typically require large amounts of text data and tend to be inductive in nature. To enable researchers to ask hypothesis-based and open-ended questions from one's text data, the current study proposes a novel retrieval augmented generation (RAG)-based approach (called text embedding similarity analysis, TESA) that transforms a hypothesis into two specific search terms: a population (or sample) and a variable of interest. Using pretrained large language models (LLM), we extract the semantic embedding of the search terms and text data and use cosine similarity to match search terms. This allows hypothesis testing by assessing the alignment between the distribution of similarity scores for a variable of interest with the expectation for the population.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":"57 7","pages":"179"},"PeriodicalIF":4.6,"publicationDate":"2025-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144172513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Two-part sequential measurement models for distinguishing between symptom presence and symptom severity.","authors":"Scott A Baldwin, Joseph A Olsen","doi":"10.3758/s13428-025-02666-7","DOIUrl":"10.3758/s13428-025-02666-7","url":null,"abstract":"<p><p>Two common aspects of symptom measurement are 1) the occurrence or presence of symptoms, and 2) the intensity or severity of symptoms when they occur. We adopt a latent trait perspective based on item response theory (IRT), using both unidimensional and multidimensional IRT models. We demonstrate how to (a) prepare data for analysis, (b) specify, estimate, and compare models, (c) interpret model parameters, (d) compare scores from models, and (e) visualize analysis results. We develop the relevant sequential IRT model, noting its flexibility, congruence with the theorized data generating process for symptom measures, and its promise for facilitating additional research and practical applications. The sequential model is less frequently used than other IRT models for polytomous data such as the generalized partial credit or graded response models. However, estimation of the sequential model can be readily accomplished with standard latent variable modeling and IRT software for binary indicators that allows constraints on the discrimination parameters. We compare the sequential model to other modeling options. We provide discussion of future research directions.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":"57 6","pages":"178"},"PeriodicalIF":4.6,"publicationDate":"2025-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144126463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Eye-tracking-based hidden Markov modeling for revealing within-item cognitive strategy switching.","authors":"Zhimou Wang, Peida Zhan","doi":"10.3758/s13428-025-02678-3","DOIUrl":"10.3758/s13428-025-02678-3","url":null,"abstract":"<p><p>Identifying cognitive strategies in problem-solving helps researchers understand advanced cognitive processes and their applicable contexts. Current methods typically identify strategies for each item of Raven's Advanced Progressive Matrices, capturing only between-item cognitive strategy switching (CSS). Although within-item CSS is recognized, methods to dynamically identify and reveal it are lacking. This study introduces the concept of an eye movement snippet, a basic unit for studying within-item CSS, along with a new eye-tracking process measure that quantifies the sequence length of alternatives viewed in a snippet. Combined with hidden Markov modeling, we propose a new method for dynamically identifying within-item cognitive strategies and revealing their switching. Using eye-tracking data from a matrix reasoning test, we demonstrate the value of the proposed method through a series of analyses. The results indicate that during problem-solving: (1) participants predominantly used two strategies-constructive matching and response elimination; (2) there is a high probability of switching from constructive matching to response elimination, but not vice versa; (3) more difficult items lead to more frequent strategy switching; (4) frequent strategy switching decreases time spent in the matrix area and on problem-solving planning; (5) frequent strategy switching correlates with incorrect answers for some items; and (6) frequent strategy switching increases total response time. Additionally, within-item CSS showed three distinct patterns as the test progressed, with significant differences in participants' intelligence levels and total test time among the patterns. Overall, the proposed method effectively identifies within-item cognitive strategies and their switching in matrix reasoning tasks.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":"57 6","pages":"175"},"PeriodicalIF":4.6,"publicationDate":"2025-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144101204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}