{"title":"An improved diagrammatic procedure for interpreting and scoring the Wisconsin Card Sorting Test: An update to Steve Berry's 1996 edition.","authors":"Caitlin A Howlett, G Lorimer Moseley","doi":"10.3758/s13428-024-02499-w","DOIUrl":"10.3758/s13428-024-02499-w","url":null,"abstract":"<p><p>The Wisconsin Card Sorting Test (WCST) is a popular neuropsychological test that is complicated to score and interpret. In an attempt to make scoring of the WCST simpler, Berry (The Clinical Neuropsychologist 10, 117-121, 1996) developed a diagrammatic scoring procedure, particularly to aid scoring of perseverative responses. We identified key limitations of Berry's diagram, including its unnecessary ambiguity and complexity, use of terminology different from that used in the standardized WCST manual, and lack of distinction between perseverative errors and perseverative responses. Our new diagrammatic scoring procedure scores each response one-by-one; we strongly suggest that the diagram is used in conjunction with the 1993 WCST manual. Our new diagrammatic scoring procedure aims to assist novice users in learning how to accurately score the task, prevent scoring errors when using the manual version of the task, and help scorers verify whether other existing computerized versions of the task (apart from the PAR version) conform to the Heaton et al. (1993) scoring method. Our diagrammatic scoring procedure holds promise to be incorporated into any future versions of the WCST manual.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":null,"pages":null},"PeriodicalIF":4.6,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525283/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142456953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A tutorial on open-source large language models for behavioral science.","authors":"Zak Hussain, Marcel Binz, Rui Mata, Dirk U Wulff","doi":"10.3758/s13428-024-02455-8","DOIUrl":"10.3758/s13428-024-02455-8","url":null,"abstract":"<p><p>Large language models (LLMs) have the potential to revolutionize behavioral science by accelerating and improving the research cycle, from conceptualization to data analysis. Unlike closed-source solutions, open-source frameworks for LLMs can enable transparency, reproducibility, and adherence to data protection standards, which gives them a crucial advantage for use in behavioral science. To help researchers harness the promise of LLMs, this tutorial offers a primer on the open-source Hugging Face ecosystem and demonstrates several applications that advance conceptual and empirical work in behavioral science, including feature extraction, fine-tuning of models for prediction, and generation of behavioral responses. Executable code is made available at github.com/Zak-Hussain/LLM4BeSci.git . Finally, the tutorial discusses challenges faced by research with (open-source) LLMs related to interpretability and safety and offers a perspective on future research at the intersection of language modeling and behavioral science.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":null,"pages":null},"PeriodicalIF":4.6,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525391/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141987375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Linking essay-writing tests using many-facet models and neural automated essay scoring.","authors":"Masaki Uto, Kota Aramaki","doi":"10.3758/s13428-024-02485-2","DOIUrl":"10.3758/s13428-024-02485-2","url":null,"abstract":"<p><p>For essay-writing tests, challenges arise when scores assigned to essays are influenced by the characteristics of raters, such as rater severity and consistency. Item response theory (IRT) models incorporating rater parameters have been developed to tackle this issue, exemplified by the many-facet Rasch models. These IRT models enable the estimation of examinees' abilities while accounting for the impact of rater characteristics, thereby enhancing the accuracy of ability measurement. However, difficulties can arise when different groups of examinees are evaluated by different sets of raters. In such cases, test linking is essential for unifying the scale of model parameters estimated for individual examinee-rater groups. Traditional test-linking methods typically require administrators to design groups in which either examinees or raters are partially shared. However, this is often impractical in real-world testing scenarios. To address this, we introduce a novel method for linking the parameters of IRT models with rater parameters that uses neural automated essay scoring technology. Our experimental results indicate that our method successfully accomplishes test linking with accuracy comparable to that of linear linking using few common examinees.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":null,"pages":null},"PeriodicalIF":4.6,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525454/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142008164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assessing computational reproducibility in Behavior Research Methods.","authors":"David A Ellis, John Towse, Olivia Brown, Alicia Cork, Brittany I Davidson, Sophie Devereux, Joanne Hinds, Matthew Ivory, Sophie Nightingale, Douglas A Parry, Lukasz Piwek, Heather Shaw, Andrea S Towse","doi":"10.3758/s13428-024-02501-5","DOIUrl":"10.3758/s13428-024-02501-5","url":null,"abstract":"<p><p>Psychological science has thrived thanks to new methods and innovative practices. Journals, including Behavior Research Methods (BRM), continue to support the dissemination and evaluation of research assets including data, software/hardware, statistical code, and databases of stimuli. However, such research assets rarely allow for computational reproducibility, meaning they are difficult to reuse. Therefore, in this preregistered report, we explore how BRM's authors and BRM structures shape the landscape of functional research assets. Our broad research questions concern: (1) How quickly methods and analytical techniques reported in BRM can be used and developed further by other scientists; (2) Whether functionality has improved following changes to BRM journal policy in support of computational reproducibility; (3) Whether we can disentangle such policy changes from changes in reproducibility over time. We randomly sampled equal numbers of papers (N = 204) published in BRM before and after the implementation of policy changes. Pairs of researchers recorded how long it took to ensure assets (data, software/hardware, statistical code, and materials) were fully operational. They also coded the completeness and reusability of the assets. While improvements were observed in all measures, only changes to completeness were altered significantly following the policy changes (d = .37). The effects varied between different types of research assets, with data sets from surveys/experiments showing the largest improvements in completeness and reusability. Perhaps more importantly, changes to policy do appear to have improved the life span of research products by reducing natural decline. We conclude with a discussion of how, in the future, research and policy might better support computational reproducibility within and beyond psychological science.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":null,"pages":null},"PeriodicalIF":4.6,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525395/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142340233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Establishing the reliability of metrics extracted from long-form recordings using LENA and the ACLEW pipeline.","authors":"Alejandrina Cristia, Lucas Gautheron, Zixing Zhang, Björn Schuller, Camila Scaff, Caroline Rowland, Okko Räsänen, Loann Peurey, Marvin Lavechin, William Havard, Caitlin M Fausey, Margaret Cychosz, Elika Bergelson, Heather Anderson, Najla Al Futaisi, Melanie Soderstrom","doi":"10.3758/s13428-024-02493-2","DOIUrl":"10.3758/s13428-024-02493-2","url":null,"abstract":"<p><p>Long-form audio recordings are increasingly used to study individual variation, group differences, and many other topics in theoretical and applied fields of developmental science, particularly for the description of children's language input (typically speech from adults) and children's language output (ranging from babble to sentences). The proprietary LENA software has been available for over a decade, and with it, users have come to rely on derived metrics like adult word count (AWC) and child vocalization counts (CVC), which have also more recently been derived using an open-source alternative, the ACLEW pipeline. Yet, there is relatively little work assessing the reliability of long-form metrics in terms of the stability of individual differences across time. Filling this gap, we analyzed eight spoken-language datasets: four from North American English-learning infants, and one each from British English-, French-, American English-/Spanish-, and Quechua-/Spanish-learning infants. The audio data were analyzed using two types of processing software: LENA and the ACLEW open-source pipeline. When all corpora were included, we found relatively low to moderate reliability (across multiple recordings, intraclass correlation coefficient attributed to the child identity [Child ICC], was < 50% for most metrics). There were few differences between the two pipelines. Exploratory analyses suggested some differences as a function of child age and corpora. These findings suggest that, while reliability is likely sufficient for various group-level analyses, caution is needed when using either LENA or ACLEW tools to study individual variation. We also encourage improvement of extant tools, specifically targeting accurate measurement of individual variation.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":null,"pages":null},"PeriodicalIF":4.6,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142279941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A template and tutorial for preregistering studies using passive smartphone measures.","authors":"Anna M Langener, Björn S Siepe, Mahmoud Elsherif, Koen Niemeijer, Pia K Andresen, Samir Akre, Laura F Bringmann, Zachary D Cohen, Nathaniel R Choukas, Konstantin Drexl, Luisa Fassi, James Green, Tabea Hoffmann, Raj R Jagesar, Martien J H Kas, Sebastian Kurten, Ramona Schoedel, Gert Stulp, Georgia Turner, Nicholas C Jacobson","doi":"10.3758/s13428-024-02474-5","DOIUrl":"10.3758/s13428-024-02474-5","url":null,"abstract":"<p><p>Passive smartphone measures hold significant potential and are increasingly employed in psychological and biomedical research to capture an individual's behavior. These measures involve the near-continuous and unobtrusive collection of data from smartphones without requiring active input from participants. For example, GPS sensors are used to determine the (social) context of a person, and accelerometers to measure movement. However, utilizing passive smartphone measures presents methodological challenges during data collection and analysis. Researchers must make multiple decisions when working with such measures, which can result in different conclusions. Unfortunately, the transparency of these decision-making processes is often lacking. The implementation of open science practices is only beginning to emerge in digital phenotyping studies and varies widely across studies. Well-intentioned researchers may fail to report on some decisions due to the variety of choices that must be made. To address this issue and enhance reproducibility in digital phenotyping studies, we propose the adoption of preregistration as a way forward. Although there have been some attempts to preregister digital phenotyping studies, a template for registering such studies is currently missing. This could be problematic due to the high level of complexity that requires a well-structured template. Therefore, our objective was to develop a preregistration template that is easy to use and understandable for researchers. Additionally, we explain this template and provide resources to assist researchers in making informed decisions regarding data collection, cleaning, and analysis. Overall, we aim to make researchers' choices explicit, enhance transparency, and elevate the standards for studies utilizing passive smartphone measures.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":null,"pages":null},"PeriodicalIF":4.6,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525430/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141900815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Behavioral science labs: How to solve the multi-user problem.","authors":"Diederick C Niehorster, Marianne Gullberg, Marcus Nyström","doi":"10.3758/s13428-024-02467-4","DOIUrl":"10.3758/s13428-024-02467-4","url":null,"abstract":"<p><p>When lab resources are shared among multiple research projects, issues such as experimental integrity, replicability, and data safety become important. Different research projects often need different software and settings that may well conflict with one another, and data collected for one project may not be safeguarded from exposure to researchers from other projects. In this paper we provide an infrastructure design and an open-source tool, labManager, that render multi-user lab facilities in the behavioral sciences accessible to research projects with widely varying needs. The solutions proposed ensure ease of management while simultaneously offering maximum flexibility by providing research projects with fully separated bare metal environments. This solution also ensures that collected data is kept separate, and compliant with relevant ethical standards and regulations such as General Data Protection Regulation (GDPR) legislation. Furthermore, we discuss preconditions for running shared lab facilities and provide practical advice.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":null,"pages":null},"PeriodicalIF":4.6,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525434/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141970535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interactions between latent variables in count regression models.","authors":"Christoph Kiefer, Sarah Wilker, Axel Mayer","doi":"10.3758/s13428-024-02483-4","DOIUrl":"10.3758/s13428-024-02483-4","url":null,"abstract":"<p><p>In psychology and the social sciences, researchers often model count outcome variables accounting for latent predictors and their interactions. Even though neglecting measurement error in such count regression models (e.g., Poisson or negative binomial regression) can have unfavorable consequences like attenuation bias, such analyses are often carried out in the generalized linear model (GLM) framework using fallible covariates such as sum scores. An alternative is count regression models based on structural equation modeling, which allow to specify latent covariates and thereby account for measurement error. However, the issue of how and when to include interactions between latent covariates or between latent and manifest covariates is rarely discussed for count regression models. In this paper, we present a latent variable count regression model (LV-CRM) allowing for latent covariates as well as interactions among both latent and manifest covariates. We conducted three simulation studies, investigating the estimation accuracy of the LV-CRM and comparing it to GLM-based count regression models. Interestingly, we found that even in scenarios with high reliabilities, the regression coefficients from a GLM-based model can be severely biased. In contrast, even for moderate sample sizes, the LV-CRM provided virtually unbiased regression coefficients. Additionally, statistical inferences yielded mixed results for the GLM-based models (i.e., low coverage rates, but acceptable empirical detection rates), but were generally acceptable using the LV-CRM. We provide an applied example from clinical psychology illustrating how the LV-CRM framework can be used to model count regressions with latent interactions.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":null,"pages":null},"PeriodicalIF":4.6,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525413/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142071898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimating power in complex nonlinear structural equation modeling including moderation effects: The powerNLSEM R-package.","authors":"Julien P Irmer, Andreas G Klein, Karin Schermelleh-Engel","doi":"10.3758/s13428-024-02476-3","DOIUrl":"10.3758/s13428-024-02476-3","url":null,"abstract":"<p><p>The model-implied simulation-based power estimation (MSPE) approach is a new general method for power estimation (Irmer et al., 2024). MSPE was developed especially for power estimation of non-linear structural equation models (SEM), but it also can be applied to linear SEM and manifest models using the R package powerNLSEM. After first providing some information about MSPE and the new adaptive algorithm that automatically selects sample sizes for the best prediction of power using simulation, a tutorial on how to conduct the MSPE for quadratic and interaction SEM (QISEM) using the powerNLSEM package is provided. Power estimation is demonstrated for four methods, latent moderated structural equations (LMS), the unconstrained product indicator (UPI), a simple factor score regression (FSR), and a scale regression (SR) approach to QISEM. In two simulation studies, we highlight the performance of the MSPE for all four methods applied to two QISEM with varying complexity and reliability. Further, we justify the settings of the newly developed adaptive search algorithm via performance evaluations using simulation. Overall, the MSPE using the adaptive approach performs well in terms of bias and Type I error rates.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":null,"pages":null},"PeriodicalIF":4.6,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525415/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142279942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On aggregation invariance of multinomial processing tree models.","authors":"Edgar Erdfelder, Julian Quevedo Pütter, Martin Schnuerch","doi":"10.3758/s13428-024-02497-y","DOIUrl":"10.3758/s13428-024-02497-y","url":null,"abstract":"<p><p>Multinomial processing tree (MPT) models are prominent and frequently used tools to model and measure cognitive processes underlying responses in many experimental paradigms. Although MPT models typically refer to cognitive processes within single individuals, they have often been applied to group data aggregated across individuals. We investigate the conditions under which MPT analyses of aggregate data make sense. After introducing the notions of structural and empirical aggregation invariance of MPT models, we show that any MPT model that holds at the level of single individuals must also hold at the aggregate level when it is both structurally and empirically aggregation invariant. Moreover, group-level parameters of aggregation-invariant MPT models are equivalent to the expected values (i.e., means) of the corresponding individual parameters. To investigate the robustness of MPT results for aggregate data when one or both invariance conditions are violated, we additionally performed a series of simulation studies, systematically manipulating (1) the sample sizes in different trees of the model, (2) model parameterization, (3) means and variances of crucial model parameters, and (4) their correlations with other parameters of the respective MPT model. Overall, our results show that MPT parameter estimates based on aggregate data are trustworthy under rather general conditions, provided that a few preconditions are met.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":null,"pages":null},"PeriodicalIF":4.6,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525265/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142456954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}