Title: An investigation into in-sample and out-of-sample model selection for nonstationary autoregressive models
Authors: Yong Zhang, Anja F. Ernst, Ginette Lafit, Ward B. Eiling, Laura F. Bringmann
British Journal of Mathematical & Statistical Psychology, 79(2), 409-436. Published 2026-04-10. DOI: 10.1111/bmsp.70012
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13067993/pdf/

Abstract: The stationary autoregressive model is an important basis of time-series analysis in today's psychological research. Diverse nonstationary extensions of this model have been developed to capture different types of changing temporal dynamics. However, researchers do not always have a solid theoretical basis for deciding a priori which of these nonstationary models is most appropriate for a given time series. In that case, correct model selection becomes a crucial step towards an accurate understanding of the temporal dynamics. This study consists of two main parts. First, in a simulation study, we investigated how well in-sample (information criteria) and out-of-sample (cross-validation, out-of-sample prediction) model selection techniques identify six different univariate nonstationary processes. We found that the Bayesian information criterion (BIC) performs best overall, whereas the performance of the other techniques depends largely on the length of the time series. We then re-analysed a 239-day time series of positive and negative affect to illustrate the model selection process. Combining the simulation results with practical considerations from the empirical analysis, we argue that model selection for nonstationary time series should not rely entirely on data-driven approaches. Instead, more theory-driven approaches, in which researchers actively integrate their qualitative understanding, should inform the data-driven ones.
Title: Comparing training window selection methods for prediction in non-stationary time series
Authors: Fridtjof Petersen, Jonas M. B. Haslbeck, Jorge N. Tendeiro, Anna M. Langener, Martien J. H. Kas, Dimitris Rizopoulos, Laura F. Bringmann
British Journal of Mathematical & Statistical Psychology, 79(2), 341-361. Published 2026-04-10. DOI: 10.1111/bmsp.70018
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13067991/pdf/

Abstract: The widespread adoption of smartphones makes it possible to passively monitor everyday behaviour via sensors. Sensor data have been linked to individuals' moment-to-moment psychological symptoms and mood, and could thus alleviate the burden associated with repeated measurement of symptoms. Psychological care could additionally be improved by predicting moments of high psychopathology and providing immediate interventions. Current research assumes that the relationship between sensor data and psychological symptoms is constant over time, or changes at a fixed rate: models are trained on all past data or on a fixed window, without comparing different window sizes. This is problematic, as choosing the wrong training window can harm prediction accuracy, especially if the underlying rate of change itself varies. As a potential solution, we compare methodologies for choosing the training window, ranging from common heuristics to super-learning approaches. In a simulation study, we vary the rate at which the underlying relationship changes over time. We show that even a simple average of predictions across different windows can reduce prediction error relative to selecting a single best window, for both simulated and real-world data.
Title: A novel nonvisual procedure for screening for nonstationarity in time series as obtained from intensive longitudinal designs
Authors: Steffen Zitzmann, Christoph Lindner, Julian F. Lohmann, Martin Hecht
British Journal of Mathematical & Statistical Psychology, 79(2), 437-452. Published 2026-04-10. DOI: 10.1111/bmsp.12394

Abstract: Researchers working with intensive longitudinal designs often face the challenge of deciding whether to relax the assumption of stationarity in their models. Given that these designs typically involve data from a large number of subjects (N ≫ 1), visually screening all time series quickly becomes tedious. Even when conducted by experts, such screenings can lack accuracy. In this article, we propose a nonvisual procedure that enables fast and accurate screening. This procedure has the potential to become a widely adopted approach for detecting nonstationarity and guiding model building in psychology and related fields, where intensive longitudinal designs are used and time-series data are collected.
{"title":"LLM-based prior elicitation for Bayesian graphical modeling.","authors":"Nikola Sekulovski, Meike Waaijers, Giuseppe Arena","doi":"10.1111/bmsp.70045","DOIUrl":"https://doi.org/10.1111/bmsp.70045","url":null,"abstract":"<p><p>In the Bayesian graphical modeling framework, priors on network structure encode theoretical assumptions and uncertainty about the topology of psychological constructs under study. For instance, the Bernoulli prior specifies the probability of each pairwise interaction, the Beta-Bernoulli prior governs expected network density, and the Stochastic Block prior models clustering. In practice, however, specifying informed hyperparameters is challenging: theoretical guidance is limited, and default choices can be overly simplistic or restrictive. To address this, we introduce an LLM-based prior elicitation framework in which a large language model provides inclusion judgments for each variable pair. These judgments are converted into edge-specific prior probabilities for the Bernoulli prior and used to derive hyperparameters for the Beta-Bernoulli and Stochastic Block priors. To make the approach accessible, we provide an R package, bgmElicit, with a Shiny app implementing the methodology. We illustrate the framework in two examples. First, a validation on a subset of a PTSD network from a meta-analysis compares OpenAI GPT models across several conditions. Second, an empirical analysis of 17 PTSD symptoms shows that elicited priors can modestly strengthen evidence regarding edge presence and absence. Taken together, this work is a proof of concept, complementary to expert judgment and prior sensitivity checks.</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2026-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147576609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"To vary or not to vary: A flexible empirical Bayes factor for testing variance components.","authors":"Fabio Vieira, Hongwei Zhao, Joris Mulder","doi":"10.1111/bmsp.70048","DOIUrl":"https://doi.org/10.1111/bmsp.70048","url":null,"abstract":"<p><p>Random effects are the gold standard for capturing structural heterogeneity, such as individual differences or temporal dependence. Yet testing their presence is difficult because variance components are constrained to be non-negative, creating a boundary problem. This paper introduces a flexible empirical Bayes factor (EBF) for testing random effects. Instead of testing whether a variance component equals zero, the EBF evaluates the equivalent hypothesis that all random effects are zero. The approach avoids manual prior specification: the distribution of the random effects is modeled at the lower level and estimated directly from the data, yielding an \"empirical\" Bayes factor. Using a Savage-Dickey density ratio, the EBF requires only the full model fit, eliminating the need to estimate multiple models with alternative random-effects structures. The method enables testing a single random effect as well as multiple, potentially correlated, random effects simultaneously. Simulation studies examine the operating characteristics of the criterion. To illustrate its breadth, the EBF is applied to several widely used models in psychological research and related fields, including generalized linear crossed mixed-effects models, spatial random-effects models, dynamic structural equation models, random-intercept cross-lagged panel models, and nonlinear mixed-effects models.</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2026-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147534642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Asymptotic standard errors for reliability coefficients in item response theory.","authors":"Youjin Sung, Yang Liu","doi":"10.1111/bmsp.70047","DOIUrl":"10.1111/bmsp.70047","url":null,"abstract":"<p><p>In a recent review, Liu et al. (Psychological Methods, 2025b) classified reliability coefficients into two types: classical test theory (CTT) reliability and proportional reduction in mean squared error (PRMSE). This article focuses on quantifying the sampling variability of these coefficients under item response theory (IRT) models. While some existing standard error (SE) formulas are accurate when variability arises only from item parameter estimation, the reliability estimators considered in our work involve additional variability from substituting population moments with sample moments. We propose a general strategy to derive SEs that incorporates both sources of sampling error simultaneously, enabling the estimation of model-based reliability coefficients and their SEs in such settings. We then apply our general theory to derive SEs for two specific estimators under the graded response model: (1) CTT reliability for the expected a posteriori score of the latent variable and (2) PRMSE for the latent variable. Simulation results show that the derived SEs accurately capture the sampling variability across various test lengths in moderate to large samples. We conclude with an empirical illustration and directions for future research.</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2026-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147516889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Estimating the reliability of round-robin judgments with social relations confirmatory factor analyses
Authors: Steffen Nestler, Oliver Lüdtke, Alexander Robitzsch
British Journal of Mathematical & Statistical Psychology. Published 2026-03-25. DOI: 10.1111/bmsp.70043

Abstract: The social relations model (SRM) is commonly used in psychological research to analyse interdependent data from round-robin designs, where all members of a group rate each other. Based on the recently suggested social relations confirmatory factor analysis (SR-CFA), we present general formulas for determining the reliability of composites of round-robin judgments, and we derive simpler variants that hold when specific restrictions are placed on the parameters of the SR-CFA model. In the unidimensional case, this yields an omega-type reliability measure, which can be converted into an alpha-type reliability measure through further restrictions. We also discuss how standard errors of the reliability coefficients can be obtained, illustrate the suggested methods with an empirical example, and examine the suitability of the estimation approach in a small simulation study. Finally, we discuss questions for future methodological research and how the person-level composites relate to other SRM effect estimates proposed in the SRM literature.
Title: A cognitive diagnosis model for latent classification of bounded continuous variables
Authors: Eduardo S B de Oliveira, Xiaojing Wang, Jorge L Bazán, Jimmy de la Torre
British Journal of Mathematical & Statistical Psychology. Published 2026-03-20. DOI: 10.1111/bmsp.70044

Abstract: Cognitive Diagnosis Models (CDMs) are widely used in latent-variable modeling for classification tasks that diagnose abilities or skills. Originally developed for dichotomous indicators, CDMs have been extended to polytomous and continuous responses, including bounded continuous variables (e.g. proportions or index scores on a 0-1 or 0-100 scale). We introduce a Bounded DINA (B-DINA) model, an extension of DINA for handling bounded continuous responses, using a Beta distribution with an appropriate mean-precision parameterization. We present a Bayesian estimation framework, define interpretable item parameters, and compute posterior probabilities of membership in each latent-attribute profile. We explicitly address label-switching nonidentifiability and assess absolute model fit via posterior predictive p-values (PPP). We also conducted a simulation study to evaluate parameter recovery and the performance of the proposed method. Further, we illustrate the model with municipal data from Southeastern Brazil, where bounded indices summarize economy, education and health. The proposed B-DINA effectively classifies municipalities and reveals relationships between observed indicators and latent attributes. As bounded continuous variables are common across the social sciences and policy analysis, B-DINA could offer a broadly applicable classification tool in practice.
Title: The role of reliability in experiments
Authors: Jeffrey N Rouder, Mahbod Mehrvarz, Martin Schnuerch
British Journal of Mathematical & Statistical Psychology. Published 2026-03-16. DOI: 10.1111/bmsp.70042

Abstract: We are concerned about the emphasis on reliability in the analysis of psychology experiments. Experiments have two elements of sample size, the number of individuals and the number of replicate trials within a task, and this complicates reliability measures. To account for these elements, we distinguish among three levels of analysis: (1) a foundational level that centres task properties without recourse to either element of sample size; an example statistic is the intraclass correlation, a proportion of variance that makes no reference to sample size. (2) An intermediate level that centres the number of trials but not the number of individuals; an example statistic at this level is reliability, which describes variability with reference to the number of trials but not the number of individuals. (3) A final level that centres both the number of individuals and the number of trials; an example quantity is the uncertainty in a correlation coefficient, which ideally reflects sample-size limits in both individuals and trials. Reliability thus describes an intermediate level, useful neither for communicating foundational task properties nor for interpreting correlations. We advocate that researchers consider all three levels and highlight the role of hierarchical models in doing so.
{"title":"Using multilabel classification neural network to detect intersectional DIF with small sample sizes.","authors":"Yale Quan, Chun Wang","doi":"10.1111/bmsp.70041","DOIUrl":"https://doi.org/10.1111/bmsp.70041","url":null,"abstract":"<p><p>This study introduces InterDIFNet, a multilabel classification neural network for detecting intersectional differential item functioning (DIF) in educational and psychological assessments, with a focus on small sample sizes. Unlike traditional marginal DIF methods, which often fail to capture the effects of intersecting identities and require large samples, InterDIFNet models uniform and non-uniform DIF across multiple intersectional groups simultaneously and utilizes an optimized thresholding procedure to balance power and Type 1 error control. A Monte Carlo simulation compared InterDIFNet to the Truncated Lasso Penalty (TLP) test and other intersectional DIF methods across varying sample sizes, numbers of groups and DIF prevalence rates. Results show that when trained using TLP features, InterDIFNet consistently achieved higher power than TLP while maintaining comparable Type 1 error control, particularly in scenarios with three or more intersectional groups. An empirical application to real assessment data further demonstrated the method's practical utility. InterDIFNet provides a scalable, data-driven solution for identifying intersectional DIF across multiple small sample groups.</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2026-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147461050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}