Sample size matters when estimating test-retest reliability of behaviour.

IF 4.6 | Zone 2, Psychology | Q1 PSYCHOLOGY, EXPERIMENTAL

Behavior Research Methods Pub Date : 2025-03-21 DOI:10.3758/s13428-025-02599-1

Brendan Williams, Lily FitzGibbon, Daniel Brady, Anastasia Christakou

Behavior Research Methods, 57(4), 123. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11928395/pdf/

Abstract

Intraclass correlation coefficients (ICCs) are a commonly used metric in test-retest reliability research to assess a measure's ability to quantify systematic between-subject differences. However, estimates of between-subject differences are also influenced by factors including within-subject variability, random errors, and measurement bias. Here, we use data collected from a large online sample (N = 150) to (1) quantify test-retest reliability of behavioural and computational measures of reversal learning using ICCs, and (2) use our dataset as the basis for a simulation study investigating the effects of sample size on variance component estimation and the association between estimates of variance components and ICC measures. In line with previously published work, we find reliable behavioural and computational measures of reversal learning, a commonly used assay of behavioural flexibility. Reliable estimates of between-subject, within-subject (across-session), and error variance components for behavioural and computational measures (with ± .05 precision and 80% confidence) required sample sizes ranging from 10 to over 300 (behavioural median N: between-subject = 167, within-subject = 34, error = 103; computational median N: between-subject = 68, within-subject = 20, error = 45). These sample sizes exceed those often used in reliability studies, suggesting that sample sizes larger than are commonly used for reliability studies (circa 30) are required to robustly estimate reliability of task performance measures. Additionally, we found that ICC estimates showed highly positive and highly negative correlations with between-subject and error variance components, respectively, as might be expected, which remained relatively stable across sample sizes. However, ICC estimates were weakly or not correlated with within-subject variance, providing evidence for the importance of variance decomposition for reliability studies.
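As a rough illustration of the variance decomposition the abstract refers to, the sketch below splits a subjects-by-sessions score matrix into between-subject, session, and error components using a standard two-way ANOVA, derives agreement and consistency ICCs from them, and then shows how the spread of the between-subject estimate shrinks as sample size grows. The abstract does not state the authors' estimation procedure, so the ANOVA approach, the function names, and the simulated standard deviations are all assumptions made for illustration, not the paper's method or data.

```python
import numpy as np

def variance_components(scores):
    """Two-way ANOVA variance components for an (n_subjects, k_sessions) array.

    Assumes one score per subject per session (no replication).
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)  # per-subject means
    col_means = x.mean(axis=0)  # per-session means
    ms_subject = k * np.sum((row_means - grand) ** 2) / (n - 1)
    ms_session = n * np.sum((col_means - grand) ** 2) / (k - 1)
    resid = x - row_means[:, None] - col_means[None, :] + grand
    ms_error = np.sum(resid ** 2) / ((n - 1) * (k - 1))
    return {
        "between_subject": (ms_subject - ms_error) / k,
        "session": (ms_session - ms_error) / n,
        "error": ms_error,
    }

def icc_agreement(vc):
    """Single-measure absolute-agreement ICC: session variance counts against reliability."""
    total = vc["between_subject"] + vc["session"] + vc["error"]
    return vc["between_subject"] / total

def icc_consistency(vc):
    """Single-measure consistency ICC: session variance is ignored."""
    return vc["between_subject"] / (vc["between_subject"] + vc["error"])

def simulate_scores(n, rng, sd_subject=1.0, sd_session=0.3, sd_error=0.7, k=2):
    """Hypothetical data generator; the standard deviations are illustrative only."""
    subject = rng.normal(0.0, sd_subject, size=(n, 1))
    session = rng.normal(0.0, sd_session, size=(1, k))
    error = rng.normal(0.0, sd_error, size=(n, k))
    return subject + session + error

def estimate_spread(n, n_sims=300, seed=0):
    """10th-90th percentile width of the between-subject variance estimate at sample size n."""
    rng = np.random.default_rng(seed)
    estimates = [
        variance_components(simulate_scores(n, rng))["between_subject"]
        for _ in range(n_sims)
    ]
    lo, hi = np.percentile(estimates, [10, 90])
    return hi - lo

if __name__ == "__main__":
    for n in (20, 50, 150, 300):
        print(n, round(estimate_spread(n), 3))
```

Running the script prints the width of the central 80% of between-subject variance estimates at each sample size; under these assumed parameters the width narrows as N increases, which is the qualitative pattern the simulation study in the abstract quantifies against a ± .05 precision target.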

Source journal

Behavior Research Methods Multiple-

CiteScore: 10.30

Self-citation rate: 9.30%

Articles published: 266

About the journal: Behavior Research Methods publishes articles concerned with the methods, techniques, and instrumentation of research in experimental psychology. The journal focuses particularly on the use of computer technology in psychological research. An annual special issue is devoted to this field.