Konstantin Warneke, Thomas Gronwald, Sebastian Wallot, Alessia Magno, Martin Hillebrecht, Klaus Wirth
{"title":"Discussion on the validity of commonly used reliability indices in sports medicine and exercise science: a critical review with data simulations.","authors":"Konstantin Warneke, Thomas Gronwald, Sebastian Wallot, Alessia Magno, Martin Hillebrecht, Klaus Wirth","doi":"10.1007/s00421-025-05720-6","DOIUrl":null,"url":null,"abstract":"<p><p>Apart from objectivity and validity, reliability is considered a precondition for testing within scientific works, as unreliable testing protocols limit conclusions, especially for practical application. Classification guidelines commonly refer to relative reliability, focusing on Pearson correlation coefficients (r<sub>p</sub>) and intraclass correlation coefficients (ICC). On those, the standard error of measurement (SEM) and the minimal detectable change (MDC) are often calculated in addition to the variability coefficient (CV). These, however, do not account for systematic or random errors (e.g., standardization problems). To illustrate, we applied common reliability statistics in sports science on simulated data which extended the sample size of two original counter-movement-jump sessions from (youth) elite basketball players. These show that excellent r<sub>p</sub> and ICC (≥ 0.9) without a systematic bias were accompanied by a mean absolute percentage error of over 20%. Furthermore, we showed that the ICC does not account for systematic errors and has only limited value for accuracy, which can cause misleading conclusions of data. While a simple re-organization of data caused an improvement in relative reliability and reduced limits of agreement meaningfully, systematic errors occurred. This example underlines the lack of validity and objectivity of commonly used ICC-based reliability statistics (SEM, MDC) to quantify the primary and secondary variance sources. 
After revealing several caveats in the literature (e.g., neglecting of the systematic and random error or not distinguishing between protocol and device reliability), we suggest a methodological approach to provide reliable data collections as a precondition for valid conclusions by, e.g., recommending pre-set acceptable measurement errors.</p>","PeriodicalId":12005,"journal":{"name":"European Journal of Applied Physiology","volume":" ","pages":"1511-1526"},"PeriodicalIF":2.8000,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12174282/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"European Journal of Applied Physiology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1007/s00421-025-05720-6","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/2/13 0:00:00","PubModel":"Epub","JCR":"Q2","JCRName":"PHYSIOLOGY","Score":null,"Total":0}
Citations: 0
Abstract
Apart from objectivity and validity, reliability is considered a precondition for testing within scientific works, as unreliable testing protocols limit conclusions, especially for practical application. Classification guidelines commonly refer to relative reliability, focusing on Pearson correlation coefficients (rp) and intraclass correlation coefficients (ICC). Based on these, the standard error of measurement (SEM) and the minimal detectable change (MDC) are often calculated, in addition to the coefficient of variation (CV). These, however, do not account for systematic or random errors (e.g., standardization problems). To illustrate, we applied common reliability statistics from sports science to simulated data that extended the sample size of two original counter-movement jump sessions from (youth) elite basketball players. The simulations show that excellent rp and ICC values (≥ 0.9) without a systematic bias were accompanied by a mean absolute percentage error of over 20%. Furthermore, we showed that the ICC does not account for systematic errors and has only limited value for accuracy, which can lead to misleading conclusions. While a simple re-organization of the data improved relative reliability and meaningfully narrowed the limits of agreement, it introduced systematic errors. This example underlines the lack of validity and objectivity of commonly used ICC-based reliability statistics (SEM, MDC) for quantifying the primary and secondary variance sources. After revealing several caveats in the literature (e.g., neglect of systematic and random error, or failure to distinguish between protocol and device reliability), we suggest a methodological approach to ensure reliable data collection as a precondition for valid conclusions, e.g., by recommending pre-set acceptable measurement errors.
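The indices discussed in the abstract can be made concrete with a small simulation. The sketch below is a hypothetical re-creation in the spirit of the paper (not the authors' original data or code): two counter-movement jump "sessions" where a large between-subject spread yields a high Pearson r and ICC even though each subject's trial-to-trial error is nontrivial. The sample size, jump heights, and error magnitudes are assumptions for illustration only; the ICC(2,1) formula follows the standard two-way random-effects, absolute-agreement definition.

```python
import numpy as np

# Hypothetical simulation (illustrative values, not the paper's data):
# stable between-subject differences plus random within-subject error.
rng = np.random.default_rng(1)
n = 100
true_cmj = rng.normal(40.0, 10.0, n)          # stable subject ability (jump height, cm)
day1 = true_cmj + rng.normal(0.0, 3.0, n)     # session 1 with random error
day2 = true_cmj + rng.normal(0.0, 3.0, n)     # session 2 with random error

# Relative reliability: Pearson correlation
r_p = np.corrcoef(day1, day2)[0, 1]

# ICC(2,1): two-way random effects, absolute agreement
data = np.column_stack([day1, day2])
k = data.shape[1]
grand = data.mean()
ss_subj = k * ((data.mean(axis=1) - grand) ** 2).sum()
ss_sess = n * ((data.mean(axis=0) - grand) ** 2).sum()
ss_err = ((data - grand) ** 2).sum() - ss_subj - ss_sess
ms_subj = ss_subj / (n - 1)
ms_sess = ss_sess / (k - 1)
ms_err = ss_err / ((n - 1) * (k - 1))
icc = (ms_subj - ms_err) / (ms_subj + (k - 1) * ms_err + k * (ms_sess - ms_err) / n)

# ICC-derived absolute indices criticized in the paper
sd_pooled = data.std(ddof=1)                  # SD across all observations
sem = sd_pooled * np.sqrt(1.0 - icc)          # standard error of measurement
mdc95 = 1.96 * np.sqrt(2.0) * sem             # minimal detectable change (95%)

# Error measures that expose per-subject disagreement directly
diff = day2 - day1
mape = float(np.mean(np.abs(diff) / day1) * 100.0)  # mean absolute percentage error
loa_low = diff.mean() - 1.96 * diff.std(ddof=1)     # Bland-Altman limits of agreement
loa_high = diff.mean() + 1.96 * diff.std(ddof=1)

print(f"r_p={r_p:.3f}  ICC(2,1)={icc:.3f}  SEM={sem:.2f} cm  "
      f"MDC95={mdc95:.2f} cm  MAPE={mape:.1f}%  LoA=[{loa_low:.2f}, {loa_high:.2f}] cm")
```

With a between-subject SD much larger than the within-subject error SD, the ICC lands near (10² / (10² + 3²)) ≈ 0.92, illustrating the paper's point that "excellent" relative reliability can coexist with per-trial errors that matter in practice.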
Journal description:
The European Journal of Applied Physiology (EJAP) aims to promote mechanistic advances in human integrative and translational physiology. Physiology is viewed broadly, having overlapping context with related disciplines such as biomechanics, biochemistry, endocrinology, ergonomics, immunology, motor control, and nutrition. EJAP welcomes studies dealing with physical exercise, training and performance. Studies addressing physiological mechanisms are preferred over descriptive studies. Papers dealing with animal models or pathophysiological conditions are not excluded from consideration, but must be clearly relevant to human physiology.