Konstantin Warneke, Thomas Gronwald, Sebastian Wallot, Alessia Magno, Martin Hillebrecht, Klaus Wirth
{"title":"Discussion on the validity of commonly used reliability indices in sports medicine and exercise science: a critical review with data simulations.","authors":"Konstantin Warneke, Thomas Gronwald, Sebastian Wallot, Alessia Magno, Martin Hillebrecht, Klaus Wirth","doi":"10.1007/s00421-025-05720-6","DOIUrl":null,"url":null,"abstract":"<p><p>Apart from objectivity and validity, reliability is considered a precondition for testing within scientific works, as unreliable testing protocols limit conclusions, especially for practical application. Classification guidelines commonly refer to relative reliability, focusing on Pearson correlation coefficients (r<sub>p</sub>) and intraclass correlation coefficients (ICC). On those, the standard error of measurement (SEM) and the minimal detectable change (MDC) are often calculated in addition to the variability coefficient (CV). These, however, do not account for systematic or random errors (e.g., standardization problems). To illustrate, we applied common reliability statistics in sports science on simulated data which extended the sample size of two original counter-movement-jump sessions from (youth) elite basketball players. These show that excellent r<sub>p</sub> and ICC (≥ 0.9) without a systematic bias were accompanied by a mean absolute percentage error of over 20%. Furthermore, we showed that the ICC does not account for systematic errors and has only limited value for accuracy, which can cause misleading conclusions of data. While a simple re-organization of data caused an improvement in relative reliability and reduced limits of agreement meaningfully, systematic errors occurred. This example underlines the lack of validity and objectivity of commonly used ICC-based reliability statistics (SEM, MDC) to quantify the primary and secondary variance sources. 
After revealing several caveats in the literature (e.g., neglecting of the systematic and random error or not distinguishing between protocol and device reliability), we suggest a methodological approach to provide reliable data collections as a precondition for valid conclusions by, e.g., recommending pre-set acceptable measurement errors.</p>","PeriodicalId":12005,"journal":{"name":"European Journal of Applied Physiology","volume":" ","pages":"1511-1526"},"PeriodicalIF":2.8000,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12174282/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"European Journal of Applied Physiology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1007/s00421-025-05720-6","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/2/13 0:00:00","PubModel":"Epub","JCR":"Q2","JCRName":"PHYSIOLOGY","Score":null,"Total":0}
Citations: 0
Abstract
Apart from objectivity and validity, reliability is considered a precondition for testing within scientific works, as unreliable testing protocols limit conclusions, especially for practical application. Classification guidelines commonly refer to relative reliability, focusing on Pearson correlation coefficients (rp) and intraclass correlation coefficients (ICC). Based on these, the standard error of measurement (SEM) and the minimal detectable change (MDC) are often calculated, in addition to the coefficient of variation (CV). These, however, do not account for systematic or random errors (e.g., standardization problems). To illustrate, we applied common reliability statistics from sports science to simulated data that extended the sample size of two original counter-movement jump sessions from (youth) elite basketball players. The simulations show that excellent rp and ICC values (≥ 0.9) without a systematic bias were accompanied by a mean absolute percentage error of over 20%. Furthermore, we showed that the ICC does not account for systematic errors and has only limited value for accuracy, which can lead to misleading conclusions. While a simple re-organization of the data improved relative reliability and meaningfully narrowed the limits of agreement, it introduced systematic errors. This example underlines the lack of validity and objectivity of commonly used ICC-based reliability statistics (SEM, MDC) for quantifying the primary and secondary variance sources. After revealing several caveats in the literature (e.g., neglect of systematic and random error, or failure to distinguish between protocol and device reliability), we suggest a methodological approach to ensure reliable data collection as a precondition for valid conclusions, e.g., by recommending pre-set acceptable measurement errors.
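The indices discussed in the abstract can be made concrete with a small simulation. The sketch below is a hypothetical re-creation in the spirit of the paper (not the authors' original data or code): two counter-movement jump "sessions" where a large between-subject spread yields a high Pearson r and ICC even though each subject's trial-to-trial error is nontrivial. The sample size, jump heights, and error magnitudes are assumptions for illustration only; the ICC(2,1) formula follows the standard two-way random-effects, absolute-agreement definition.

```python
import numpy as np

# Hypothetical simulation (illustrative values, not the paper's data):
# stable between-subject differences plus random within-subject error.
rng = np.random.default_rng(1)
n = 100
true_cmj = rng.normal(40.0, 10.0, n)          # stable subject ability (jump height, cm)
day1 = true_cmj + rng.normal(0.0, 3.0, n)     # session 1 with random error
day2 = true_cmj + rng.normal(0.0, 3.0, n)     # session 2 with random error

# Relative reliability: Pearson correlation
r_p = np.corrcoef(day1, day2)[0, 1]

# ICC(2,1): two-way random effects, absolute agreement
data = np.column_stack([day1, day2])
k = data.shape[1]
grand = data.mean()
ss_subj = k * ((data.mean(axis=1) - grand) ** 2).sum()
ss_sess = n * ((data.mean(axis=0) - grand) ** 2).sum()
ss_err = ((data - grand) ** 2).sum() - ss_subj - ss_sess
ms_subj = ss_subj / (n - 1)
ms_sess = ss_sess / (k - 1)
ms_err = ss_err / ((n - 1) * (k - 1))
icc = (ms_subj - ms_err) / (ms_subj + (k - 1) * ms_err + k * (ms_sess - ms_err) / n)

# ICC-derived absolute indices criticized in the paper
sd_pooled = data.std(ddof=1)                  # SD across all observations
sem = sd_pooled * np.sqrt(1.0 - icc)          # standard error of measurement
mdc95 = 1.96 * np.sqrt(2.0) * sem             # minimal detectable change (95%)

# Error measures that expose per-subject disagreement directly
diff = day2 - day1
mape = float(np.mean(np.abs(diff) / day1) * 100.0)  # mean absolute percentage error
loa_low = diff.mean() - 1.96 * diff.std(ddof=1)     # Bland-Altman limits of agreement
loa_high = diff.mean() + 1.96 * diff.std(ddof=1)

print(f"r_p={r_p:.3f}  ICC(2,1)={icc:.3f}  SEM={sem:.2f} cm  "
      f"MDC95={mdc95:.2f} cm  MAPE={mape:.1f}%  LoA=[{loa_low:.2f}, {loa_high:.2f}] cm")
```

With a between-subject SD much larger than the within-subject error SD, the ICC lands near (10² / (10² + 3²)) ≈ 0.92, illustrating the paper's point that "excellent" relative reliability can coexist with per-trial errors that matter in practice.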
Journal description:
The European Journal of Applied Physiology (EJAP) aims to promote mechanistic advances in human integrative and translational physiology. Physiology is viewed broadly, having overlapping context with related disciplines such as biomechanics, biochemistry, endocrinology, ergonomics, immunology, motor control, and nutrition. EJAP welcomes studies dealing with physical exercise, training and performance. Studies addressing physiological mechanisms are preferred over descriptive studies. Papers dealing with animal models or pathophysiological conditions are not excluded from consideration, but must be clearly relevant to human physiology.