Implications and Recommendations for Equivalence Testing in Measures of Movement Behaviors: A Scoping Review
M. O'Brien
Journal for the Measurement of Physical Behaviour, 2021
DOI: 10.1123/jmpb.2021-0021
Cited by: 21
Abstract
Equivalence testing may provide information complementary to more frequently used statistical procedures because it determines whether physical behavior outcomes are statistically equivalent to criterion measures. A caveat of this procedure is that the upper and lower bounds of acceptable error, which define the equivalence zone, must be selected in advance. With no clear guidelines available to assist researchers, these equivalence zones are selected arbitrarily. A scoping review was performed of articles that implemented equivalence testing to determine the validity of physical behavior outcomes; the aim was to characterize how this procedure has been implemented and to provide recommendations. A search of five databases initially identified 1,153 potentially relevant articles, from which 19 studies (20 arms) conducted in children/youth and 40 studies (49 arms) conducted in adults were included. Free-living conditions were the most common setting (children/youth = 13 arms; adults = 22 arms), and a ±10% equivalence zone was the most commonly employed. However, equivalence zones ranged from ±3% to ±25%, and only a subset of studies used absolute thresholds (e.g., ±1,000 steps/day). Had these equivalence zones been increased or decreased by 5%, 75% of the children/youth arms (15/20) and 71% of the adult arms (35/49) would have produced the opposite equivalence test outcome (i.e., equivalent to nonequivalent or vice versa). This scoping review identifies heterogeneous use of equivalence testing in studies examining the accuracy of (in)activity measures. In the absence of evidence-based, standardized equivalence criteria, reporting the percentage required to achieve statistical equivalence, or using absolute thresholds expressed as a proportion of the SD, may be better practice than arbitrarily selecting zones a priori.
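To make the procedure and the recommended reporting practice concrete, here is a minimal sketch. It assumes a paired design and one common variant of the test: the 90% confidence interval of the mean paired difference (device minus criterion) is compared with a symmetric zone of ±zone% of the criterion mean. For α = 0.05 this is equivalent to two one-sided tests. Reviewed studies differ in the details (some compare the CI of the device mean instead), so this is not presented as the author's method. The function names (`paired_equivalence`, `t_quantile`) and the step-count data are hypothetical. To stay within the standard library, the t quantile is obtained by numerically integrating the t density; in practice a library routine such as SciPy's `scipy.stats.t.ppf` would do the same job. The sketch also reports the smallest zone that yields "equivalent", which is the percentage the review suggests presenting, and shows how moving a ±10% zone by 5% either way can flip the outcome.

```python
import math
import statistics
from dataclasses import dataclass


def _t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """Student's t CDF, integrating the density from 0 to |x| with Simpson's rule."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(t: float) -> float:
        return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

    a = abs(x)
    if a == 0.0:
        return 0.5
    h = a / steps  # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def t_quantile(p: float, df: int) -> float:
    """Quantile of Student's t for 0.5 < p < 1, found by bisection on the CDF."""
    lo, hi = 0.0, 10.0
    while _t_cdf(hi, df) < p:  # widen the bracket for heavy-tailed (small df) cases
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if _t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


@dataclass
class EquivalenceResult:
    mean_diff: float       # mean of (device - criterion)
    ci_low: float          # lower limit of the (1 - 2*alpha) CI of mean_diff
    ci_high: float         # upper limit
    criterion_mean: float
    min_zone_pct: float    # zones wider than +/- this % of the criterion mean pass

    def equivalent_at(self, zone_pct: float) -> bool:
        """True if the CI lies strictly inside +/- zone_pct % of the criterion mean."""
        bound = zone_pct / 100 * abs(self.criterion_mean)
        return -bound < self.ci_low and self.ci_high < bound


def paired_equivalence(device, criterion, alpha: float = 0.05) -> EquivalenceResult:
    """Paired equivalence check; a 90% CI for alpha = 0.05 matches two one-sided tests."""
    if len(device) != len(criterion) or len(device) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [d - c for d, c in zip(device, criterion)]
    n = len(diffs)
    mean_d = statistics.fmean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    half_width = t_quantile(1 - alpha, n - 1) * se
    low, high = mean_d - half_width, mean_d + half_width
    crit_mean = statistics.fmean(criterion)
    # Smallest symmetric zone (as % of the criterion mean) that contains the whole CI.
    min_zone = max(abs(low), abs(high)) / abs(crit_mean) * 100
    return EquivalenceResult(mean_d, low, high, crit_mean, min_zone)


# Hypothetical steps/day for eight participants (illustrative values only).
criterion = [8200, 10450, 6730, 12100, 9050, 7600, 11300, 8800]
device = [8650, 10120, 7210, 12900, 9480, 7350, 11950, 9100]

res = paired_equivalence(device, criterion)
print(f"minimal zone: +/-{res.min_zone_pct:.1f}%")      # about +/-6.3%
print({z: res.equivalent_at(z) for z in (5, 10, 15)})   # {5: False, 10: True, 15: True}
```

With these made-up data the device passes at ±10% and ±15% but fails at ±5%, which illustrates why a single binary verdict at an arbitrary zone can be fragile. Reporting `min_zone_pct` alongside the verdict lets readers apply whatever zone they consider appropriate.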