Retrospective use of the Pragmatic-Explanatory Continuum Indicator Summary-2 trial design tool to assess design choices in randomized controlled trials: an empirical review

Impact factor 5.2 · CAS Zone 2 (Medicine) · JCR Q1, Health Care Sciences & Services

Journal of Clinical Epidemiology · Pub Date: 2025-09-01 · DOI: 10.1016/j.jclinepi.2025.111959

Andrew Willis, Frances Shiely, Alison H. Howie, Shaun Treweek, Monica Taljaard, Kirsty Loudon, Ellen Murphy, Aarian Bhakoo, Yasaman Yazdani, Frank Ward, Perrine Janiaud, Andrea Haren, Aileen Yining Liang, Clare Robinson, Daisy Deng, Lars Hemkens, Evelyn O'Sullivan Greene, Laura Slattery, Merrick Zwarenstein

Volume 187, Article 111959. Full text: https://www.sciencedirect.com/science/article/pii/S0895435625002926


Abstract

Background and Objective

The Pragmatic-Explanatory Continuum Indicator Summary-2 (PRECIS-2) tool has been widely used to help investigators design randomized trials by aligning design choices with an explanatory or pragmatic primary trial intention. PRECIS-2 is increasingly being used to retrospectively assess the degree of pragmatism or explanatoriness of published trials within reviews. There is little information on the interrater reliability of the tool and no consensus on the preferred method of achieving an accurate and reliable judgment of trial “pragmatism” when using PRECIS-2 retrospectively. The aims of this study were to assess the level of pragmatism or explanatoriness of trials that cite PRECIS-2 and to assess the interrater reliability of PRECIS-2 using different scoring approaches. We compared agreement between two independent ratings within a single pair with agreement between consensus scores reached by two independent pairs of reviewers, and we examined whether widening the agreement criteria increased interrater reliability.

Methods

Thirty randomized controlled trials (RCTs) were randomly selected from trials citing the PRECIS-2 tool. Two pairs of reviewers, each comprising a clinician and a methodologist, were trained. Within each pair, the reviewers scored each trial independently and then reached a consensus score. Agreement between reviewers within pairs and between consensus scores across pairs was assessed using kappa statistics for each of the nine PRECIS-2 domains.
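The agreement statistic named in the Methods can be illustrated with a short sketch. The abstract does not say whether weighted or unweighted kappa was used, so the example below assumes unweighted Cohen's kappa for two raters; the function name `cohen_kappa` and the scores are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items.

    Assumption: the abstract only says "kappa statistics"; the unweighted
    two-rater form is used here purely for illustration.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given identical scores.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions,
    # summed over the score categories.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical scores for one PRECIS-2 domain across ten trials
# (1 = very explanatory, 5 = very pragmatic). Not study data.
clinician = [5, 4, 4, 3, 5, 2, 4, 5, 3, 4]
methodologist = [4, 4, 5, 3, 5, 3, 4, 4, 2, 4]

print(f"kappa = {cohen_kappa(clinician, methodologist):.2f}")
```

In a setting like the one described, the same function would be applied once per domain, first to the two reviewers within a pair and then to the two pairs' consensus scores.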

Results

RCTs citing PRECIS-2 had predominantly pragmatic design features. Interrater reliability within pairs was low across all domains, with the highest levels found in the analysis (0.32) and follow-up (0.33) domains. Agreement across pairs on the consensus scores was similarly low. For eight domains, agreement between reviewers and between reviewer pairs was above 70% when agreement was reclassified as “within 1-point difference on the scoring scale”; no improvement was obtained for the remaining domain.
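The widened agreement criterion in the Results can be sketched as a percent-agreement calculation with a tolerance. This is a minimal illustration, not the authors' analysis code: the function name `percent_agreement`, the `tolerance` parameter, and the scores are all hypothetical.

```python
def percent_agreement(scores_a, scores_b, tolerance=0):
    """Percentage of items on which two sets of scores differ by at most `tolerance` points."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("scores must be non-empty and of equal length")
    hits = sum(abs(a - b) <= tolerance for a, b in zip(scores_a, scores_b))
    return 100.0 * hits / len(scores_a)


# Hypothetical consensus scores from two reviewer pairs for one domain
# (five-point PRECIS-2 scale). Not study data.
pair_one = [5, 4, 4, 3, 5, 2, 4, 5, 3, 4]
pair_two = [4, 4, 5, 3, 5, 4, 4, 4, 2, 4]

exact = percent_agreement(pair_one, pair_two)                  # identical scores only
within_one = percent_agreement(pair_one, pair_two, tolerance=1)  # "within 1-point difference"
print(f"exact: {exact:.0f}%, within 1 point: {within_one:.0f}%")
```

With these made-up scores, exact agreement is 50% and agreement within one point is 90%, which shows how the relaxed criterion can raise agreement without any change in the underlying ratings.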

Conclusion

Trials citing PRECIS-2 tend to have predominantly pragmatic design features. When using PRECIS-2 to retrospectively score trial publications, agreement between consensus scores across pairs of reviewers was no better than agreement within pairs. Reconfiguring the PRECIS-2 scoring scale and improving scoring guidance may provide a more meaningful, easily interpreted measure of “pragmatism” for trialists wishing to use PRECIS-2 as a review tool.

Plain Language Summary

The Pragmatic-Explanatory Continuum Indicator Summary-2 (PRECIS-2) tool is designed to help researchers match their design decisions to the intended purpose of their trial. The intention of a trial can be “explanatory,” which improves our understanding of how an intervention works, or “pragmatic,” which supports decision-making in health care. Increasingly, the tool has been used for a secondary purpose: in systematic reviews. Here the tool is used to judge the level of “pragmatism” or “explanatoriness” of trials included in the review to aid the understanding of trial results. However, there is debate on the most reliable means of making this judgment. Sometimes judgments are made by one reviewer; other times, by multiple reviewers. Our study evaluated the interrater reliability of two methods of scoring trial publications using PRECIS-2: individual reviewer scores and pairs of reviewers agreeing on a consensus score. We found that neither method we tested produced a reliable judgment using PRECIS-2, and the scores from two reviewers agreeing on a consensus were no more reliable than scores from a single reviewer. We performed an additional analysis that showed that simplifying the scoring from the original five-point scale to a three-point scale may give a more reliable judgment of the “pragmatism” or “explanatoriness” of published trials. This simpler method of scoring should be encouraged for retrospective use of PRECIS-2 in systematic reviews.
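The simplification from a five-point to a three-point scale can be sketched as follows. The summary does not state how the five points were grouped, so the mapping below (1 and 2 together, 3 alone, 4 and 5 together) is an assumption made only for illustration; the function name `collapse_to_three_point` and the scores are hypothetical.

```python
def collapse_to_three_point(score):
    """Map a five-point PRECIS-2 score onto three categories.

    Assumed grouping (not stated in the source): 1-2 -> 1, 3 -> 2, 4-5 -> 3.
    """
    if score not in (1, 2, 3, 4, 5):
        raise ValueError("PRECIS-2 scores run from 1 to 5")
    if score <= 2:
        return 1  # more explanatory
    if score == 3:
        return 2  # equally pragmatic and explanatory
    return 3      # more pragmatic


def exact_agreement(scores_a, scores_b):
    """Proportion of items on which two reviewers gave the same score."""
    return sum(a == b for a, b in zip(scores_a, scores_b)) / len(scores_a)


# Hypothetical scores from two reviewers for one domain. Not study data.
reviewer_a = [5, 4, 4, 3, 5, 2, 4, 5, 3, 4]
reviewer_b = [4, 4, 5, 3, 5, 1, 4, 4, 2, 4]

five_point = exact_agreement(reviewer_a, reviewer_b)
three_point = exact_agreement(
    [collapse_to_three_point(s) for s in reviewer_a],
    [collapse_to_three_point(s) for s in reviewer_b],
)
print(f"five-point: {five_point:.0%}, three-point: {three_point:.0%}")
```

Disagreements that stay inside one collapsed category (for example 4 versus 5) disappear on the coarser scale, while disagreements that cross a category boundary (for example 3 versus 2) remain.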




Source journal

Journal of Clinical Epidemiology (Medicine: Public, Environmental & Occupational Health)

CiteScore: 12.00

Self-citation rate: 6.90%

Articles published: 320

Review time: 44 days

About the journal: The Journal of Clinical Epidemiology strives to enhance the quality of clinical and patient-oriented healthcare research by advancing and applying innovative methods in conducting, presenting, synthesizing, disseminating, and translating research results into optimal clinical practice. Special emphasis is placed on training new generations of scientists and clinical practice leaders.