A comparison of alternative ranking methods in two-stage clinical trials with multiple interventions: An application to the anxiolysis for laceration repair in children trial.


Clinical Trials. Pub Date: 2024-05-21. DOI: 10.1177/17407745241251812

Nam-Anh Tran, Abigail McGrory, Naveen Poonai, Anna Heath



Abstract

Background/aims: Multi-arm, multi-stage trials frequently include a standard-care arm to which all interventions are compared. This may increase costs and hinder comparisons among the experimental arms. Furthermore, the standard of care may not be evident, particularly when standard practice varies widely. Thus, we aimed to develop an adaptive clinical trial that, without requiring a standard-care arm, drops ineffective interventions following an interim analysis and then selects the best intervention at the final stage.

Methods: We used Bayesian methods to develop a multi-arm, two-stage adaptive trial and evaluated two methods for ranking interventions at both the interim and final analyses: the probability that each intervention is optimal (P_best) and the surface under the cumulative ranking curve (SUCRA). The proposed trial design determines the maximum sample size for each intervention using the Average Length Criterion. The interim analysis takes place at approximately half the pre-specified maximum sample size and drops interventions for futility if P_best or the SUCRA, depending on the ranking method used, is below a pre-specified threshold. The final analysis compares all remaining interventions at the maximum sample size to conclude superiority based on either P_best or the SUCRA. The two ranking methods were compared across 12 scenarios that vary the number of interventions and the assumed differences between the interventions. The thresholds for futility and superiority were chosen to control type I error, and then the predictive power and expected sample size were evaluated across scenarios. We then designed a trial comparing three interventions that aim to reduce anxiety for children undergoing a laceration repair in the emergency department, known as the Anxiolysis for Laceration Repair in Children Trial (ALICE).
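To make the two ranking metrics concrete, the following is a minimal sketch of how P_best and the SUCRA could be computed from posterior draws and used in a futility rule. It is not the authors' implementation. The Beta-Binomial model with a uniform prior, the interim counts, the threshold value, and the function names `rank_metrics`, `beta_posterior_draws`, and `arms_to_keep` are all assumptions made for illustration; the abstract does not specify them.

```python
import random


def rank_metrics(draws, higher_is_better=True):
    """Compute P_best and SUCRA for each arm from joint posterior draws.

    draws: list of posterior draws, each a list with one effect per arm.
    Returns (p_best, sucra), two lists with one entry per arm.
    """
    n_arms = len(draws[0])
    # rank_counts[k][r]: number of draws in which arm k has rank r (0 = best)
    rank_counts = [[0] * n_arms for _ in range(n_arms)]
    for draw in draws:
        order = sorted(range(n_arms), key=lambda k: draw[k],
                       reverse=higher_is_better)
        for rank, arm in enumerate(order):
            rank_counts[arm][rank] += 1

    n_draws = len(draws)
    # P_best: posterior probability that the arm is ranked first
    p_best = [rank_counts[k][0] / n_draws for k in range(n_arms)]
    # SUCRA = (K - mean rank) / (K - 1), with ranks numbered 1..K
    sucra = []
    for k in range(n_arms):
        mean_rank = sum((r + 1) * c
                        for r, c in enumerate(rank_counts[k])) / n_draws
        sucra.append((n_arms - mean_rank) / (n_arms - 1))
    return p_best, sucra


def beta_posterior_draws(successes, failures, n_draws=4000, seed=1):
    """Assumed model: independent Beta(1 + s, 1 + f) posteriors per arm."""
    rng = random.Random(seed)
    return [[rng.betavariate(1 + s, 1 + f)
             for s, f in zip(successes, failures)]
            for _ in range(n_draws)]


def arms_to_keep(metric, threshold):
    """Futility rule: drop an arm whose ranking metric is below the threshold."""
    return [k for k, value in enumerate(metric) if value >= threshold]


# Hypothetical interim data for three arms (50 patients each)
draws = beta_posterior_draws(successes=[30, 22, 12], failures=[20, 28, 38])
p_best, sucra = rank_metrics(draws)
kept_by_p_best = arms_to_keep(p_best, threshold=0.05)  # threshold is illustrative
```

P_best uses only the probability of being ranked first, whereas the SUCRA summarises the whole rank distribution, which is one way to see why the two metrics can drop different arms at the interim analysis.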

Results: As the number of interventions increases, the SUCRA yields higher predictive power than P_best. Using P_best results in a lower expected sample size when there is an effective intervention. Using the Average Length Criterion, the ALICE trial has a maximum sample size of 100 patients per arm. This sample size gives a predictive power of 86% using P_best and 85% using the SUCRA. Thus, we chose P_best as the ranking method for the ALICE trial.
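As a rough illustration of how an Average Length Criterion can turn a precision target into a per-arm sample size, the sketch below searches for the smallest candidate sample size whose credible interval length, averaged over data simulated from the prior, falls at or below a target. It does not reproduce the ALICE calculation. The binary outcome, the uniform Beta(1, 1) prior, the target length of 0.2, and the function names are assumptions. The criterion is commonly stated in terms of highest posterior density intervals; equal-tailed intervals estimated by Monte Carlo are used here for simplicity.

```python
import random


def credible_interval_length(successes, n, rng, a=1.0, b=1.0,
                             level=0.95, n_post=400):
    """Monte Carlo length of the equal-tailed credible interval
    for the Beta(a + successes, b + n - successes) posterior."""
    draws = sorted(rng.betavariate(a + successes, b + n - successes)
                   for _ in range(n_post))
    lower = draws[int((1 - level) / 2 * n_post)]
    upper = draws[int((1 + level) / 2 * n_post) - 1]
    return upper - lower


def average_length(n, a=1.0, b=1.0, n_sim=200, seed=1):
    """Average interval length over data sets simulated from the prior."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sim):
        p = rng.betavariate(a, b)                    # rate drawn from the prior
        s = sum(rng.random() < p for _ in range(n))  # simulated successes
        total += credible_interval_length(s, n, rng, a, b)
    return total / n_sim


def alc_sample_size(target_length, candidates, **kwargs):
    """Smallest candidate n whose average interval length meets the target."""
    for n in candidates:
        if average_length(n, **kwargs) <= target_length:
            return n
    return None  # no candidate is large enough


# Illustrative target only; the trial's actual target is not given in the abstract
n_per_arm = alc_sample_size(target_length=0.2, candidates=[25, 50, 100, 200])
```

Because the average is taken over the prior predictive distribution, the resulting sample size depends on the prior as well as on the target length, so different assumptions would give a different answer from this sketch.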

Conclusion: Bayesian ranking methods can be used in multi-arm, multi-stage trials with no clear control intervention. When more interventions are included, the SUCRA yields higher predictive power than P_best. Future work should consider whether other ranking methods may also be relevant for clinical trial design.

