Evaluation of machine learning and logistic regression-based gestational diabetes prognostic models
Yitayeh Belsti, Lisa Moran, Aya Mousa, Helena Teede, Joanne Enticott
Journal of Clinical Epidemiology, vol. 187, Article 111957. DOI: 10.1016/j.jclinepi.2025.111957. Published August 29, 2025.

Objectives: This study aimed to follow best practice by temporally evaluating existing gestational diabetes mellitus (GDM) prediction models, updating them where needed, and comparing the temporal validation performance of the machine learning (ML)-based models with that of the regression-based models.

Study Design and Setting: We used new data for the temporal validation dataset, comprising 12,722 singleton pregnancies at the Monash Health Network from 2021 to 2022. We evaluated the Monash GDM logistic regression (LR) model with six categorical variables (version 2), along with the Monash GDM ML model and an extended LR GDM model (both version 3), each with eight categorical and continuous variables. Model performance was assessed using discrimination and calibration, decision curve analyses (DCA) were performed to determine the net benefit of the models, and recalibration was considered to improve model performance.

Results: The development datasets for model versions 2 and 3 and the new temporal validation dataset included 21.2%, 22.5%, and 33.5% of pregnant women aged ≥35 years, respectively; 22.0%, 23.7%, and 24.0% with a body mass index ≥30 kg/m²; and GDM prevalence rates of 18.0%, 21.3%, and 28.6%. Discrimination was similar across the models, with areas under the receiver operating characteristic curve (AUC) of 0.72 (95% CI: 0.71, 0.73), 0.73 (95% CI: 0.72, 0.74), and 0.73 (95% CI: 0.73, 0.74) for the version 2 model and the version 3 ML and LR models, respectively. All models exhibited overestimation, with calibration slopes of 0.87, 0.99, and 0.87, respectively, which improved with recalibration. DCA showed that all models had better net benefit than treat-all and treat-none strategies. For all models, some variability in prediction performance was observed across ethnic groups and parity.

Conclusion: Despite significant changes in the background characteristics of the population, we demonstrated that all models remained robust, especially after recalibration; however, the performance of the original ML model decreased significantly during validation. Dynamic models are better suited to adapting to temporal changes in the baseline characteristics of pregnant women and the resulting calibration drift, as they can incorporate new data without requiring manual evaluation.
Large responses to antidepressants or methodological artifacts? A secondary analysis of STAR∗D, a single-arm, open-label, nonindustry antidepressant trial
Colin Xu, Florian Naudet, Thomas T. Kim, Michael P. Hengartner, Mark A. Horowitz, Irving Kirsch, Joanna Moncrieff, Ed Pigott, Martin Plöderl
Journal of Clinical Epidemiology, vol. 187, Article 111943. DOI: 10.1016/j.jclinepi.2025.111943. Published August 25, 2025.

Objectives: To replicate Stone et al's (2022) finding that the distribution of response in clinical antidepressant trials is trimodal, with large, medium-effect, and small subgroups.

Methods: We applied finite mixture modeling to pre-post Hamilton Depression Rating Scale (HDRS) differences (n = 2184) from level 1 of the STAR∗D study, a single-arm, open-label study. For a successful replication, the best-fitting model had to be trimodal, with components comparable to those in Stone et al. Secondary and sensitivity analyses repeated the analysis for different baseline levels of depression severity, imputed values, and patient-reported depression symptoms.

Results: The best-fitting models were either bimodal or trimodal, but the trimodal solution did not meet the criteria for replication. The bimodal model had one component with an HDRS mean change of M = −13.0 (SD = 6.7), which included 65.3% of patients, and another with M = −1.8 (SD = 5.1), which included 34.7%. For the trimodal model, the component with the largest change (M = −14.3, SD = 6.4) applied to 52% of patients, which differed substantially from the large-effect component in Stone et al (M = −18.8, SD = 5.1), which applied to 7.2%. Secondary and sensitivity analyses arrived at similar conclusions, and for patient-reported depression symptoms the best-fitting models were unimodal or bimodal.

Conclusion: This analysis failed to identify the trimodal distribution of response reported by Stone et al. In addition to being difficult to operationalize for regulatory purposes, results from mixture modeling are not sufficiently reliable to replace the more robust approach of comparing mean differences in depression rating scale scores between treatment arms.
Psychometric validation of the pictorial ecological momentary well-being instrument
Marie Buzzi, Grégory Moullec, Yan Kestens, Laetitia Minary, Jennifer O'Loughlin, Benoît Lalloué, Nelly Agrinier, Jonathan Epstein
Journal of Clinical Epidemiology, vol. 187, Article 111937. DOI: 10.1016/j.jclinepi.2025.111937. Published August 25, 2025.

Objectives: With the growing interest in ecological momentary assessment (EMA) in mental health research, the need for precise and reliable measurement tools has become a pressing issue, yet few candidate instruments have been validated in intensive longitudinal data collection contexts. The present study provides an example of the psychometric validation of measurement instruments designed for EMA by assessing the psychometric properties of the Ecological Momentary Well-being Instrument (EMoWI), the first scale specifically designed to measure momentary well-being.

Study Design and Setting: Participants from the COvid-19 HEalth and Social InteractiOn in Neighborhoods (COHESION) cohort, a general population sample of Canadian adults, who took part in the September 2022 EMA wave were included. Prompts including the eight EMoWI items were sent to participants three times a day over 10 consecutive days. Following recent recommendations, we combined classical test theory and item response theory to assess the content, structural, and construct validity, as well as the reliability, of the EMoWI in an intensive longitudinal data collection context.

Results: Two hundred ninety adults aged 19 to 80 years were included, contributing a total of 7974 prompts over 10 days. Variance decomposition analysis confirmed significant variability in momentary well-being at both the participant and day levels. Multilevel confirmatory factor analysis supported a single-factor hypothesis (root mean square error of approximation = 0.074). Internal consistency was high at both the within- and between-variance levels (McDonald's ω = 0.814 and 0.938, respectively), and we demonstrated longitudinal measurement invariance over time. Variations in mean momentary well-being across subgroups were consistent with our predefined hypotheses, supporting the construct validity of the EMoWI.

Conclusion: We demonstrated the validity and reliability of the EMoWI for measuring momentary well-being in intensive longitudinal studies. These results will enhance the accuracy of findings related to well-being in EMA studies and inform the development of evidence-based mental health ecological momentary interventions.
Immortal time bias tends to be more pronounced in methodological studies than in empirical studies: a metaepidemiological study
Xia Zhang, Jiayue Xu, Qiao He, Yuning Wang, Shuangyi Xie, Xiaoxing Zhang, Kang Zou, Wen Wang, Xin Sun
Journal of Clinical Epidemiology, vol. 187, Article 111936. DOI: 10.1016/j.jclinepi.2025.111936. Published August 25, 2025.

Objectives: Immortal time bias (ITB) is a critical challenge in observational studies estimating treatment effects and is often addressed using the Mantel–Byar (MB) and landmark (LM) methods. However, the impact of ITB appears to differ between methodological and empirical studies. This study investigated whether and how the impact of ITB differs by study type.

Study Design and Setting: We systematically searched PubMed from January 1, 2010, to May 31, 2023, to identify empirical and methodological studies explicitly using the LM or MB method to address ITB. Eligible studies reported hazard ratios comparing (i) unadjusted vs MB/LM-adjusted estimates or (ii) MB- vs LM-adjusted estimates. We first examined discrepancies in estimates across ITB-handling strategies within empirical and methodological studies, and then evaluated concordance across study types.

Results: We included 67 studies (46 empirical, 21 methodological). For unadjusted vs adjusted comparisons (58 empirical, 42 methodological), methodological studies exhibited higher rates of conclusion discordance (64.3% vs 32.8%, P = .004) and of opposite effect directions (40.5% vs 15.5%, P = .010). For MB vs LM comparisons (20 empirical, 12 methodological), conclusion discordance was more frequent in methodological studies (41.7% vs 0%, P = .004), while other discrepancy metrics showed no significant differences between study types.

Conclusion: Our findings suggest that ITB tends to have a more pronounced impact in methodological studies, indicating that its influence may vary across study settings. For methodological studies, it is important to clarify the critical ITB settings and the corresponding handling approaches. For empirical studies with suspected ITB, rigorous handling strategies can enhance the robustness of treatment effect estimates.
Use of artificial intelligence to support the assessment of the methodological quality of systematic reviews
Manuel Marques-Cruz, Filipe Pinto, Rafael José Vieira, Antonio Bognanni, Paula Perestrelo, Sara Gil-Mata, Vítor Henrique Duarte, José Pedro Barbosa, António Cardoso-Fernandes, Daniel Martinho-Dias, Francisco Franco-Pego, Federico Germini, Chiara Arienti, Alexandro W.L. Chu, Pau Riera-Serra, Paweł Jemioło, Pedro Pereira Rodrigues, João A. Fonseca, Luís Filipe Azevedo, Holger J. Schünemann, Bernardo Sousa-Pinto
Journal of Clinical Epidemiology, vol. 187, Article 111944. DOI: 10.1016/j.jclinepi.2025.111944. Published August 25, 2025.

Objectives: Published systematic reviews display heterogeneous methodological quality, which can impact decision-making. Large language models (LLMs) can support and streamline the assessment of the methodological quality of systematic reviews, aiding the incorporation of their evidence into guideline recommendations. We aimed to develop an LLM-based tool to support the assessment of the methodological quality of systematic reviews.

Methods: We assessed the performance of eight LLMs (five base models and three fine-tuned models) in evaluating the methodological quality of systematic reviews. Specifically, we provided 100 systematic reviews to the eight LLMs for evaluation against the 27-item validated Reported Methodological Quality (ReMarQ) tool. The fine-tuned models had been trained on a separate sample of 300 manually assessed systematic reviews. We compared the answers provided by the LLMs with those independently provided by human reviewers, computing the accuracy, kappa coefficient, and F1-score for this comparison.

Results: The best-performing LLM was a fine-tuned GPT-3.5 model (mean accuracy = 96.5% [95% CI: 89.9%–100%]; mean kappa coefficient = 0.90 [95% CI: 0.71–1.00]; mean F1-score = 0.91 [95% CI: 0.83–1.00]). This model displayed an accuracy >80% and a kappa coefficient >0.60 for all individual items. When this LLM assessed the same set of systematic reviews 60 times, answers to 18 of the 27 items were always consistent (ie, always the same), and only 11% of the assessed systematic reviews showed any inconsistency.

Conclusion: Overall, LLMs have the potential to accurately support the assessment of the methodological quality of systematic reviews based on a validated tool comprising dichotomous items.
{"title":"When and why to use overlap weighting: clarifying its role, assumptions, and estimand in real-world studies","authors":"John G. Rizk","doi":"10.1016/j.jclinepi.2025.111942","DOIUrl":"10.1016/j.jclinepi.2025.111942","url":null,"abstract":"<div><h3>Objectives</h3><div>To examine the strengths and limitations of overlap weighting in observational studies and to clarify when it is appropriate to use this method based on the target estimand.</div></div><div><h3>Study Design and Setting</h3><div>This is a narrative commentary that reviews recent methodological developments and real-world examples to highlight how overlap weighting operates, when it provides advantages over methods like inverse probability of treatment weighting, and the importance of aligning analytic methods with the causal question and estimand.</div></div><div><h3>Results</h3><div>Overlap weighting produces bounded, stable weights and achieves exact mean covariate balance in the subset of patients with overlapping treatment probabilities near 0.5—those considered to be in clinical equipoise. However, it targets the average treatment effect in the overlap population (ATO), a statistically defined subgroup that is difficult to characterize clinically. Use of this method without prespecifying interest in the ATO may lead to misinterpretation of results. While overlap weighting improves statistical performance, it limits generalizability and interpretability. Study design and inclusion/exclusion criteria remain critical for addressing violations of positivity.</div></div><div><h3>Conclusion</h3><div>Overlap weighting is most appropriate when the research question explicitly targets the overlap population. It should not be adopted solely to resolve estimation issues with average treatment effect or average treatment effect in the treated methods. Researchers must define their target estimand before choosing a method and clearly report the characteristics of both the unweighted and overlap-weighted populations to ensure valid causal inference.</div></div><div><h3>Plain Language Summary</h3><div>Overlap weighting is a statistical method used in health research to compare treatments when people are not randomly assigned to different options. It focuses on patients who could realistically receive either treatment and helps improve the fairness and precision of comparisons. However, the results apply only to this specific group and not everyone in the study. Researchers should choose this method only when it fits the question they are asking.</div></div>","PeriodicalId":51079,"journal":{"name":"Journal of Clinical Epidemiology","volume":"187 ","pages":"Article 111942"},"PeriodicalIF":5.2,"publicationDate":"2025-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144977692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"It's time for an update: AGREE III, the next iteration of guideline appraisal","authors":"","doi":"10.1016/j.jclinepi.2025.111935","DOIUrl":"10.1016/j.jclinepi.2025.111935","url":null,"abstract":"","PeriodicalId":51079,"journal":{"name":"Journal of Clinical Epidemiology","volume":"187 ","pages":"Article 111935"},"PeriodicalIF":5.2,"publicationDate":"2025-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144977705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Corrigendum to ‘Systematic reviews of observational studies frequently conclude based on meta-analyses of biased results: standards must be improved’ [Journal of Clinical Epidemiology 184 (2025) 111840]","authors":"Mical Paul , Judith Olchowski , Leonard Leibovici","doi":"10.1016/j.jclinepi.2025.111915","DOIUrl":"10.1016/j.jclinepi.2025.111915","url":null,"abstract":"","PeriodicalId":51079,"journal":{"name":"Journal of Clinical Epidemiology","volume":"186 ","pages":"Article 111915"},"PeriodicalIF":5.2,"publicationDate":"2025-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144865040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Including conference abstracts rarely changed systematic review conclusions: a case study from a living network meta-analysis of COVID-19 treatments
Maria Jose Oliveros, Sara Ibrahim, Gonzalo Bravo-Soto, Natalia Chahin-Inostroza, Carlos Zaror, Constanza Ulloa-Lopez, Álvaro Sanhueza, Pamela Seron, Dena Zeraatkar, Gordon Guyatt, Nancy Santesso, Romina Brignardello-Petersen
Journal of Clinical Epidemiology, vol. 187, Article 111931. DOI: 10.1016/j.jclinepi.2025.111931. Published August 18, 2025.

Background and Objectives: Including conference abstracts (CAs) in systematic reviews (SRs) helps reduce publication bias but raises concerns about reporting quality and reliability. While discrepancies with full publications are known, excluding CAs may overlook relevant early evidence. We aimed to evaluate the reporting quality of CAs, their consistency with full-text publications, and the impact of including them on effect estimates and the Grading of Recommendations, Assessment, Development and Evaluations (GRADE) framework in a living systematic review and network meta-analysis (SRNMA) of COVID-19 drug treatments.

Study Design and Setting: We conducted a retrospective methodological study of all CAs included in the COVID-19 SRNMA up to May 19, 2024. We assessed trial characteristics, reporting quality using the Consolidated Standards of Reporting Trials (CONSORT)-A checklist, and consistency with full-text publications. We also compared meta-analyses with and without CAs at predefined time points for mortality and hospital length of stay, evaluating changes in effect estimates and GRADE domains.

Results: We included 105 CAs, of which 53% (56/105) were linked to a full publication. Only 7% met high reporting standards. Average consistency with full-text publications across key methodological items was 67.6%, often owing to missing details in both sources. CAs enabled meta-analyses that would not otherwise have been possible at 14% of time points. Their inclusion did not affect conclusions when using the null threshold but changed the effect estimate in 55.6% and imprecision ratings in 16% of cases when using minimally important differences (MIDs). In a few instances, CAs also influenced risk-of-bias and inconsistency assessments.

Conclusion: CAs can fill evidence gaps when data are limited or emerging. Although they rarely change conclusions based on the null threshold, their inclusion has a greater impact when using MIDs. Reviewers should assess their inclusion case by case and promote better reporting practices to enhance their contribution to SRs.
Including comparative observational evidence in trial-level surrogate end point evaluation: assessing relapse-free survival as a surrogate end point for overall survival in patients with acute myeloid leukemia posttransplant
Georgios F. Nikolaidis, Anastasios Tasoulas, Georgia Gourgioti, Marco Groß-Langenhoff, Melinda Hamilton, Christian J.A. Ridley, Sylwia Bujkiewicz
Journal of Clinical Epidemiology, vol. 187, Article 111933. DOI: 10.1016/j.jclinepi.2025.111933. Published August 16, 2025.

Objectives: Overall survival (OS) is the gold-standard outcome for assessing treatment benefit in oncology trials. However, OS requires lengthy patient follow-up and can be confounded by competing risks. This study aimed to assess the validity of relapse-free survival (RFS) as a trial-level surrogate end point for OS in acute myeloid leukemia (AML) and to develop novel methods for combining data from randomized controlled trials (RCTs) and comparative observational evidence (COE) studies.

Study Design and Setting: A systematic review was conducted to identify RCTs and COE studies reporting treatment effects on both RFS and OS in adult patients with AML receiving posthematopoietic stem-cell transplant (HSCT) maintenance therapy. Bayesian meta-analytic models were used to evaluate the RFS–OS surrogate relationship, and statistical methods were developed to enable information sharing between RCTs and COE studies in both adaptive and user-specified manners.

Results: Six RCTs and 14 COE studies were identified. Analysis of RCT data alone yielded a weaker surrogate relationship, with parameters estimated with considerable uncertainty. Borrowing strength from COE studies, in both an adaptive and a user-controlled fashion, resulted in a stronger RFS–OS surrogate relationship with more precise parameters, and the adaptive information-sharing models did not suggest any prior-data conflict between the RCTs and COE studies.

Conclusion: We present evidence for a potential RFS–OS surrogate relationship in patients with AML post-HSCT. Our novel methodology for borrowing information from COE studies reduced uncertainty in this surrogate relationship, alleviating the issue of a limited RCT evidence base.