The Umbrella Collaboration®: An Innovative Tertiary Evidence Synthesis Methodology
Beltran Carrillo, Marta Rubinos-Cuadrado, Jazmin Parellada-Martin, Alejandra Palacios-López, Beltran Carrillo-Rubinos, Fernando Canillas Del Rey, Juan Jose Baztán Cortés, Javier Gómez Pavón
JMIR Formative Research, published 2025-05-21. DOI: 10.2196/75215 (https://doi.org/10.2196/75215)
Citations: 0
Abstract
Background: The synthesis of evidence in healthcare is essential for informed decision-making and policy development. This study aims to validate The Umbrella Collaboration® (TU®), an innovative, semi-automatic tertiary evidence synthesis methodology, by comparing it with Traditional Umbrella Reviews (TURs), which are currently the gold standard.
Objective: The primary objective of this study is to evaluate whether TU®, an AI-assisted, software-driven system for tertiary evidence synthesis, can achieve effectiveness comparable to that of TURs while offering a more timely, efficient, and comprehensive approach.
Methods: This comparative study evaluated TU® against TURs across eight matched projects in geriatrics. For each selected TUR, a parallel TU® project was conducted using the same research question. Outcomes of interest (OoIs), effect sizes, certainty ratings, and execution times were systematically compared. Effect sizes were assessed both quantitatively, by transforming TUR metrics to Cohen's d and correlating them with TU®'s RTU metric, and qualitatively, through categorical classifications (trivial, small, moderate, large). Certainty levels were compared by mapping GRADE ratings and TU®'s sentiment analysis scores onto a common 0-1 scale. Execution time was measured precisely in TU®, while TUR durations were estimated from literature benchmarks. Statistical analyses included chi-squared tests and Spearman correlations.
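The comparisons described in the Methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the 0.2 / 0.5 / 0.8 cut-offs are Cohen's conventional benchmarks (the abstract names the four categories but not the thresholds), the equally spaced GRADE-to-[0, 1] mapping is assumed, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def classify_effect(d: float) -> str:
    """Classify |d| with ASSUMED cut-offs of 0.2, 0.5 and 0.8."""
    magnitude = abs(d)
    if magnitude < 0.2:
        return "trivial"
    if magnitude < 0.5:
        return "small"
    if magnitude < 0.8:
        return "moderate"
    return "large"


# ASSUMED equally spaced mapping of GRADE levels onto a 0-1 scale.
GRADE_TO_UNIT: Dict[str, float] = {
    "very low": 0.0,
    "low": 1 / 3,
    "moderate": 2 / 3,
    "high": 1.0,
}


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of average ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (sd_x * sd_y)


# Hypothetical inputs for illustration only; these are not study data.
grade_ratings = ["low", "moderate", "high", "very low"]
tu_certainty = [0.30, 0.60, 0.90, 0.20]
rho = spearman_rho([GRADE_TO_UNIT[g] for g in grade_ratings], tu_certainty)
```

Because Spearman's rho depends only on rank order, any strictly increasing mapping of the GRADE levels would give the same coefficient; the exact spacing on the 0-1 scale matters only if the values are also compared directly.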
Results: Eight TURs in geriatrics were matched with parallel projects using TU®. TU® replicated 84.9% (73/86) of the OoIs identified by TURs and reported an additional 337 OoIs, representing a 4.77-fold increase in outcome identification. In the comparison of effect size classifications, full concordance was observed in 50.0% of cases and consistent concordance (full plus one-level deviation) in 93.8%, with a moderate strength of association (Cramér's V = 0.339). The correlation of transformed certainty values between TU® and GRADE yielded a statistically significant Spearman coefficient (ρ = 0.446; P = .025). The average execution time per TU® project was 4 hours and 46 minutes, compared to estimated durations of 6-12 months for TURs.
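The headline proportions can be checked from the counts reported above. The short sketch below also shows how a Cramér's V value is obtained from a contingency table. The reading of the 4.77-fold figure as (73 + 337) / 86 is an inference that happens to reproduce the reported number, not a formula stated in the abstract, and `cramers_v` is a hypothetical helper, not the authors' code.

```python
from math import sqrt
from typing import Sequence


def cramers_v(table: Sequence[Sequence[float]]) -> float:
    """Cramér's V for an r x c table of counts (no bias or continuity correction).

    Assumes every row and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return sqrt(chi2 / (n * k))


# Counts taken from the Results paragraph.
replicated, tur_total, additional = 73, 86, 337

replication_rate = replicated / tur_total              # about 0.849
fold_increase = (replicated + additional) / tur_total  # about 4.77 (inferred reading)

print(f"replication rate: {replication_rate:.1%}")
print(f"fold increase:    {fold_increase:.2f}")
```

Applied to the 4 x 4 cross-tabulation of effect size categories (TUR versus TU®), `cramers_v` would give a value of the kind reported; that table is not included in the abstract, so it is not reproduced here.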
Conclusions: TU® demonstrated high concordance with TURs, replicating 84.9% of the outcomes they identified and reporting nearly five times as many outcomes overall. The experimental effect size metric (RTU) showed moderate agreement with conventional measures, and the certainty ratings derived from sentiment analysis correlated moderately with GRADE-based assessments (ρ = 0.446). While further validation is needed, TU® appears to be a valid and efficient approach for tertiary evidence synthesis, offering a scalable and time-efficient alternative when rapid results are required.
International registered report: RR2-10.2196/67248.