Beltran Carrillo, Marta Rubinos-Cuadrado, Jazmin Parellada-Martin, Alejandra Palacios-López, Beltran Carrillo-Rubinos, Fernando Canillas Del Rey, Juan Jose Baztán Cortés, Javier Gómez Pavón
JMIR Formative Research (impact factor 2.0; JCR Q3, Health Care Sciences & Services). Published 2025-05-21. DOI: 10.2196/75215 (https://doi.org/10.2196/75215)
The Umbrella Collaboration®: An Innovative Tertiary Evidence Synthesis Methodology.
Background: The synthesis of evidence in healthcare is essential for informed decision-making and policy development. This study aims to validate The Umbrella Collaboration® (TU®), an innovative, semi-automatic tertiary evidence synthesis methodology, by comparing it with Traditional Umbrella Reviews (TUR), which are currently the gold standard.
Objective: The primary objective of this study is to evaluate whether TU®, an AI-assisted, software-driven system for tertiary evidence synthesis, can achieve effectiveness comparable to that of TURs while offering a more timely, efficient, and comprehensive approach.
Methods: This comparative study evaluated TU® against TURs across eight matched projects in geriatrics. For each selected TUR, a parallel TU® project was conducted using the same research question. Outcomes of interest (OoIs), effect sizes, certainty ratings, and execution times were systematically compared. Effect sizes were assessed both quantitatively, by transforming TUR metrics to Cohen's d and correlating them with TU®'s RTU metric, and qualitatively, through categorical classifications (trivial, small, moderate, large). Certainty levels were compared by mapping GRADE ratings and TU®'s sentiment analysis scores onto a common 0-1 scale. Execution time was measured precisely in TU®, while TUR durations were estimated from literature benchmarks. Statistical analyses included chi-squared tests and Spearman correlations.
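The harmonization steps described in the Methods can be illustrated with a minimal Python sketch. The abstract does not say which conversion formulas, category cut-offs, or GRADE-to-score mapping the authors used, so everything below is an assumption chosen for illustration: the logit conversion from an odds ratio to Cohen's d, the conversion from a correlation coefficient to d, Cohen's conventional cut-offs (0.2, 0.5, 0.8), and an evenly spaced mapping of the four GRADE levels onto the 0-1 interval. All function and constant names are hypothetical.

```python
import math

# Assumed cut-offs (Cohen's conventions); the abstract does not state the ones used.
THRESHOLDS = [(0.2, "trivial"), (0.5, "small"), (0.8, "moderate")]

# Assumed evenly spaced mapping of GRADE levels onto a 0-1 scale.
GRADE_TO_UNIT = {"very low": 0.0, "low": 1 / 3, "moderate": 2 / 3, "high": 1.0}


def odds_ratio_to_d(odds_ratio: float) -> float:
    """Logit conversion of an odds ratio: d = ln(OR) * sqrt(3) / pi."""
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    return math.log(odds_ratio) * math.sqrt(3) / math.pi


def correlation_to_d(r: float) -> float:
    """Conversion of a correlation coefficient: d = 2r / sqrt(1 - r^2)."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return 2 * r / math.sqrt(1 - r * r)


def classify_effect(d: float) -> str:
    """Assign a category from the absolute size of d."""
    magnitude = abs(d)
    for upper_bound, label in THRESHOLDS:
        if magnitude < upper_bound:
            return label
    return "large"


def grade_to_unit(rating: str) -> float:
    """Map a GRADE certainty rating to the assumed 0-1 score."""
    return GRADE_TO_UNIT[rating.strip().lower()]


if __name__ == "__main__":
    d = odds_ratio_to_d(2.0)
    print(round(d, 3), classify_effect(d))
    print(grade_to_unit("Moderate"))
```

Putting both systems on a shared scale is what makes the later steps possible: categories can be cross-tabulated for a chi-squared test, and the 0-1 certainty scores can be rank-correlated.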
Results: Eight TURs in geriatrics were matched with parallel projects using TU®. TU® replicated 84.9% (73/86) of the OoIs identified by TURs and reported an additional 337 OoIs, representing a 4.77-fold increase in outcome identification. In the comparison of effect size classifications, full concordance was observed in 50.0% of cases and consistent concordance (full plus one-level deviation) in 93.8%, with a moderate strength of association (Cramér's V = 0.339). The correlation of transformed certainty values between TU® and GRADE yielded a statistically significant Spearman coefficient (ρ = 0.446; P = .025). The average execution time per TU® project was 4 hours and 46 minutes, compared to estimated durations of 6-12 months for TURs.
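The headline figures above can be checked with simple arithmetic, and the concordance and association measures can be sketched in a few lines of Python. The counts 73, 86, and 337 come from the abstract. The contingency table and category pairs in the usage section are hypothetical placeholders, since the underlying data are not reported here, and the function names are illustrative only.

```python
import math

# Counts reported in the abstract.
tur_oois = 86      # outcomes of interest identified by the TURs
replicated = 73    # of those, replicated by TU
additional = 337   # extra outcomes reported only by TU

replication_rate = replicated / tur_oois
fold_increase = (replicated + additional) / tur_oois

LEVELS = ["trivial", "small", "moderate", "large"]


def concordance(pairs):
    """Return (full, consistent) agreement; consistent allows a one-level gap."""
    if not pairs:
        raise ValueError("no category pairs supplied")
    gaps = [abs(LEVELS.index(a) - LEVELS.index(b)) for a, b in pairs]
    full = sum(g == 0 for g in gaps) / len(gaps)
    consistent = sum(g <= 1 for g in gaps) / len(gaps)
    return full, consistent


def cramers_v(table):
    """Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    k = min(len(row_totals), len(col_totals)) - 1
    if n == 0 or k == 0:
        raise ValueError("table needs observations and at least 2 rows and 2 columns")
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # cells in an empty row or column contribute nothing
                chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))


if __name__ == "__main__":
    print(f"{replication_rate:.1%}")   # 84.9%
    print(f"{fold_increase:.2f}")      # 4.77
    # Hypothetical inputs, for illustration only.
    print(concordance([("small", "small"), ("small", "moderate")]))
    print(cramers_v([[10, 0], [0, 10]]))
```

The check shows that the 4.77-fold figure is the ratio of all outcomes reported by TU® (73 + 337 = 410) to the 86 outcomes identified by the TURs.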
Conclusions: The Umbrella Collaboration® demonstrated high concordance with TURs, replicating 84.9% of the outcomes they identified and reporting nearly five times as many outcomes overall. The experimental effect size metric (RTU) showed moderate agreement with conventional measures, and the certainty ratings derived from sentiment analysis correlated significantly with GRADE-based assessments (ρ = 0.446). While further validation is needed, TU® appears to be a valid and efficient approach for tertiary evidence synthesis, offering a scalable and time-efficient alternative when rapid results are required.
International registered report: RR2-10.2196/67248.