Validation of The Umbrella Collaboration for Tertiary Evidence Synthesis in Geriatrics: Mixed Methods Study.

Impact factor: 2.0 | JCR Q3 | Health Care Sciences & Services
Beltran Carrillo, Marta Rubinos-Cuadrado, Jazmin Parellada, Alejandra Palacios, Beltran Carrillo-Rubinos, Fernando Canillas, Juan José Baztán Cortés, Javier Gómez-Pavón
JMIR Formative Research, volume 9, e75215. Published 2025-07-08. DOI: 10.2196/75215 (https://doi.org/10.2196/75215). Full text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12262930/pdf/
Citations: 0

Abstract

Validation of The Umbrella Collaboration for Tertiary Evidence Synthesis in Geriatrics: Mixed Methods Study.

Background: The synthesis of evidence in health care is essential for informed decision-making and policy development. This study aims to validate The Umbrella Collaboration (TU), an innovative, semiautomated tertiary evidence synthesis methodology, by comparing it with traditional umbrella reviews (TURs), which are currently the gold standard.

Objective: The primary objective of this study is to evaluate whether TU, an artificial intelligence-assisted, software-driven system for tertiary evidence synthesis, can achieve effectiveness comparable to that of TURs, while offering a more timely, efficient, and comprehensive approach.

Methods: This comparative study evaluated TU against TURs across 8 matched projects in geriatrics. For each selected TUR, a parallel TU project was conducted using the same research question. Outcomes of interest (OoIs), effect sizes, certainty ratings, and execution times were systematically compared. Effect sizes were assessed both quantitatively, by transforming TUR metrics to Cohen d and correlating them with TU's RTU metric, and qualitatively, through categorical classifications (trivial, small, moderate, and large). Certainty levels were compared by mapping Grading of Recommendations Assessment, Development, and Evaluation (GRADE) ratings and TU's sentiment analysis scores onto a common 0-1 scale. Execution time was measured precisely in TU, while TUR durations were estimated from literature benchmarks. Statistical analyses included chi-square tests and Spearman correlations.
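
The harmonization steps described in the Methods can be sketched as follows. This is a minimal illustration, not the study's actual pipeline: the abstract does not say which conversion formulas, category cut-offs, or GRADE mapping were used. The sketch therefore assumes the logit conversion from odds ratio to Cohen d, Cohen's conventional benchmarks (0.2, 0.5, 0.8), and an equally spaced GRADE mapping. All function and variable names are hypothetical.

```python
import math

# ASSUMPTION: Cohen's conventional benchmarks; the abstract does not state
# the cut-offs the study used for trivial/small/moderate/large.
THRESHOLDS = [(0.2, "trivial"), (0.5, "small"), (0.8, "moderate")]

# ASSUMPTION: equally spaced mapping of the four GRADE levels onto 0-1;
# the study's actual mapping is not given in the abstract.
GRADE_TO_UNIT = {"very low": 0.0, "low": 1 / 3, "moderate": 2 / 3, "high": 1.0}


def odds_ratio_to_d(odds_ratio: float) -> float:
    """Convert an odds ratio to Cohen d with the logit method:
    d = ln(OR) * sqrt(3) / pi."""
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    return math.log(odds_ratio) * math.sqrt(3) / math.pi


def classify_effect(d: float) -> str:
    """Place |d| into one of the four categories named in the Methods."""
    magnitude = abs(d)
    for upper_bound, label in THRESHOLDS:
        if magnitude < upper_bound:
            return label
    return "large"


def grade_to_unit(rating: str) -> float:
    """Map a GRADE rating (case-insensitive) onto the common 0-1 scale."""
    return GRADE_TO_UNIT[rating.strip().lower()]
```

Other TUR metrics (for example risk ratios or mean differences) would need their own conversions to Cohen d, and TU's sentiment analysis scores would need a separate rescaling step that the abstract does not describe.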

Results: Eight TURs in geriatrics were matched with parallel projects using TU. TU replicated 73 of the 86 (85%) OoIs identified by TURs and reported an additional 337 OoIs, representing a 4.77-fold increase in outcome identification. In the comparison of effect size classifications, full concordance was observed in 24 of the 48 (50%) cases, and consistent concordance (full plus 1-level deviation) in 45 of the 48 (94%) cases, with a moderate strength of association (Cramér V=0.339). The correlation of transformed certainty values between TU and GRADE yielded a statistically significant Spearman coefficient (ρ=0.446; P=.02). The average execution time per TU project was 4 hours and 46 minutes, compared with estimated durations of 6-12 months for TURs.
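
The proportions reported above can be rechecked with a few lines of arithmetic, and the two concordance definitions (full, and full plus 1-level deviation) can be expressed as a small function. This is a sketch only: the function names are hypothetical, the example label pairs in the usage note are invented, and reading the 4.77-fold figure as (replicated + additional) / TUR total is an assumption that happens to reproduce the reported value.

```python
# Ordered effect size categories, as listed in the Methods.
LEVELS = ["trivial", "small", "moderate", "large"]


def concordance(pairs):
    """Count agreement for (tur_label, tu_label) pairs.

    Returns (full, consistent): identical labels, and labels that differ
    by at most one level (which includes the identical ones).
    """
    full = 0
    consistent = 0
    for tur_label, tu_label in pairs:
        gap = abs(LEVELS.index(tur_label) - LEVELS.index(tu_label))
        if gap == 0:
            full += 1
        if gap <= 1:
            consistent += 1
    return full, consistent


def percent(part, whole):
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Counts taken from the Results paragraph.
replicated, tur_total, additional = 73, 86, 337

print(percent(replicated, tur_total))                   # 85
print(round((replicated + additional) / tur_total, 2))  # 4.77 (assumed reading)
print(percent(24, 48), percent(45, 48))                 # 50 94
```

For instance, with three invented pairs, `concordance([("small", "small"), ("small", "moderate"), ("trivial", "large")])` counts one fully concordant pair and two consistently concordant pairs.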

Conclusions: TU demonstrated high concordance with TURs, replicating 73 of the 86 (85%) outcomes identified by TURs and reporting 337 additional outcomes, so that it identified nearly 5 times as many outcomes overall. The experimental effect size metric (RTU) showed moderate agreement with conventional measures, and the certainty ratings derived from sentiment analysis correlated acceptably with GRADE-based assessments. While further validation is needed, TU appears to be a valid and efficient approach for tertiary evidence synthesis, offering a scalable and time-efficient alternative when rapid results are required.

Source journal: JMIR Formative Research (Medicine: Medicine (miscellaneous))
CiteScore: 2.70
Self-citation rate: 9.10%
Articles published: 579
Review time: 12 weeks