Cochrane Evidence Synthesis and Methods: Latest Articles

Long-Term Outcomes of Invasive vs Noninvasive Treatment for Intermittent Claudication: A Systematic Review and Meta-Analysis

Cochrane Evidence Synthesis and Methods Pub Date : 2025-10-03 DOI: 10.1002/cesm.70053

Anas Elmahi, Nathalie Doolan, Mohiedin Hezima, Anwar Gowey, Daragh Moneley, Seamus McHugh, Sayed Aly, Peter Naughton, Elrasheid A. H. Kheirelseid

Journal Article. Volume 3, Issue 6. DOI URL: https://doi.org/10.1002/cesm.70053. Full text (PDF): https://onlinelibrary.wiley.com/doi/epdf/10.1002/cesm.70053

Abstract

Background: Intermittent claudication (IC) is a hallmark symptom of peripheral arterial disease (PAD), causing pain and discomfort during physical activity due to reduced blood flow to the lower extremities. The condition significantly impairs mobility and quality of life (QoL) in affected individuals. Treatment options for IC range from conservative approaches, including best medical therapy (BMT) and supervised exercise therapy (SET), to invasive interventions such as angioplasty and open revascularization.

Aim: This systematic review and meta-analysis assesses the long-term outcomes of invasive procedures compared with noninvasive treatments in the management of patients with IC.

Methods: A comprehensive search was conducted in October 2024 across databases including PubMed, MEDLINE, Cochrane Library, Embase, and Scopus. Randomized controlled trials (RCTs) comparing invasive interventions with noninvasive treatments were included. Primary outcomes were QoL, ankle-brachial pressure index (ABPI), and maximum walking distance (MWD). Secondary outcomes were major adverse cardiovascular events (MACE), mortality, complications, and re-intervention rates. Data analysis was conducted using Cochrane Review Manager 5. Follow-up ranged from 2 to 7 years; outcomes were taken at the longest available follow-up within that range, with 2-year data prioritized when present.

Results: A total of 11 RCTs with 1379 patients were included in the analysis. Invasive treatments demonstrated a significant improvement in MWD and ABPI compared with noninvasive treatments (MWD: pooled mean difference (MD) = 64.94, 95% CI [10.77, 115.12], p = .02, 5 studies; ABPI: pooled MD = 0.15, 95% CI [0.04, 0.26], p = .006, 5 studies). However, invasive interventions were associated with a higher rate of complications, including increased amputation risk (pooled odds ratio (OR) = 2.46, 95% CI [0.44, 13.94], p = .31, 3 studies), though this was not statistically significant. Long-term rates were higher in the noninvasive treatment group (pooled OR = 0.56, 95% CI [0.33, 0.97], p = .04).

Conclusions: Both invasive and noninvasive treatments are effective in managing IC. Invasive treatments provide greater improvement in blood flow and walking distance, but the risk of complications and re-interventio

Citations: 0

Comparison of Elicit AI and Traditional Literature Searching in Evidence Syntheses Using Four Case Studies

Cochrane Evidence Synthesis and Methods Pub Date : 2025-09-27 DOI: 10.1002/cesm.70050

Oscar Lau, Su Golder

Journal Article. Volume 3, Issue 6. DOI URL: https://doi.org/10.1002/cesm.70050. Full text (PDF): https://onlinelibrary.wiley.com/doi/epdf/10.1002/cesm.70050

Abstract

Background: Elicit AI aims to simplify and accelerate the systematic review process without compromising accuracy. However, research on Elicit's performance is limited.

Objectives: To determine whether Elicit AI is a viable tool for the systematic literature search and title/abstract screening stages.

Methods: We compared the included studies in four evidence syntheses with those identified using the subscription-based version of Elicit Pro in Review mode. We calculated sensitivity and precision and observed patterns in the performance of Elicit.

Results: The sensitivity of Elicit was poor, averaging 39.5% (25.5–69.2%) compared with 94.5% (91.1–98.0%) in the original reviews. However, Elicit identified some included studies not identified by the original searches and had an average precision of 41.8% (35.6–46.2%), which was higher than the 7.55% average of the original reviews (0.65–14.7%).

Discussion: At the time of this evaluation, Elicit did not search with high enough sensitivity to replace traditional literature searching. However, the high precision of searching in Elicit could prove useful for preliminary searches, and the unique studies identified mean that researchers can use Elicit as a useful adjunct.

Conclusion: Whilst Elicit searches are currently not sensitive enough to replace traditional searching, Elicit is continually improving, and further evaluations should be undertaken as new developments take place.
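
The two measures reported in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the counts in the usage lines are hypothetical, and "relevant" is assumed to mean "included in the final evidence synthesis".

```python
def sensitivity(relevant_retrieved: int, relevant_total: int) -> float:
    """Share of all included studies that the search actually found."""
    if relevant_total <= 0:
        raise ValueError("relevant_total must be positive")
    return relevant_retrieved / relevant_total


def precision(relevant_retrieved: int, retrieved_total: int) -> float:
    """Share of retrieved records that turned out to be included studies."""
    if retrieved_total <= 0:
        raise ValueError("retrieved_total must be positive")
    return relevant_retrieved / retrieved_total


# Hypothetical case study: a review with 50 included studies, where a
# search returns 60 records and 20 of them are among the included studies.
example_sensitivity = sensitivity(20, 50)
example_precision = precision(20, 60)
print(f"sensitivity = {example_sensitivity:.1%}, precision = {example_precision:.1%}")
```

A search can score well on one measure and poorly on the other, which is the pattern the abstract describes: a short, focused result list raises precision while missing many included studies lowers sensitivity.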

Citations: 0

Split Body Trials in Systematic Reviews and Meta-Analyses: A Tutorial

Cochrane Evidence Synthesis and Methods Pub Date : 2025-09-24 DOI: 10.1002/cesm.70052

Nuala Livingstone, Kerry Dwan, Marty Chaplin

Citations: 0

Leveraging AI for Meta-Analysis: Evaluating LLMs in Detecting Publication Bias for Next-Generation Evidence Synthesis

Cochrane Evidence Synthesis and Methods Pub Date : 2025-09-18 DOI: 10.1002/cesm.70047

Xing Xing, Lifeng Lin, Mohammad Hassan Murad, Jiayi Tong

Journal Article. Volume 3, Issue 5. DOI URL: https://doi.org/10.1002/cesm.70047. Full text (PDF): https://onlinelibrary.wiley.com/doi/epdf/10.1002/cesm.70047

Abstract

Introduction: Publication bias (PB) threatens the validity of meta-analyses by distorting effect size estimates, potentially leading to misleading conclusions. With advanced pattern recognition and multimodal capabilities, large language models (LLMs) may be able to evaluate PB and make the systematic review process more efficient.

Methods: We evaluated the ability of two state-of-the-art multimodal LLMs, GPT-4o and Llama 3.2 Vision, to detect PB using funnel plots alone and in combination with quantitative inputs. We simulated meta-analyses under varying conditions, including the absence of PB, different levels of PB, varying total numbers of studies within a meta-analysis, and differing degrees of between-study heterogeneity.

Results: Neither GPT-4o nor Llama 3.2 Vision consistently detected the presence of PB across settings. Under no-publication-bias conditions, GPT-4o achieved higher specificity than Llama 3.2 Vision, with the difference most pronounced in meta-analyses with 20 or more studies. The inclusion of quantitative inputs alongside funnel plots did not significantly improve performance. Additionally, between-study heterogeneity and patterns of non-reported studies had minimal impact on the models' assessments.

Conclusions: The ability of LLMs to detect PB without fine-tuning is currently limited. This study highlights the need for specialized model adaptation before LLMs can be effectively integrated into meta-analysis workflows. Future research can focus on targeted refinements to enhance LLM performance and utility in evidence synthesis.
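
To make the simulation design concrete, the sketch below generates one meta-analysis and then suppresses non-significant studies. It is an illustration only: the abstract does not describe the authors' data-generating model, so the standard-error range, the one-sided significance rule, and all function and parameter names here are assumptions.

```python
import random


def simulate_studies(k, true_effect, tau, seed):
    """Draw k (estimate, standard_error) pairs under a random-effects model."""
    rng = random.Random(seed)
    studies = []
    for _ in range(k):
        se = rng.uniform(0.05, 0.5)              # assumed range of study precision
        study_effect = rng.gauss(true_effect, tau)  # between-study heterogeneity
        estimate = rng.gauss(study_effect, se)      # within-study sampling error
        studies.append((estimate, se))
    return studies


def apply_publication_bias(studies, keep_prob_nonsig, seed):
    """Keep every significant study; keep others with probability keep_prob_nonsig."""
    rng = random.Random(seed)
    published = []
    for estimate, se in studies:
        significant = estimate / se > 1.96  # assumed one-sided selection rule
        if significant or rng.random() < keep_prob_nonsig:
            published.append((estimate, se))
    return published


all_studies = simulate_studies(k=30, true_effect=0.2, tau=0.1, seed=1)
published = apply_publication_bias(all_studies, keep_prob_nonsig=0.3, seed=2)
print(f"{len(published)} of {len(all_studies)} simulated studies were 'published'")
```

Setting keep_prob_nonsig to 1.0 gives the no-bias condition, and lower values give progressively stronger bias. Plotting each estimate against its standard error for the published subset would produce the kind of funnel plot the models were asked to read.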

Citations: 0

Retiring the Term “Weighted Mean Difference” in Contemporary Evidence Synthesis

Cochrane Evidence Synthesis and Methods Pub Date : 2025-09-11 DOI: 10.1002/cesm.70051

Lifeng Lin, Xing Xing, Wenshan Han, Jiayi Tong

Journal Article. Volume 3, Issue 5. DOI URL: https://doi.org/10.1002/cesm.70051. Full text (PDF): https://onlinelibrary.wiley.com/doi/epdf/10.1002/cesm.70051

Abstract

Evidence synthesis frequently involves quantitative analyses of continuous outcomes. A cross-sectional study of Cochrane systematic reviews found that 6672 of 22,453 meta-analyses (29.7%) involved continuous outcomes [1]. The primary effect measures employed in meta-analyses of continuous outcomes are the mean difference (MD) and standardized mean difference (SMD) [2]. The MD is appropriate when all included studies measure outcomes on identical scales (e.g., body weight in kilograms). In contrast, the SMD serves as a solution when studies use different measurement scales (e.g., varied questionnaire scoring methods). Although alternative measures (e.g., the ratio of means) exist [3], their application remains relatively infrequent.

Despite this conceptual clarity, the term “weighted mean difference” (WMD) appears frequently in the systematic review literature [4], which can lead to confusion about its relationship to the MD. In this article, we first clarify the distinction between MD and WMD, then describe the historical factors underlying the term's adoption and persistence, discuss why contemporary methods render it unnecessary, illustrate examples of misuse, and conclude with practical recommendations for clearer reporting.

The MD represents the straightforward difference between group means (e.g., intervention vs. control) for a continuous outcome. Although the true MD value relates to unknown population-level differences, practical research relies on sample estimates from individual studies. Meta-analysis systematically synthesizes these study-level MD estimates to derive an overall summary effect across studies.

The term WMD emerged historically to emphasize the weighted averaging process of meta-analyses, wherein each study contributes a sample MD weighted by its statistical precision (i.e., inverse variance) [5]. Typically, larger studies with smaller variances or narrower confidence intervals are assigned greater weights. Traditional meta-analytical methods, performed through either fixed-effect (also known as common-effect) or random-effects models, follow this inverse-variance weighting principle. Under fixed-effect models, study weights directly reflect the inverse of their variances, whereas random-effects models incorporate both within-study and between-study variances.

To contextualize the widespread adoption of WMD, we conducted a brief literature search using Google Scholar on June 12, 2025. Using exact-phrase queries in quotation marks, for each calendar year from 1990 to 2024, we recorded the counts for “weighted mean difference” AND “systematic review” and separately for “systematic review,” then calculated the yearly proportion (Figure 1). Google Scholar indexes titles, abstracts, and, when available, full texts, so counts reflect occurrences anywhere in the indexed record, and these counts are approximate.
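
The inverse-variance weighting described above can be sketched in a few lines. This is a minimal illustration rather than the article's own code: the function names are hypothetical, and the DerSimonian-Laird estimator is used for the between-study variance only as one common choice, since the abstract does not name an estimator.

```python
import math


def fixed_effect(mds, variances):
    """Pooled MD with weights equal to the inverse of each study's variance."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, mds)) / sum(weights)
    standard_error = math.sqrt(1.0 / sum(weights))
    return pooled, standard_error


def dersimonian_laird_tau2(mds, variances):
    """Method-of-moments estimate of the between-study variance (assumed estimator)."""
    weights = [1.0 / v for v in variances]
    pooled, _ = fixed_effect(mds, variances)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, mds))
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    return max(0.0, (q - (len(mds) - 1)) / c)


def random_effects(mds, variances):
    """Pooled MD with weights 1 / (within-study variance + between-study variance)."""
    tau2 = dersimonian_laird_tau2(mds, variances)
    pooled, standard_error = fixed_effect(mds, [v + tau2 for v in variances])
    return pooled, standard_error, tau2


# Hypothetical study-level mean differences and their variances.
study_mds = [0.0, 10.0]
study_variances = [1.0, 4.0]
print(fixed_effect(study_mds, study_variances))
print(random_effects(study_mds, study_variances))
```

In both models the pooled value is a weighted average of ordinary mean differences, so the weighting is a property of the pooling step rather than of the effect measure itself.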

Citations: 0

Using a Large Language Model (ChatGPT-4o) to Assess the Risk of Bias in Randomized Controlled Trials of Medical Interventions: Interrater Agreement With Human Reviewers

Cochrane Evidence Synthesis and Methods Pub Date : 2025-09-10 DOI: 10.1002/cesm.70048

Christopher James Rose, Julia Bidonde, Martin Ringsten, Julie Glanville, Thomas Potrebny, Chris Cooper, Ashley Elizabeth Muller, Hans Bugge Bergsund, Jose F. Meneses-Echavez, Rigmor C. Berg

Journal Article. Volume 3, Issue 5. DOI URL: https://doi.org/10.1002/cesm.70048. Full text (PDF): https://onlinelibrary.wiley.com/doi/epdf/10.1002/cesm.70048

Abstract

Background: Risk of bias (RoB) assessment is a highly skilled task that is time-consuming and subject to human error. RoB automation tools have previously used machine learning models built using relatively small task-specific training sets. Large language models (LLMs; e.g., ChatGPT) are complex models built using non-task-specific Internet-scale training sets. They demonstrate human-like abilities and might be able to support tasks like RoB assessment.

Methods: Following a published peer-reviewed protocol, we randomly sampled 100 Cochrane reviews. New or updated reviews that evaluated medical interventions, included ≥ 1 eligible trial, and presented human consensus assessments using Cochrane RoB1 or RoB2 were eligible. We excluded reviews performed under emergency conditions (e.g., COVID-19), and those on public health or welfare. We randomly sampled one trial from each review. Trials using individual- or cluster-randomized designs were eligible. We extracted human consensus RoB assessments of the trials from the reviews, and methods texts from the trials. We used 25 review-trial pairs to develop a ChatGPT prompt to assess RoB using trial methods text. We used the prompt and the remaining 75 review-trial pairs to estimate human-ChatGPT agreement for “Overall RoB” (primary outcome) and “RoB due to the randomization process”, and ChatGPT-ChatGPT (intrarater) agreement for “Overall RoB”. We used ChatGPT-4o (February 2025) throughout.

Results: The 75 reviews were sampled from 35 Cochrane review groups, and all used RoB1. The 75 trials spanned five decades, and all but one were published in English. Human-ChatGPT agreement for “Overall RoB” assessment was 50.7% (95% CI 39.3%–62.0%), substantially higher than expected by chance (p = 0.0015). Human-ChatGPT agreement for “RoB due to the randomization process” was 78.7% (95% CI 69.4%–88.0%; p < 0.001). ChatGPT-ChatGPT agreement was 74.7% (95% CI 64.8%–84.6%; p < 0.001).

Conclusions: ChatGPT appears to have some ability to assess RoB and is unlikely to be guessing or “hallucinating”. The estimated agreement for “Overall RoB” is well above estimates of agreement reported for some human reviewers, but below the highest estimates. LLM-based systems for assessing RoB may be able to help streamline and improve evidence synthesis production.
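
An agreement percentage with a 95% confidence interval of the kind reported above can be approximated as follows. This is a rough sketch under stated assumptions: the count of 38 agreements out of 75 pairs is inferred from the reported 50.7% and is not given in the abstract, and the simple Wald interval used here may differ from the method the authors applied, so the output should only land close to the published interval.

```python
import math


def agreement_with_wald_ci(agreements, pairs, z=1.959964):
    """Proportion of matching judgments with a normal-approximation interval."""
    if pairs <= 0:
        raise ValueError("pairs must be positive")
    proportion = agreements / pairs
    half_width = z * math.sqrt(proportion * (1.0 - proportion) / pairs)
    lower = max(0.0, proportion - half_width)
    upper = min(1.0, proportion + half_width)
    return proportion, lower, upper


# Assumed counts: 38 matching "Overall RoB" judgments among 75 review-trial pairs.
proportion, lower, upper = agreement_with_wald_ci(38, 75)
print(f"agreement = {proportion:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```

With only 75 pairs the interval spans more than 20 percentage points, which is worth keeping in mind when comparing the estimate with agreement figures reported for human reviewers.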

Citations: 0

Artificial Intelligence Search Tools for Evidence Synthesis: Comparative Analysis and Implementation Recommendations

Cochrane Evidence Synthesis and Methods Pub Date : 2025-09-08 DOI: 10.1002/cesm.70045

Robin Featherstone, Melissa Walter, Danielle MacDougall, Eric Morenz, Sharon Bailey, Robyn Butcher, Caitlyn Ford, Hannah Loshak, David Kaunelis

Journal Article. Volume 3, Issue 5. DOI URL: https://doi.org/10.1002/cesm.70045. Full text (PDF): https://onlinelibrary.wiley.com/doi/epdf/10.1002/cesm.70045

Abstract

To inform implementation recommendations for novel or emerging technologies, Research Information Services at Canada's Drug Agency conducted a multimodal research project involving a literature review, a retrospective comparative analysis, and a focus group on 3 Artificial Intelligence (AI) or automation tools for information retrieval (AI search tools): Lens.org, SpiderCite, and Microsoft Copilot. For the comparative analysis, the customary information retrieval practices used at Canada's Drug Agency served as our reference standard for comparison, and we used the eligible studies of 7 completed projects to measure tool performance. For searches conducted with our usual practice approaches and with each of the 3 tools, we calculated sensitivity/recall, number needed to read (NNR), time to search and screen, unique contributions, and the likely impact of the unique contributions on the projects' findings. Our investigation confirmed that AI search tools have inconsistent and variable performance for the range of information retrieval tasks performed at Canada's Drug Agency. Implementation recommendations from this study informed a “fit for purpose” approach where Information Specialists leverage AI search tools for specific tasks or project types.
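
Two of the measures named above, number needed to read and unique contributions, can be sketched with plain set operations. The definitions used here are common ones (NNR as records retrieved per eligible study found, unique contributions as eligible studies found by the tool but not by the reference search) and are assumptions, since the abstract does not define them. The record identifiers and function names are hypothetical.

```python
import math


def number_needed_to_read(retrieved, eligible):
    """Records a reviewer must read, on average, to find one eligible study."""
    hits = len(retrieved & eligible)
    if hits == 0:
        return math.inf
    return len(retrieved) / hits


def unique_contributions(tool_retrieved, reference_retrieved, eligible):
    """Eligible studies the tool found that the reference search missed."""
    return (tool_retrieved - reference_retrieved) & eligible


# Hypothetical record identifiers for one completed project.
eligible = {"a", "b", "c", "d"}
reference_search = {"a", "b", "c", "v", "w", "x", "y", "z"}
tool_search = {"a", "d", "q"}

print(number_needed_to_read(reference_search, eligible))
print(number_needed_to_read(tool_search, eligible))
print(unique_contributions(tool_search, reference_search, eligible))
```

Under these definitions NNR is the reciprocal of precision, so a lower value means less screening effort per eligible study found.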

Citations: 0

Exploring the Role of Artificial Intelligence in Evidence Synthesis: Insights From the CORE Information Retrieval Forum 2025

Cochrane Evidence Synthesis and Methods Pub Date : 2025-09-07 DOI: 10.1002/cesm.70049

Claire H. Eastaugh, Madeleine Still, Fiona R. Beyer, Sheila A. Wallace, Hannah O'Keefe

Journal Article. Volume 3, Issue 5. DOI URL: https://doi.org/10.1002/cesm.70049. Full text (PDF): https://onlinelibrary.wiley.com/doi/epdf/10.1002/cesm.70049

Abstract

Introduction: Information retrieval is essential for evidence synthesis, but developing search strategies can be labor-intensive and time-consuming. Automating these processes would be of benefit and interest, though it is unclear if Information Specialists (IS) are willing to adopt artificial intelligence (AI) methodologies or how they currently use them. In January 2025, the NIHR Innovation Observatory and NIHR Methodology Incubator for Applied Health and Care Research co-sponsored the inaugural CORE Information Retrieval Forum, where attendees discussed AI's role in information retrieval.

Methods: The CORE Information Retrieval Forum hosted a Knowledge Café. Participation was voluntary, and attendees could choose one of six event-themed discussion tables, including AI. To support each discussion, a QR code linking to a virtual collaboration tool (Padlet; padlet.com) and a poster in the exhibition space were available throughout the day for attendee contributions.

Results: The CORE Information Retrieval Forum was attended by 131 IS from nine different types of organizations, with most from the UK and ten countries represented overall. Among the six discussion points available in the Knowledge Café, the AI table was the most popular, receiving the highest number of contributions (n = 49). Following the Forum, contributions to the AI topic were categorized into four themes: critical perception (n = 21), current uses (n = 19), specific tools (n = 2), and training wants/needs (n = 7).

Conclusions: While there are critical perspectives on the integration of AI in the IS space, these stem not from a reluctance to adapt and adopt but from a need for structure, education, training, ethical guidance, and systems to support the responsible use and transparency of AI. There is interest in automating repetitive and time-consuming tasks, but attendees reported a lack of appropriate supporting tools. More work is required to identify the suitability of currently available tools and their potential to complement the work conducted by IS.

Citations: 0

Human Versus Artificial Intelligence: Comparing Cochrane Authors' and ChatGPT's Risk of Bias Assessments

Cochrane Evidence Synthesis and Methods Pub Date : 2025-08-31 DOI: 10.1002/cesm.70044

Petek Eylul Taneri

Journal Article. Volume 3, Issue 5. DOI URL: https://doi.org/10.1002/cesm.70044. Full text (PDF): https://onlinelibrary.wiley.com/doi/epdf/10.1002/cesm.70044

Abstract

Introduction: Systematic reviews and meta-analyses synthesize randomized trial data to guide clinical decisions but require significant time and resources. Artificial intelligence (AI) offers a promising way to streamline evidence synthesis, aiding study selection, data extraction, and risk of bias assessment. This study aims to evaluate the performance of ChatGPT-4o in assessing the risk of bias in randomised controlled trials (RCTs) using the Risk of Bias 2 (RoB 2) tool, comparing its results with assessments conducted by human reviewers in Cochrane Reviews.

Methods: A sample of Cochrane Reviews utilizing the RoB 2 tool was identified through the Cochrane Database of Systematic Reviews (CDSR). Protocols, qualitative systematic reviews, and reviews employing alternative risk of bias assessment tools were excluded. The study utilized ChatGPT-4o to assess the risk of bias using a structured set of prompts corresponding to the RoB 2 domains. The agreement between ChatGPT-4o and consensus-based human reviewer assessments was evaluated using weighted kappa statistics. Additionally, accuracy, sensitivity, specificity, positive predictive value, and negative predictive value were calculated. All analyses were performed using R Studio (version 4.3.0).

Results: A total of 42 Cochrane Reviews were screened, yielding a final sample of eight eligible reviews comprising 84 RCTs. The primary outcome of each included review was selected for risk of bias assessment. ChatGPT-4o demonstrated moderate agreement with human reviewers for the overall risk of bias judgments (weighted kappa = 0.51, 95% CI: 0.36–0.66). Agreement varied across domains, ranging from fair (κ = 0.20 for selection of the reported results) to moderate (κ = 0.59 for measurement of outcomes). ChatGPT-4o exhibited a sensitivity of 53% for identifying high-risk studies and a specificity of 99% for classifying low-risk studies.

Conclusion: This study shows that ChatGPT-4o can perform risk of bias assessments using RoB 2 with fair to moderate agreement with human reviewers. While AI-assisted risk of bias assessment remains imperfect, advancements in prompt engineering and model refinement may enhance performance. Future research should explore standardised prompts and investigate interrater reliability among human reviewers to provide a more robust comparison.
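
The weighted kappa statistic used above can be sketched as follows for ordered judgments such as low, some concerns, and high. This is an illustration, not the study's analysis code (which was run in R): linear disagreement weights are assumed because the abstract does not state the weighting scheme, and the function name and example ratings are hypothetical.

```python
def weighted_kappa(rater_a, rater_b, categories):
    """Cohen's weighted kappa with linear disagreement weights |i - j| / (k - 1)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    index = {category: i for i, category in enumerate(categories)}
    k = len(categories)
    n = len(rater_a)

    # Observed joint proportions of (rater A category, rater B category).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)
            observed_disagreement += weight * observed[i][j]
            expected_disagreement += weight * row_totals[i] * col_totals[j]

    if expected_disagreement == 0.0:
        return float("nan")  # both raters used a single identical category
    return 1.0 - observed_disagreement / expected_disagreement


levels = ["low", "some concerns", "high"]
human = ["low", "some concerns", "high", "high", "low", "some concerns"]
model = ["low", "some concerns", "some concerns", "high", "low", "low"]
print(round(weighted_kappa(human, model, levels), 3))
```

Because the weights grow with the distance between categories, a low versus high disagreement is penalized twice as heavily as a low versus some concerns disagreement, which suits ordered RoB 2 judgments better than unweighted kappa.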

Citations: 0

Artificial Intelligence and Automation in Evidence Synthesis: An Investigation of Methods Employed in Cochrane, Campbell Collaboration, and Environmental Evidence Reviews

Cochrane Evidence Synthesis and Methods Pub Date : 2025-08-28 DOI: 10.1002/cesm.70046

Kristen L. Scotti, Sarah Young, Melanie A. Gainey, Haoyong Lan

Journal Article. Volume 3, Issue 5. DOI URL: https://doi.org/10.1002/cesm.70046. Full text (PDF): https://onlinelibrary.wiley.com/doi/epdf/10.1002/cesm.70046

Abstract

Automation, including Machine Learning (ML), is increasingly being explored to reduce the time and effort involved in evidence syntheses, yet its adoption and reporting practices remain under-examined across disciplines (e.g., health sciences, education, and policy). This review assesses the use of automation, including ML-based techniques, in 2271 evidence syntheses published between 2017 and 2024 in the Cochrane Database of Systematic Reviews and the journals Campbell Systematic Reviews and Environmental Evidence. We focus on automation across four review steps: search, screening, data extraction, and analysis/synthesis. We systematically identified eligible studies from the three sources and developed a classification system to distinguish between manual, rules-based, ML-enabled, and ML-embedded tools. We then extracted data on tool use, ML integration, reporting practices, motivations for (and against) ML adoption, and the application of stopping criteria for ML-assisted screening. Only ~5% of studies explicitly reported using ML, with most applications limited to screening tasks. Although ~12% employed ML-enabled tools, ~90% of those did not clarify whether ML functionalities were actually utilized. Living reviews showed higher relative ML integration (~15%), but overall uptake remains limited. Previous work has shown that common barriers to broader adoption included limited guidance, low user awareness, and concerns over reliability. Despite ML's potential to streamline evidence syntheses, its integration remains limited and inconsistently reported. Improved transparency, clearer reporting standards, and greater user training are needed to support responsible adoption. As the research literature grows, automation will become increasingly essential, but only if challenges in usability, reproducibility, and trust are addressed.

Citations: 0