Shaun Treweek, Declan Devane, Vivian Welch, Jennifer Petkovic, Peter Tugwell, K M Saif-Ur-Rahman, Ana Beatriz Pizarro, Agustín Ciapponi, Ioanna Gkertso, Clarinda Cerejo, Hanne Bruhn
{"title":"PRO EDI-A Tool to Help Systematic Reviewers Make Equity, Diversity, and Inclusion Assessments.","authors":"Shaun Treweek, Declan Devane, Vivian Welch, Jennifer Petkovic, Peter Tugwell, K M Saif-Ur-Rahman, Ana Beatriz Pizarro, Agustín Ciapponi, Ioanna Gkertso, Clarinda Cerejo, Hanne Bruhn","doi":"10.1002/cesm.70083","DOIUrl":"https://doi.org/10.1002/cesm.70083","url":null,"abstract":"<p><strong>Introduction: </strong>Decisions need evidence, and for healthcare decisions, the evidence decision-makers often want is a systematic review. However, reviews often lack clarity about who is represented within the evidence they synthesize, which limits understanding of how findings apply to diverse populations. PRO EDI was developed to help systematic review authors extract and report equity-related participant data to support greater transparency and more informed judgments about applicability.</p><p><strong>Methods: </strong>PRO EDI was developed iteratively between August 2022 and March 2024 and was conceptualized as a way of making it easier to use PROGRESS-Plus, a framework to assess equity in reviews. An initial draft was created and then discussed and revised in collaboration with an international advisory group. A relatively mature version of the tool was then presented to a meeting of the Cochrane Health Equity Thematic Group. The modified version that emerged from that meeting was considered v1 of PRO EDI.</p><p><strong>Results: </strong>PRO EDI has two main components: a participant characteristics table and guidance on how to use the extracted characteristics data within reviews. PRO EDI recommends that six participant characteristics should be extracted for all included studies in a review: age, sex, gender, ethnicity, race and ancestry, socioeconomic status, and location. Other characteristics (e.g., disability) may be important for some reviews. PRO EDI is relevant for all systematic reviews, not just those with an equity focus. 
The tool has been piloted in several reviews and is publicly available via Trial Forge.</p><p><strong>Conclusion: </strong>PRO EDI gives systematic review authors a consistent way of deciding which participant characteristics to extract from included studies to support equity-related judgments in their results and discussion. It also suggests ways in which those judgments can be presented.</p>","PeriodicalId":100286,"journal":{"name":"Cochrane Evidence Synthesis and Methods","volume":"4 3","pages":"e70083"},"PeriodicalIF":0.0,"publicationDate":"2026-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13131102/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147825544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Petter Fagerberg, Oscar Sallander, Kim Vikhe Patil, Anders Berg, Anastasia Nyman, Natalia Borg, Thomas Lindén
{"title":"Batch Size Effects on Mid-2025 State-of-the-Art Large Language Model Performance in Automated Title and Abstract Screening","authors":"Petter Fagerberg, Oscar Sallander, Kim Vikhe Patil, Anders Berg, Anastasia Nyman, Natalia Borg, Thomas Lindén","doi":"10.1002/cesm.70082","DOIUrl":"10.1002/cesm.70082","url":null,"abstract":"<div>\u0000 \u0000 \u0000 <section>\u0000 \u0000 <h3> Background</h3>\u0000 \u0000 <p>Manual abstract screening is a primary bottleneck in evidence synthesis. Emerging evidence suggests that large language models (LLMs) can automate this task, but their performance when processing multiple references simultaneously in “batches” is uncertain.</p>\u0000 </section>\u0000 \u0000 <section>\u0000 \u0000 <h3> Objectives</h3>\u0000 \u0000 <p>To evaluate the classification performance of four state-of-the-art LLMs (Gemini 2.5 Pro, Gemini 2.5 Flash, GPT-5, and GPT-5 mini) in predicting reference eligibility across a wide range of batch sizes for a systematic review of randomized controlled trials.</p>\u0000 </section>\u0000 \u0000 <section>\u0000 \u0000 <h3> Methods</h3>\u0000 \u0000 <p>We used a gold-standard dataset of 790 references (93 considered relevant) from a published Cochrane Review on stem cell treatment for acute myocardial infarction. Using the public APIs for each model, batches of 1 to 790 references were submitted to classify each as “Include” or “Exclude.” Performance was assessed using sensitivity and specificity, with internal validation conducted through 10 repeated runs for each model-batch combination.</p>\u0000 </section>\u0000 \u0000 <section>\u0000 \u0000 <h3> Results</h3>\u0000 \u0000 <p>Gemini 2.5 Pro was the most robust model, successfully processing the full 790-reference batch. In contrast, GPT-5 failed at batches ≥400, while GPT-5 mini and Gemini 2.5 Flash failed at the 790-reference batch. 
Overall, all models demonstrated strong performance within their operational ranges, with two notable exceptions: Gemini 2.5 Flash showed low initial sensitivity at batch 1, and GPT-5 mini's sensitivity degraded at higher batch sizes (from 0.88 at batch 200 to 0.48 at batch 400). At a practical batch size of 100, Gemini 2.5 Pro achieved the highest sensitivity (1.00, 95% CI 1.00–1.00), whereas GPT-5 delivered the highest specificity (0.98, 95% CI 0.98–0.98).</p>\u0000 </section>\u0000 \u0000 <section>\u0000 \u0000 <h3> Conclusion</h3>\u0000 \u0000 <p>State-of-the-art LLMs can effectively screen multiple abstracts per prompt, moving beyond inefficient single-reference processing. However, performance is model-dependent, revealing trade-offs between sensitivity and specificity. Therefore, batch size optimization and strategic model selection are important parameters for successful implementation.</p>\u0000 </section>\u0000 </div>","PeriodicalId":100286,"journal":{"name":"Cochrane Evidence Synthesis and Methods","volume":"4 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2026-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13073229/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147694607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
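The batch-screening study above scores each model's "Include"/"Exclude" decisions against a gold-standard dataset using sensitivity and specificity. A minimal sketch of that scoring step (the function name and toy labels are illustrative, not the authors' actual evaluation pipeline):

```python
# Sketch: scoring LLM "Include"/"Exclude" screening decisions against a
# gold-standard labelling, as in the batch-size study above.
# Illustrative only, not the authors' pipeline.

def screening_metrics(gold, predicted):
    """Return (sensitivity, specificity) for Include/Exclude labels."""
    tp = sum(g == "Include" and p == "Include" for g, p in zip(gold, predicted))
    fn = sum(g == "Include" and p == "Exclude" for g, p in zip(gold, predicted))
    tn = sum(g == "Exclude" and p == "Exclude" for g, p in zip(gold, predicted))
    fp = sum(g == "Exclude" and p == "Include" for g, p in zip(gold, predicted))
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity

# Toy batch of five references, two of which are truly relevant.
gold      = ["Include", "Include", "Exclude", "Exclude", "Exclude"]
predicted = ["Include", "Exclude", "Exclude", "Exclude", "Include"]
sens, spec = screening_metrics(gold, predicted)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")  # sensitivity=0.50 specificity=0.67
```

For screening, sensitivity (not missing relevant studies) is usually weighted more heavily than specificity, which matches the paper's emphasis on Gemini 2.5 Pro's sensitivity of 1.00 at batch 100.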
{"title":"Do We Need Systematic Reviews of Research Priority Setting? A Proposal for a New Concept on Conducting Systematic Reviews of Research Priority Setting Exercises","authors":"Mona Nasser, Sumanth Kumbargere Nagraj, Seilin Uhm, Prashanti Eachempati, Soumyadeep Bhaumik","doi":"10.1002/cesm.70079","DOIUrl":"10.1002/cesm.70079","url":null,"abstract":"<p>With the increasing number of research priority setting (RPS) exercises, systematic reviews synthesising their findings have also grown in prevalence. While these reviews offer a structured way to compare methodologies, identify underrepresented stakeholder groups, and guide funding decisions, conventional systematic review methodologies, designed primarily for clinical and health research, often fail to capture the complexity, contextual nuance, and participatory nature of RPS. In this commentary, we critically examine these limitations and propose methodological adaptations to enhance the relevance and utility of systematic reviews of RPS. Beyond knowledge generation, we highlight the broader implications of RPS, including its role in stakeholder engagement, research funding allocation, and policy translation, as well as its impact on how these exercises are synthesised. 
By re-evaluating how systematic reviews of RPS are conducted, we advocate for context-sensitive methodologies that better reflect the dynamic and iterative nature of research priority setting.</p>","PeriodicalId":100286,"journal":{"name":"Cochrane Evidence Synthesis and Methods","volume":"4 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2026-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cesm.70079","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147708430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Howard Lopes Ribeiro Junior, Francisco Washington Araújo Barros Nepomuceno, Cláudia do Ó Pessoa, Mauer Alexandre da Ascensão Gonçalves
{"title":"Systematic Reviews in the Age of AI: Are We Sacrificing Rigor for Volume?","authors":"Howard Lopes Ribeiro Junior, Francisco Washington Araújo Barros Nepomuceno, Cláudia do Ó Pessoa, Mauer Alexandre da Ascensão Gonçalves","doi":"10.1002/cesm.70080","DOIUrl":"10.1002/cesm.70080","url":null,"abstract":"<div>\u0000 \u0000 \u0000 <section>\u0000 \u0000 <h3> Introduction</h3>\u0000 \u0000 <p>Systematic reviews occupy a central position in evidence hierarchies, providing structured syntheses intended to inform clinical decision-making and health policy. However, the rapid expansion of artificial intelligence (AI) tools in literature searching, screening, data extraction, and manuscript drafting is transforming how these reviews are produced. Concurrently, the number of prospectively registered systematic reviews has grown substantially, with recent increases in PROSPERO registrations highlighting an accelerating output of evidence syntheses. While technological advances promise efficiency and scalability, they also raise concerns regarding methodological rigor, redundancy, and transparency.</p>\u0000 </section>\u0000 \u0000 <section>\u0000 \u0000 <h3> Methods</h3>\u0000 \u0000 <p>This viewpoint argues that the current reporting and governance frameworks for systematic reviews remain largely anchored in pre-AI workflows.</p>\u0000 </section>\u0000 \u0000 <section>\u0000 \u0000 <h3> Results</h3>\u0000 \u0000 <p>Ongoing updates to reporting standards, including PRISMA revisions, have yet to fully address key challenges introduced by AI-assisted methodologies, such as algorithmic bias, auditability, reproducibility limitations of proprietary models, and the need to document human oversight. 
The absence of explicit guidance for reporting AI use creates a critical transparency gap, potentially undermining confidence in systematic reviews and increasing the risk of superficial or duplicated syntheses.</p>\u0000 </section>\u0000 \u0000 <section>\u0000 \u0000 <h3> Conclusion</h3>\u0000 \u0000 <p>We propose that the evidence-synthesis ecosystem requires urgent adaptation, including the development of a PRISMA-AI extension, strengthened metadata requirements in registries such as PROSPERO, and updated editorial policies for AI-assisted reviews. Safeguarding rigor in the age of automated science is essential to maintain the credibility and clinical utility of systematic reviews.</p>\u0000 </section>\u0000 </div>","PeriodicalId":100286,"journal":{"name":"Cochrane Evidence Synthesis and Methods","volume":"4 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2026-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cesm.70080","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147708401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ryan P. W. Kenny, Katie Twentyman, Dawn Craig, Nick Meader, Gill Norman
{"title":"The Impact on Systematic Reviews of Risk of Bias Assessment Changes From Conference Abstracts to Full Text","authors":"Ryan P. W. Kenny, Katie Twentyman, Dawn Craig, Nick Meader, Gill Norman","doi":"10.1002/cesm.70078","DOIUrl":"https://doi.org/10.1002/cesm.70078","url":null,"abstract":"<p>Conference abstracts are commonly included in systematic reviews of evidence. Because of word-count limits, conference abstracts often lack data or methodological detail. This causes issues for the assessment of risk of bias (RoB). We therefore aimed to compare the RoB ratings, using the Cochrane RoB tool, for abstracts and full texts. We did this by comparing, within previously published Cochrane reviews, the RoB ratings for studies first included as an abstract and later as a full text. We searched the Cochrane Database of Systematic Reviews for reviews with updates across numerous disciplines (depression, anxiety, surgical, Parkinson's disease, Alzheimer's disease, multiple sclerosis, motor neuron disease, cancer, cardiovascular disease, and musculoskeletal disease). We identified 29 reviews, including 52 randomized controlled trials that had an abstract and a subsequent full text available. If abstracts and full texts had not been assessed using the Cochrane RoB tool, we obtained the texts and performed the assessment (<i>n</i> = 32). To assess the likelihood of the domain assessment rating (low, unclear, or high) changing from conference abstract to full text, we fitted a Bayesian categorical multinomial model for each domain (i.e., signaling question) of the Cochrane tool. At the abstract assessment stage, the most common decision was unclear. 
Using unclear as the reference level in the model, the odds of being rated high at full text, compared with the abstract assessment, were increased for domains 2 (allocation concealment: odds ratio [OR] = 3.09, 95% credible interval [CrI] 1.01 to 9.84) and 3 (blinding: OR = 5.09, 95% CrI 1.67 to 16.20). Domain 2 also had increased odds of being rated low (OR = 2.93, 95% CrI 1.13 to 7.87). This suggests that moving from conference abstract to full-text assessment changes RoB ratings. The numerous unclear ratings observed at the abstract assessment stage were usually due to a lack of reporting. While the findings of this study should be interpreted in the context of small numbers, the evidence still suggests that, for some domains, such as allocation concealment and blinding, the decision could change on full-text assessment. This also has implications for the certainty of the evidence, which is informed by the RoB assessments: whether only abstracts or full texts are available could change the overall certainty. Current RoB tools may not be suitable for assessing conference abstracts.</p>","PeriodicalId":100286,"journal":{"name":"Cochrane Evidence Synthesis and Methods","volume":"4 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2026-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cesm.70078","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147615379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
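The study above models how a rating given at the abstract stage shifts once the full text is assessed. As a hedged illustration of the same idea (a simple Dirichlet-multinomial posterior over hypothetical transition counts, not the authors' categorical regression model), one can compute a credible interval for the probability that an "unclear" abstract rating becomes "high" at full text:

```python
import random

def dirichlet_sample(alpha, rng):
    """One draw from a Dirichlet distribution via normalized Gamma draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]

def posterior_interval(counts, index, draws=20000, seed=1):
    """95% credible interval for one category's probability, using a
    Dirichlet(1, 1, 1) prior updated with multinomial counts."""
    rng = random.Random(seed)
    alpha = [c + 1 for c in counts]  # posterior = Dirichlet(counts + prior)
    samples = sorted(dirichlet_sample(alpha, rng)[index] for _ in range(draws))
    return samples[int(0.025 * draws)], samples[int(0.975 * draws)]

# Hypothetical transition counts for studies rated 'unclear' at abstract:
# [low, unclear, high] at full-text assessment. Invented for illustration.
counts = [9, 30, 13]
lo, hi = posterior_interval(counts, index=2)
print(f"P(high at full text): 95% CrI {lo:.2f} to {hi:.2f}")
```

A credible interval whose lower bound sits well above zero (as for the paper's domains 2 and 3) indicates that upgrades from "unclear" to "high" are common enough to matter for the review's conclusions.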
Stephanie Weibel, Patrick Meybohm, Annika Oeser, Maria Popp, Tamara Pscheidl, Stefanie Reis, Lena Saal-Bauernschubert, Stephanie Stangl, Emma Sydenham, Carina Wagner, Florencia Weber, Ana-Mihaela Zorger, Nicole Skoetz
{"title":"Impact of a Research Integrity Assessment (RIA) of Randomized Controlled Trials Included in Interventional COVID-19 Systematic Reviews: A Meta-Epidemiological Study","authors":"Stephanie Weibel, Patrick Meybohm, Annika Oeser, Maria Popp, Tamara Pscheidl, Stefanie Reis, Lena Saal-Bauernschubert, Stephanie Stangl, Emma Sydenham, Carina Wagner, Florencia Weber, Ana-Mihaela Zorger, Nicole Skoetz","doi":"10.1002/cesm.70076","DOIUrl":"https://doi.org/10.1002/cesm.70076","url":null,"abstract":"<div>\u0000 \u0000 \u0000 <section>\u0000 \u0000 <h3> Objective</h3>\u0000 \u0000 <p>This study aimed to evaluate the feasibility, reliability, and impact of the Research Integrity Assessment (RIA) tool when applied to randomized controlled trials (RCTs) included in systematic reviews. RIA is a structured tool designed to assess retractions, trial registration, ethical approval, authorship, and plausibility of methods and results, thereby identifying RCTs that may not meet basic standards of research integrity.</p>\u0000 </section>\u0000 \u0000 <section>\u0000 \u0000 <h3> Design</h3>\u0000 \u0000 <p>Meta-epidemiological study.</p>\u0000 </section>\u0000 \u0000 <section>\u0000 \u0000 <h3> Methods</h3>\u0000 \u0000 <p>We systematically identified Cochrane reviews and non-Cochrane systematic reviews of RCTs investigating interventions for COVID-19 and extracted all RCTs. Each RCT was independently assessed by two reviewers (with different expertise in evidence synthesis) using the RIA tool, with disagreements resolved by a senior reviewer. Reliability and feasibility were recorded, and sensitivity analyses examined the impact of excluding studies failing the RIA on meta-analytic results.</p>\u0000 </section>\u0000 \u0000 <section>\u0000 \u0000 <h3> Results</h3>\u0000 \u0000 <p>Two hundred six RCTs from 23 Cochrane reviews and non-Cochrane systematic reviews were assessed with RIA. 
Fifty-nine RCTs (29%) were excluded due to integrity concerns, 79 (38%) classified as “awaiting classification”, and 11 (5%) identified as non-randomized studies, leaving 57 RCTs (28%) rated as “no concern.” The most common reason for exclusion was absent or retrospective trial registration, while uncertainties around ethics approval were the main reason for “awaiting classification”. Interrater reliability was moderate overall (κ = 0.5), with higher agreement in objective domains and lower in domains requiring interpretive judgment, necessitating senior adjudication in a substantial proportion of assessments. On average, application of RIA required 21–27 min per RCT; however, the time required for senior assessor reassessment, conflict resolution, and author correspondence was not systematically recorded and substantially exceeded that of the initial assessments. We received 35 author responses to 165 individual queries. Sensitivity analyses restricted to RCTs passing RIA reduced the median number of eligible RCTs per meta-analysis by 60%. This frequently widened confidence intervals and decreased the certainty of conclusions, although the direction of effect estimates changed only rarely.</p>\u0000 </section>\u0000 \u0000 <section>\u0000 \u0000 <h3> Conclusions</h3>\u0000 \u0000 <p>These result","PeriodicalId":100286,"journal":{"name":"Cochrane Evidence Synthesis and Methods","volume":"4 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2026-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cesm.70076","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147562396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
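The RIA study above reports moderate interrater reliability (κ = 0.5) between the two reviewers. Cohen's kappa compares observed agreement with the agreement expected by chance from each rater's marginal frequencies; a minimal sketch, with hypothetical reviewer verdicts:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    ca, cb = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical RIA verdicts from two independent reviewers.
a = ["pass", "fail", "fail", "awaiting", "pass", "fail"]
b = ["pass", "fail", "awaiting", "awaiting", "fail", "fail"]
print(f"kappa = {cohens_kappa(a, b):.2f}")  # kappa = 0.48
```

Values around 0.4 to 0.6 are conventionally read as "moderate" agreement, which is consistent with the study's finding that interpretive domains needed senior adjudication more often than objective ones.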
Daniel Shaughnessy, Vijay Joshi, Natalia Dellavalle, Louis Leslie, Michael Edwards, Timothy Waxweiler, Tianjing Li, Riaz Qureshi
{"title":"Associations of Social and Demographic Factors on the Outcomes of Ocular Melanoma and Other Adult Ocular Neoplasms in the United States: A Systematic Review","authors":"Daniel Shaughnessy, Vijay Joshi, Natalia Dellavalle, Louis Leslie, Michael Edwards, Timothy Waxweiler, Tianjing Li, Riaz Qureshi","doi":"10.1002/cesm.70075","DOIUrl":"10.1002/cesm.70075","url":null,"abstract":"<div>\u0000 \u0000 \u0000 <section>\u0000 \u0000 <h3> Introduction</h3>\u0000 \u0000 <p>Social determinants of health (SDOH), including economic stability, education access and quality, healthcare access and quality, neighborhood and built environment, and social and community context, shape gaps in health outcomes across many conditions. Ocular neoplasms are no exception. Cancers such as uveal melanoma, conjunctival squamous cell carcinoma, ocular lymphoma, and ocular Kaposi sarcoma may be especially vulnerable to social and demographic influences. We systematically reviewed documented associations between SDOH and these ocular cancers in the United States.</p>\u0000 </section>\u0000 \u0000 <section>\u0000 \u0000 <h3> Methods</h3>\u0000 \u0000 <p>Following a pre-registered protocol, we searched MEDLINE, Embase, and Web of Science (from January 2000 to November 2023) for primary studies of any design that evaluated one or more relationships between SDOH and outcomes related to the ocular cancers listed above. Outcomes included cancer incidence, stage at diagnosis, treatment patterns, survival, and mortality. We extracted study design, population, exposure, and outcome characteristics, classified each exposure-outcome association by its direction (e.g., favorable, unfavorable, or null), and assessed the risk of bias using a modified Newcastle-Ottawa Scale. 
Due to heterogeneity in exposure and outcome definitions, we narratively synthesized findings by SDOH domain.</p>\u0000 </section>\u0000 \u0000 <section>\u0000 \u0000 <h3> Results</h3>\u0000 \u0000 <p>We included 21 studies examining 167 unique associations. Social and community context, typically represented as race and ethnicity, was the most frequently studied domain, followed by economic stability (e.g., income) and healthcare access and quality (e.g., insurance type or travel distance). Across domains, lower socioeconomic status, public or no insurance, minority racial and ethnic identity, and care at academic centers were generally associated with later stage at diagnosis, higher odds of enucleation, or worse survival. Higher income, private insurance, and treatment at experienced facilities were often associated with earlier presentation and better outcomes.</p>\u0000 </section>\u0000 \u0000 <section>\u0000 \u0000 <h3> Conclusion</h3>\u0000 \u0000 <p>SDOH have a measurable and often unfavorable relationship with the diagnosis, management, and prognosis of rare adult ocular cancers in the United States. 
Standardized SDOH exposures and measurements, prospective data collection, and adjustment for confounding are necessary to strengthen the evidence and guide multi-domain interventions (e.g., expanded insurance, travel assistance to high-volume centers, and community eye-he","PeriodicalId":100286,"journal":{"name":"Cochrane Evidence Synthesis and Methods","volume":"4 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2026-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12977123/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147446569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Susanne Hempel, Kimny Sysawang, Haley K. Holmer, Erin Tokutomi, Suchitra Iyer, Zhen Wang, Edi Kuhn, Mohammad Hassan Murad
{"title":"Using Large Language Models to Address Contextual Questions in Systematic Reviews","authors":"Susanne Hempel, Kimny Sysawang, Haley K. Holmer, Erin Tokutomi, Suchitra Iyer, Zhen Wang, Edi Kuhn, Mohammad Hassan Murad","doi":"10.1002/cesm.70060","DOIUrl":"10.1002/cesm.70060","url":null,"abstract":"<div>\u0000 \u0000 \u0000 <section>\u0000 \u0000 <h3> Objectives</h3>\u0000 \u0000 <p>Systematic evidence reviews (SERs) produced by the U.S. Agency for Healthcare Research and Quality (AHRQ) Evidence-based Practice Center (EPC) Program use contextual questions to provide context and background information on the topic. There is currently no standardized approach to address contextual questions in systematic reviews. This study explored the use of publicly available large language models (LLMs) in addressing contextual questions.</p>\u0000 </section>\u0000 \u0000 <section>\u0000 \u0000 <h3> Study Design</h3>\u0000 \u0000 <p>Using a set of 20 published and 5 yet to be published SERs, we selected one contextual question per report and used it as a prompt to elicit answers from an LLM (ChatGPT, Bard, Claude, or Perplexity). Two independent reviewers rated the results using a priori established evaluation criteria (https://osf.io/4k3cu/), comparing the response in the SER to LLM-generated responses. The study was guided by six research questions addressing feasibility, validity of content, validity of structure, mistakes, congruence between responses, and incremental validity of using LLMs to address contextual questions.</p>\u0000 </section>\u0000 \u0000 <section>\u0000 \u0000 <h3> Results</h3>\u0000 \u0000 <p>Using minimal prompt engineering produced relevant responses and documented the feasibility of LLM-generated answers to contextual questions. Responses differed in content and format and are not reproducible (e.g., LLMs update regularly), but LLMs were able to produce articulate, clinically plausible, and well-structured responses. 
We detected few factual errors or contradictions and no instances of suspected bias, but citations supporting LLM-generated responses often could not be produced or could not be verified (‘confabulations’). Congruence with human-generated responses varied: LLM-generated responses provided more background on the topic, while SERs provided more nuanced answers to the contextual question. Results regarding incremental validity were mixed and may depend on the tool.</p>\u0000 </section>\u0000 \u0000 <section>\u0000 \u0000 <h3> Conclusion</h3>\u0000 \u0000 <p>LLMs are potentially helpful in addressing contextual questions in systematic reviews, but human expertise remains essential for using the generated information in a meaningful way.</p>\u0000 </section>\u0000 </div>","PeriodicalId":100286,"journal":{"name":"Cochrane Evidence Synthesis and Methods","volume":"4 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2026-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12948247/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147328851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correction to “Health Equity in Systematic Reviews: A Tutorial—Part 1 Getting Started With Health Equity in Your Review”, “Health Equity in Systematic Reviews: A Tutorial—Part 2 Implementing Health Equity Throughout Your Methods”, “Meta-Analysis Using Time-to-Event Data: A Tutorial” and “Split Body Trials in Systematic Reviews and Meta-Analyses: A Tutorial”","authors":"","doi":"10.1002/cesm.70062","DOIUrl":"10.1002/cesm.70062","url":null,"abstract":"<p>J. Petkovic, J. P. Pardo, V. Welch, et al., “Health Equity in Systematic Reviews: A Tutorial—Part 1 Getting Started With Health Equity in Your Review,” <i>Cochrane Evidence Synthesis and Methods</i> 3 (2025): 1–6, https://doi.org/10.1002/cesm.70055.</p><p>J. Petkovic, J. P. Pardo, V. Welch, et al., “Health Equity in Systematic Reviews: A Tutorial—Part 2 Implementing Health Equity Throughout Your Methods,” <i>Cochrane Evidence Synthesis and Methods</i> 3 (2025): 1–7, https://doi.org/10.1002/cesm.70054.</p><p>Krishan, A. and Dwan, K. (2025), Meta-Analysis Using Time-to-Event Data: A Tutorial. Cochrane Evidence Synthesis and Methods, 3: e70041. https://doi.org/10.1002/cesm.70041</p><p>Livingstone, N., Dwan, K. and Chaplin, M. (2025), Split Body Trials in Systematic Reviews and Meta-Analyses: A Tutorial. Cochrane Evidence Synthesis and Methods, 3: e70052. https://doi.org/10.1002/cesm.70052</p><p>The two articles by Petkovic <i>et al.</i> (<span>1</span>) were incorrectly labelled as “Methods Article”. They have now been corrected to “Tutorial”.</p><p>In Petkovic <i>et al.</i> (<span>1</span>), Appendix A, the row for ‘Religion’ states “This refers to the religious beliefs and can affect health equity when the choices related to these beliefs are imposed on a person by their family or community”. 
This should state “This refers to religious beliefs, which can influence health and well-being by fostering community support and shared values, and may also affect health equity (positively and negatively) when choices related to these beliefs are shaped by family or community expectations”.</p><p>In Petkovic <i>et al.</i> (<span>2</span>), the following Conflict of Interest statement was missing:</p><p><b>CONFLICT OF INTEREST STATEMENT</b></p><p>All authors are members of the leadership team of the Cochrane Health Equity Thematic Group. The authors have no other conflicts to declare.</p><p>The article by Krishan & Dwan (<span>3</span>) was incorrectly labelled as “Brief Report”. This has now been corrected to “Tutorial”.</p><p>The article by Livingstone, Dwan & Chaplin (<span>4</span>) was incorrectly labelled as “Editorial”. This has now been corrected to “Tutorial”.</p><p>These have now been corrected in the published articles.</p><p>We apologize for these errors.</p>","PeriodicalId":100286,"journal":{"name":"Cochrane Evidence Synthesis and Methods","volume":"4 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2026-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12919368/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147273839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}