Is Realising Evidence-Based X the Future of Evidence Synthesis?

Gavin Stewart
Campbell Systematic Reviews, 21(2), 2025. DOI: 10.1002/cl2.70037. Published 20 March 2025.

Abstract

Evidence synthesis (including systematic reviews and meta-analysis) has a long history of evolution and has had major impacts across the sciences (Shadish and Lecy 2015), underpinning evidence-informed decision-making, particularly in specific health and social science domains. There has also been some penetration into environmental and climate sciences, but it is not [yet] the primary mechanism for science-policy translation outside health and social sciences. Many methodologists have worked across health or social science domains, and there has long been a realisation that methods harmonisation is beneficial, seen in closer working between Cochrane and Campbell, and in the formation of the Society for Research Synthesis Methodology to foster interdisciplinary learning (cf. Stewart and Schmid 2015). Concurrently, the scope of applications has widened considerably, perhaps best exemplified by the global SDG Synthesis Coalition, which envisages robust evidence synthesis underpinning decisions made across all the Sustainable Development Goals, targets and indicators. This would represent the full realisation of evidence-based X: not evidence-based health or social science or environment, but fully developed generic methodologies applicable irrespective of domain (EBX). To those of us who believe in generic methodologies and the need for coherent decision-making across an increasingly complex decision space, this evolution of evidence synthesis is long overdue, but it is not without challenges.

Perhaps the three largest barriers to overcome are the plethora of untrustworthy evidence in our publications and databases, a lack of coherent whole-systems thinking, and the difficulties of developing pipelines for methods innovation that maintain rigour.

The unpalatable truth that a large fraction (arguably, even the largest fraction) of scientific publications are somewhere between misleading and downright wrong is horrifying, and it is contested by most scientists who are not research methodologists. Most believe that peer review and our publication procedures are adequate to safeguard scientific integrity. They are quite simply wrong! Doug Altman's seminal 1994 paper on the scandal of poor medical research could be written today in any domain of applied science (Altman 1994). In 2005, John Ioannidis argued that ‘most published research findings are false’, particularly in fields with large numbers of researchers exploring small effects (Ioannidis 2005). Richard Horton was scathing about progress in 2015: ‘Much of the scientific literature, perhaps half, may simply be untrue. Afflicted by studies with small sample sizes, tiny effects, invalid exploratory analyses, and flagrant conflicts of interest, together with an obsession for pursuing fashionable trends of dubious importance, science has taken a turn towards darkness’. He warned, ‘poor methods get results’ (Horton 2015). There has been some progress, particularly in the field of psychology, with the realisation that questionable research practices were resulting in a ‘replication crisis’ and the subsequent development of the ‘open science’ movement to remedy this, but the problems with evidence remain prevalent. This has important ramifications for EBX.

First, scientists need to devote considerable efforts to remedying this situation to ensure that the foundational primary studies are fit for purpose. This will be particularly challenging in domains unfamiliar with formalised evidence synthesis where the need to ensure the integrity of scientific data may not be fully accepted or understood. Endeavours such as Entrust-PE (O'Connell et al. 2024) could be generalised to achieve this. Entrust-PE is an international, interdisciplinary network established to develop an integrated framework for enhancing and facilitating the trustworthiness of pain research by engaging with patients, authors, clinicians, scientists, publishers and funders of science. The primary focus of Entrust-PE is on the conduct of research, but initiatives to develop and promote adherence to reporting standards and the generation of core common outcome sets will also be necessary. Making sure that scientists measure the right things, the right way and report their studies properly is a huge but potentially transformative task.

Second, the important roles of evidence synthesis in critically appraising study validity and identifying divergent data must remain central pillars of robust synthesis despite the resource-intensive nature of this work. Effect sizes speak the truth where hacked and HARKed (hypothesised after the results are known) p values and words do not, as long as they are recalculated, evaluated and contextualised, considering confounders, heterogeneity and publication biases. There are no shortcuts to thoughtful synthesis, be it quantitative or qualitative! This is demanding and skilled work, even with simple outcomes and randomised controlled trials. The high-heterogeneity, multiple-study-design syntheses that comprise most evidence relevant to the broader Sustainable Development Goals are much more challenging, requiring structured appraisal to identify and highlight the massive uncertainties in evidence so often obscured by primary studies, overhyped summaries or conveniently compliant policy briefs.
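To make the arithmetic of principled pooling concrete, the following is a minimal sketch of inverse-variance random-effects pooling using the DerSimonian-Laird heterogeneity estimator, one standard way of recalculating and contextualising effect sizes. The function name and the numbers in the example are illustrative, not drawn from any study cited in this editorial.

```python
import math

def random_effects_pool(effects, variances):
    """Pool effect sizes under a DerSimonian-Laird random-effects model.

    effects: per-study effect estimates (e.g. standardised mean differences)
    variances: their within-study sampling variances
    Returns (pooled effect, standard error, tau^2, I^2).
    """
    w = [1.0 / v for v in variances]                          # fixed-effect weights
    fe = sum(wi * e for wi, e in zip(w, effects)) / sum(w)    # fixed-effect mean
    q = sum(wi * (e - fe) ** 2 for wi, e in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                  # between-study variance
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0  # heterogeneity fraction
    w_re = [1.0 / (v + tau2) for v in variances]   # random-effects weights
    pooled = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, se, tau2, i2

# Three hypothetical studies with equal precision but divergent estimates:
pooled, se, tau2, i2 = random_effects_pool([0.2, 0.5, 0.8], [0.04, 0.04, 0.04])
print(f"pooled={pooled:.2f}, SE={se:.3f}, tau^2={tau2:.3f}, I^2={i2:.0%}")
```

Note how the between-study variance tau^2 widens the pooled standard error relative to a fixed-effect analysis: divergent data inflate the uncertainty rather than being averaged away, which is exactly the behaviour a structured appraisal needs.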

The need to embed evidence synthesis in broader systems to develop effective action in complex contexts, or to consider broad policy questions, has long been recognised as challenging. The medical model of parameterising health economic models with meta-analysis has generally served health technology assessment well, but systems boundaries are tightly defined in that context, and even there, uncertainties often relate to imprecisely known implementation utilities rather than to the effects of interventions, resulting in incoherence. Beyond that, decision and systems modellers rarely work with meta-analysts, and the qualitative worlds of logic maps rarely meet the quantitative worlds of directed acyclic graphs or Bayesian decision models. Clearly, there is huge scope to ascertain how existing tools can be integrated and to develop new ways of thinking in this space to support policy makers in making broad policy decisions. The idea of developing suites of reviews or meta-analyses to underpin one decision has been little explored, although large network meta-analyses can replace many meta-analyses of pairwise comparisons, which is heuristically similar. For example, Birkinshaw et al. (2023) undertook a network meta-analysis of 25 antidepressants across three pain conditions in a unified analysis that would traditionally have required more than 100 pairwise systematic reviews, none of which would have determined which antidepressant should be the top-ranked treatment for pain. Utilising such methods would also require potentially new ways of working with the policy community to define decision options, utilities and systems boundaries, but could effectively direct research towards minimising important uncertainties as well as provide mechanisms to support transparent and coherent policy making.
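The heuristic that one network analysis can answer questions no single pairwise review addresses is visible even in the smallest possible network: a Bucher-style indirect comparison through a common comparator. The sketch below uses hypothetical drugs and numbers, not data from Birkinshaw et al.

```python
import math

# Hypothetical direct evidence (mean difference vs placebo, standard error):
d_A_pbo, se_A = 0.50, 0.10   # drug A vs placebo
d_B_pbo, se_B = 0.30, 0.12   # drug B vs placebo

# Bucher-style indirect comparison of A vs B through the shared placebo arm:
# the effects subtract, and (assuming independence) the variances add.
d_A_B = d_A_pbo - d_B_pbo
se_A_B = math.sqrt(se_A ** 2 + se_B ** 2)
lo, hi = d_A_B - 1.96 * se_A_B, d_A_B + 1.96 * se_A_B
print(f"A vs B (indirect): {d_A_B:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

No head-to-head A-versus-B trial exists in this toy network, yet the comparison, with honestly widened uncertainty, falls out of the structure. A full network meta-analysis generalises this to many treatments and mixes direct and indirect evidence, which is why one unified analysis can stand in for scores of pairwise reviews.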

Every methodologist will have their own view of research priorities relevant to EBX. The scope is vast, ranging from developing guidance on mixed methods synthesis to generating critical appraisal tools that flag concerns across study designs to expressing uncertainties using threshold analyses or exploring heterogeneity with model averaging routines. Two areas of methodological innovation driven by the need to make broad policy decisions rapidly, namely systematic mapping and rapid evidence assessment, require particularly careful consideration.

Breadth is often provided by evidence and gap maps, or by topic models generated either using traditional systematic review methods or, increasingly, artificial intelligence. Such maps provide useful overviews of what evidence is available but critically do not inform policy directly. Scientists have a tendency to pontificate in inverse relation to knowledge; thus, twenty studies linking two nodes could have less evidential value than one good study. The temptation to utilise such maps without principled synthesis should be resisted. Berrang-Ford et al. (2021) demonstrate the utility of systematic maps when used in a principled manner, using AI to generate a large-scale overview, with multiple subsequent systematic reviews used to inform policy.

A plethora of methodologies, synthesis products and terminology also surrounds ‘rapid review’. The term has become meaningless, incorporating everything from undertaking a full systematic review rapidly to abandoning every methodological expectation of a full review in order to expedite and simplify evidence acquisition and synthesis. Clearly, the latter represents a dangerous, misleading form of research waste despite its prevalence in some domains. Perhaps the most invidious and unjustifiable form of synthesis, vote counting, is often employed in this context. The safest course of action for a policy maker presented with evidence generated in this manner is to put it in the bin, whether it was generated by a human or a machine. Note that this is not to say that good rapid reviews do not exist. Judicious use of shortcuts to expedite rapid synthesis is perfectly legitimate, provided reviewers transparently communicate the uncertainties inherent in their methods and consider the impact on review findings. Campbell and other coordinating bodies provide good guidance on using these methods appropriately.
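Why vote counting misleads can be shown in a few lines: tallying ‘significant’ studies discards both effect sizes and precision, so a set of consistent but individually under-powered studies reads as evidence of no effect. The studies below are hypothetical, constructed only to illustrate the failure mode.

```python
import math

# Hypothetical studies: (effect estimate, standard error). All five point
# the same way, but each is too imprecise to be individually 'significant'.
studies = [(0.30, 0.20), (0.25, 0.18), (0.35, 0.22), (0.28, 0.19), (0.32, 0.21)]

# Vote counting: tally studies whose 95% CI excludes zero.
votes_for = sum(1 for e, se in studies if e - 1.96 * se > 0)
print(f"vote counting: {votes_for}/{len(studies)} studies 'significant'")

# Fixed-effect pooling of the same data: weight by inverse variance.
w = [1.0 / se ** 2 for _, se in studies]
pooled = sum(wi * e for wi, (e, _) in zip(w, studies)) / sum(w)
se_pooled = math.sqrt(1.0 / sum(w))
print(f"pooled effect: {pooled:.2f} "
      f"(95% CI {pooled - 1.96 * se_pooled:.2f} to {pooled + 1.96 * se_pooled:.2f})")
```

Here every individual confidence interval crosses zero, so vote counting scores the evidence 0/5, while pooling the same data yields a clearly positive combined effect. The synthesis method, not the evidence, determines the verdict, which is why vote-counted rapid products belong in the bin.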

Addressing the systemic problems with the generation and synthesis of scientific information may feel intractable, but as authors, reviewers and editors, commissioners or users of science, we can all do our bit. Essentially, do what you can to avoid research waste! Make sure your research is useful and useable and, above all, transparent. Campbell Collaboration authors are well to the fore in this endeavour and should be proud of their achievements. The broader challenges in realising the full potential of EBX are non-trivial, but so is the reward. The dream of rational decision-making underpinned by a coherent and reliable evidence architecture across all science domains is perhaps closer than ever. Given the incoherence of our current science pipelines and the post-truth political environment, the need for it may also be greater than ever.
