Validation of the Quality Assessment Tool for Systematic Reviews and Meta-Analyses of Real-World Studies

Tadesse Gebrye, Chidozie Mbada, Zalmai Hakimi, Francis Fatoye
{"title":"Validation of the Quality Assessment Tool for Systematic Reviews and Meta-Analyses of Real-World Studies","authors":"Tadesse Gebrye,&nbsp;Chidozie Mbada,&nbsp;Zalmai Hakimi,&nbsp;Francis Fatoye","doi":"10.1111/jebm.70052","DOIUrl":null,"url":null,"abstract":"<p>Randomized controlled trials (RCTs) are considered the gold standard for assessing the efficacy of medical interventions [<span>1</span>]. However, real-world evidence (RWE) is increasingly recognized as essential for comprehensive healthcare decision-making. RCTs provide high internal validity and establish clear causal relationships due to their controlled environments and strict criteria. Nevertheless, the highly selective patient populations and controlled settings of RCTs can limit the external validity of their findings, making it challenging to generalize results to broader, more diverse populations [<span>2</span>]. RWE is derived from real-world data (RWD), such as electronic health records and insurance claims, and provides clinical insights into the usage, benefits, and risks of medical products. Unlike RCTs, RWE offers perspectives on treatment performance in everyday practice, which can significantly aid healthcare decision-making [<span>3</span>]. RWD serves to bridge the gap between clinical trials and real-world settings, informing guidelines, policy decisions, and new therapy approvals [<span>4</span>]. This type of evidence captures a wider range of patient populations and healthcare environments, making it particularly valuable for understanding the effectiveness, safety, and cost-effectiveness of interventions in real-world conditions.</p><p>Regulatory bodies and healthcare organizations increasingly rely on RWE to fill gaps left by RCTs [<span>5</span>]. For instance, the US Food and Drug Administration (FDA) and the European Medicines Agency (EMA) have incorporated RWE to support regulatory decisions and postmarket surveillance [<span>6</span>]. When making healthcare recommendations, it is crucial that they are grounded in the best available research evidence [<span>7</span>]. Incorporating this evidence into healthcare practices can help reduce variations in healthcare delivery. The volume of research studies on healthcare is now enormous for healthcare professionals. RWE is instrumental in understanding the effectiveness and safety of interventions across diverse populations and in identifying rare adverse events and long-term outcomes, thus enhancing healthcare practices and policies [<span>8</span>].</p><p>To summarize and present the findings of individual research studies a structured approach is required. This structured approach, systematic review, provides a comprehensive and unbiased synthesis of many relevant studies in a single document. One of the most critical components of conducting a systematic review is the assessment of the quality of the included studies, as this significantly impacts the overall quality of evidence produced [<span>9</span>]. Quality appraisal refers to evaluating how well a study was designed and conducted looking at its methodological soundness, such as whether it used an appropriate study design, followed rigorous procedures, and addressed key elements like sample selection and data analysis [<span>9</span>]. 
In contrast, risk of bias assessment focuses specifically on identifying systematic errors that may distort the study's findings, such as selection bias, measurement bias, or confounding.</p><p>A recent scoping review highlighted a significant gap in the availability of methodological quality appraisal tools specifically designed for systematic reviews (SRs) and meta-analyses (MAs) involving real-world evidence (RWE) studies [<span>10</span>]. In the absence of such tailored instruments, researchers have commonly relied on general tools not originally developed with RWE in mind, such as the Quality Assessment Tool for Observational Cohort and Cross-Sectional Studies, the Critical Appraisal Skills Programme (CASP) Checklist, the Newcastle-Ottawa Scale (NOS), the Non-Summative Four-Point System, the Quality of Health Economic Studies Instrument, the STROBE Statement, and the Joanna Briggs Institute Critical Appraisal Tool for Prevalence Studies. While these tools offer useful frameworks for evaluating traditional observational studies, they may not adequately account for the unique methodological features and data heterogeneity characteristic of real-world studies. Unlike traditional observational research, which often relies on prospectively collected data from controlled research settings or cohorts, RWE studies draw on routinely collected data from clinical practice such as electronic health records, insurance claims, and patient registries that were not originally intended for research purposes, introducing complexities that existing appraisal tools may not fully address [<span>11</span>].</p><p>In response to this methodological gap, a novel instrument the Quality Assessment Tool for Systematic Reviews and Meta-Analyses Involving Real-World Studies (QATSM-RWS) has been developed [<span>11</span>]. QATSM-RWS is specifically designed to assess the methodological quality of SRs and MAs that synthesize data derived from real-world settings, such as electronic health records, insurance claims, patient registries, and other routinely collected healthcare data. Validating QATSM-RWS is a critical step to establish its reliability and relevance in assessing the quality of evidence generated from RWE. The present study aims to assess the interrater agreement of QATSM-RWS in comparison to existing quality assessment tools to ensure the consistency and reliability of assessments across different evaluators.</p><p>Fifteen SRs and meta-analyses on RWE studies were selected from a relevant database using a purposive sampling technique (Table S1). The selected studies focusing on musculoskeletal disease as a reference health condition were identified from a scoping review on quality assessment tools used in systematic reviews and meta-analyses of RWE studies by Gebrye and colleagues [<span>10</span>]. Two quality assessment tools were used as comparators for the QATSM-RWS: the Newcastle-Ottawa Scale (NOS), and a Non-Summative Four-Point System.</p><p>Two researchers (TG &amp; CM), trained extensively in research design, methodology, epidemiology, healthcare research, statistics, systematic reviews, and meta-analysis, conducted the reliability ratings for each systematic review. A detailed list of scoring instructions was developed and provided to the raters. Throughout the rating process, the researchers were blinded to each other's assessments and prohibited from discussing their ratings. 
The ratings were based on whether the criteria/items in each quality assessment tool adequately measured their intended function. This rigorous approach aimed to ensure the reliability and validity of the quality assessments conducted in the study.</p><p>A weighted Cohen's kappa (<i>κ</i>) was calculated for each item of the quality assessment tools to evaluate interrater agreement between the two researchers. The two researchers were treated as fixed, where they evaluated all item of interests. The total number of “yes,” “no,” and “yes/no” responses that were common between the raters was used to assess overall agreement. Each item scored as “yes” received one point, and these points were summed to calculate a total agreement score. To assess the degree of consistency among the two researchers Intraclass Correlation Coefficients (ICC) were used to quantify the interrater agreement or reliability [<span>12</span>].</p><p>Agreement was interpreted using the criteria set by Landis and Koch, where a <i>κ</i>-value of less than 0 indicates less than chance agreement, 0.0 to 0.2 indicates slight agreement, 0.21 to 0.40 indicates fair agreement, 0.41 to 0.60 indicates moderate agreement, 0.61 to 0.80 indicates substantial agreement, and 0.81 to 1.0 indicates almost perfect or perfect agreement [<span>12</span>]. Overall, high interrater agreement indicates that the tool is easy to use and interpret consistently across different observers, while low agreement suggests that the tool or its items may require clarification or modification.</p><p>To compare the agreement graphically, the Bland–Altman limits of agreement method was employed [<span>13</span>]. The level of significance was set at 0.05, and all analyses were conducted using IBM SPSS version 29.0 (SPSS Inc., Armonk, NY). This comprehensive approach aimed to ensure robust and reliable assessments of interrater agreement for the quality assessment tools used in the study. The interobserver agreement of QATSM-RWS, NOS and nonsummative four-point system is presented in Table S2. The mean scores of agreements for QATSM-RWS, NOS and nonsummative four-point system were 0.781(95% CI: 0.328, 0.927), 0.759 (95% CI: 0.274, 0.919) and 0.588 (95% CI: 0.098, 0.856), respectively.</p><p>Table 1 assessed the interobserver agreement of the individual items in the QATSM-RWE. The highest and lowest mean kappa value was reported for the “description of key findings” and “description of inclusion and exclusion criteria” 0.77 (95% CI: 0.27, 0.99) and 0.44 (95% CI: 0.2, 0.99), respectively. The kappa value of all the items in the QATSM-RWS indicates that there was moderate to perfect agreement between the two observers. The items that showed moderate agreement include study sample description and definition; description of inclusion and exclusion criteria; description and appropriate choice of end point for the study and Inclusion of any funding sources that may affect the authors' interpretation of the results. 
Whereas the items with substantial and perfect agreement include: inclusion of research questions/objectives; inclusion of the scientific background and rationale for the investigation being reported; description of the data sources; description of study design and data analysis; inclusion of adequate sample size; description of appropriate follow-up period or last update to the major endpoints; description of sufficient methods to enable them to be repeated; description of key findings and inclusion of potential conflict of interest of study researcher(s) and funder(s). The only item reported with perfect agreement of the two raters was “justification of the discussions and conclusions by the key findings of the study.”</p><p>The interobserver ICCs for the total score was excellent for all instruments: QATSM-RWS, 0.87 (95% CI: 0.65, 0.97); NOS, 0.76 (95% CI: 0.54, 0.89); and the nonsummative four-point system, 0.72 (95% CI: 0.63, 0.91). Each instrument showed strong reliability, with ICCs values ranging from 0.72 to 0.87. These results emphasize the high level of agreement between observers for all scoring methods.</p><p>In relation to the QATSM-RWS total score, the mean difference between the two researchers’ scores was 0.00 (95% CI: -0.9466, 0.9466). The Bland and Altman's limits of agreement graph (Figure S1) indicates that there is no proportional bias between the two raters.</p><p>Real-world data is essential for improving evidence-based practice. This is the first study to evaluate the validity of the QATSM-RWS. In comparison to the Newcastle-Ottawa Scale (NOS) and the nonsummative four-point system, which are commonly employed in the literature, the QATSM-RWS demonstrates superior performance regarding agreement and reliability. These preliminary findings suggest that the QATSM-RWS tool may offer a more consistent and robust framework for assessing the quality of evidence in real-world studies.</p><p>The interrater reliability of the 14 items in the QATSM-RWS tool ranged from moderate to perfect agreement, suggesting that the instrument demonstrates a satisfactory degree of consistency across raters. This level of agreement aligns with established benchmarks for acceptable interrater reliability in health research tools [<span>14</span>] and supports the preliminary assertion that the items are clearly defined and interpretable.</p><p>The findings indicate that only minimal disagreements occurred between raters, suggesting that the QATSM-RWS tool exhibits a generally high level of interrater reliability. This consistency across users despite differences in background or experience reinforces the tool's potential for standardized application in assessing the methodological quality of systematic reviews and meta-analyses of real-world evidence (RWE) studies. Such reliability is critical for tools intended to inform evidence-based practice, as consistency in quality assessment directly influences the credibility of synthesized evidence [<span>15</span>]. Similar to well-established tools like AMSTAR, which has demonstrated robust psychometric properties and has been widely adopted in systematic review methodology, QATSM-RWS shows promise in fulfilling a comparable role in the emerging and complex field of RWE.</p><p>It is important to note that summary scores from quality assessment scales can sometimes mask the strengths or weaknesses of specific methodological components [<span>16</span>]. 
Additionally, certain elements of a quality assessment tool may hold greater significance than others depending on the context. Despite this, the authors assert that the QATSM-RWS tool is both valid and user-friendly for decision-makers and researchers engaged in systematic reviews and meta-analyses of real-world studies. Consequently, the overall score derived from the various domains of quality within QATSM-RWS remains meaningful and informative for evaluating the methodological rigor of included studies.</p><p>This study presents several strengths and limitations. One notable strength is the careful attention given to the wording in the development of the QATSM-RWS tool, which enhances its clarity and usability. However, it is important to recognize that judgments regarding the quality of included studies are inherently subjective. Providing more detailed descriptions of the assessment items could potentially improve the kappa values between the two observers. The inclusion of specific items in the QATSM-RWS tool, such as “description of data sources,” “conflict of interest,” and “funding source,” contributes to its comprehensiveness compared to the NOS and nonsummative four-point system. This is particularly relevant given evidence suggesting that funding sources can influence research outcomes and quality [<span>17</span>].</p><p>The QATSM-RWS tool shows promise as a potentially useful instrument for policymakers, HTA bodies, researchers, and clinicians involved in systematic reviews and meta-analyses of real-world evidence (RWE) studies. As this is the first study to evaluate the tool, the findings should be considered preliminary. Further research is needed to confirm its psychometric properties, including its validity and reliability across diverse contexts and user groups. Until such validation is completed, we recommend cautious, exploratory use of the QATSM-RWS tool, with ongoing evaluation to support its refinement and to determine its suitability for broader adoption in policy and practice.</p>","PeriodicalId":16090,"journal":{"name":"Journal of Evidence‐Based Medicine","volume":"18 2","pages":""},"PeriodicalIF":3.6000,"publicationDate":"2025-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/jebm.70052","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Evidence‐Based Medicine","FirstCategoryId":"3","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/jebm.70052","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MEDICINE, GENERAL & INTERNAL","Score":null,"Total":0}

Abstract

Randomized controlled trials (RCTs) are considered the gold standard for assessing the efficacy of medical interventions [1]. However, real-world evidence (RWE) is increasingly recognized as essential for comprehensive healthcare decision-making. RCTs provide high internal validity and establish clear causal relationships due to their controlled environments and strict criteria. Nevertheless, the highly selective patient populations and controlled settings of RCTs can limit the external validity of their findings, making it challenging to generalize results to broader, more diverse populations [2]. RWE is derived from real-world data (RWD), such as electronic health records and insurance claims, and provides clinical insights into the usage, benefits, and risks of medical products. Unlike RCTs, RWE offers perspectives on treatment performance in everyday practice, which can significantly aid healthcare decision-making [3]. RWD serves to bridge the gap between clinical trials and real-world settings, informing guidelines, policy decisions, and new therapy approvals [4]. This type of evidence captures a wider range of patient populations and healthcare environments, making it particularly valuable for understanding the effectiveness, safety, and cost-effectiveness of interventions in real-world conditions.

Regulatory bodies and healthcare organizations increasingly rely on RWE to fill gaps left by RCTs [5]. For instance, the US Food and Drug Administration (FDA) and the European Medicines Agency (EMA) have incorporated RWE to support regulatory decisions and postmarket surveillance [6]. Healthcare recommendations should be grounded in the best available research evidence [7], and incorporating this evidence into healthcare practices can help reduce variations in healthcare delivery. The volume of healthcare research is now far larger than healthcare professionals can keep pace with. RWE is instrumental in understanding the effectiveness and safety of interventions across diverse populations and in identifying rare adverse events and long-term outcomes, thus enhancing healthcare practices and policies [8].

To summarize and present the findings of individual research studies, a structured approach is required. This structured approach, the systematic review, provides a comprehensive and unbiased synthesis of many relevant studies in a single document. One of the most critical components of conducting a systematic review is assessing the quality of the included studies, as this significantly affects the overall quality of the evidence produced [9]. Quality appraisal refers to evaluating how well a study was designed and conducted, looking at its methodological soundness, such as whether it used an appropriate study design, followed rigorous procedures, and addressed key elements like sample selection and data analysis [9]. In contrast, risk of bias assessment focuses specifically on identifying systematic errors that may distort the study's findings, such as selection bias, measurement bias, or confounding.

A recent scoping review highlighted a significant gap in the availability of methodological quality appraisal tools specifically designed for systematic reviews (SRs) and meta-analyses (MAs) involving real-world evidence (RWE) studies [10]. In the absence of such tailored instruments, researchers have commonly relied on general tools not originally developed with RWE in mind, such as the Quality Assessment Tool for Observational Cohort and Cross-Sectional Studies, the Critical Appraisal Skills Programme (CASP) Checklist, the Newcastle-Ottawa Scale (NOS), the Non-Summative Four-Point System, the Quality of Health Economic Studies Instrument, the STROBE Statement, and the Joanna Briggs Institute Critical Appraisal Tool for Prevalence Studies. While these tools offer useful frameworks for evaluating traditional observational studies, they may not adequately account for the unique methodological features and data heterogeneity characteristic of real-world studies. Unlike traditional observational research, which often relies on prospectively collected data from controlled research settings or cohorts, RWE studies draw on routinely collected data from clinical practice (electronic health records, insurance claims, and patient registries) that were not originally intended for research purposes, introducing complexities that existing appraisal tools may not fully address [11].

In response to this methodological gap, a novel instrument, the Quality Assessment Tool for Systematic Reviews and Meta-Analyses Involving Real-World Studies (QATSM-RWS), has been developed [11]. QATSM-RWS is specifically designed to assess the methodological quality of SRs and MAs that synthesize data derived from real-world settings, such as electronic health records, insurance claims, patient registries, and other routinely collected healthcare data. Validating QATSM-RWS is a critical step toward establishing its reliability and relevance in assessing the quality of evidence generated from RWE. The present study aims to assess the interrater agreement of the QATSM-RWS in comparison with existing quality assessment tools, to ensure the consistency and reliability of assessments across different evaluators.

Fifteen SRs and meta-analyses of RWE studies were selected from a relevant database using a purposive sampling technique (Table S1). The selected studies, which focused on musculoskeletal disease as a reference health condition, were identified from a scoping review of quality assessment tools used in systematic reviews and meta-analyses of RWE studies by Gebrye and colleagues [10]. Two quality assessment tools were used as comparators for the QATSM-RWS: the Newcastle-Ottawa Scale (NOS) and a Non-Summative Four-Point System.

Two researchers (TG & CM), trained extensively in research design, methodology, epidemiology, healthcare research, statistics, systematic reviews, and meta-analysis, conducted the reliability ratings for each systematic review. A detailed list of scoring instructions was developed and provided to the raters. Throughout the rating process, the researchers were blinded to each other's assessments and prohibited from discussing their ratings. The ratings were based on whether the criteria/items in each quality assessment tool adequately measured their intended function. This rigorous approach aimed to ensure the reliability and validity of the quality assessments conducted in the study.

A weighted Cohen's kappa (κ) was calculated for each item of the quality assessment tools to evaluate interrater agreement between the two researchers. The two raters were treated as fixed, as both evaluated all items of interest. The total number of “yes,” “no,” and “yes/no” responses common to both raters was used to assess overall agreement. Each item scored as “yes” received one point, and these points were summed to calculate a total agreement score. To assess the degree of consistency between the two researchers, intraclass correlation coefficients (ICCs) were used to quantify interrater agreement, or reliability [12].
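
As an illustration of how these statistics can be computed, the sketch below estimates a per-item weighted kappa and a total-score ICC for two fixed raters. This is not the authors' code (the published analysis used SPSS); the ratings, totals, and rater labels are hypothetical, and scikit-learn and pingouin are used as one reasonable Python implementation.

```python
# A minimal sketch of the two reliability statistics described above.
# The published analysis was run in IBM SPSS 29.0, so this is purely
# illustrative: the ratings, totals, and rater labels are hypothetical.
import pandas as pd
import pingouin as pg
from sklearn.metrics import cohen_kappa_score

# Hypothetical "yes"/"no" ratings for one QATSM-RWS item across 15 reviews
rater_tg = ["yes", "yes", "no", "yes", "yes", "no", "yes", "yes",
            "yes", "no", "yes", "yes", "yes", "no", "yes"]
rater_cm = ["yes", "yes", "no", "yes", "no", "no", "yes", "yes",
            "yes", "no", "yes", "yes", "yes", "yes", "yes"]

# Weighted Cohen's kappa for the item; with binary items, linear
# weighting reduces to the ordinary (unweighted) kappa.
kappa = cohen_kappa_score(rater_tg, rater_cm, weights="linear")
print(f"item kappa = {kappa:.2f}")

# Hypothetical total scores per review (one point per "yes", summed)
totals_tg = [11, 12, 9, 13, 10, 12, 11, 14, 10, 12, 13, 9, 11, 12, 10]
totals_cm = [11, 13, 9, 12, 10, 12, 12, 14, 10, 11, 13, 9, 11, 12, 11]

# ICC on the total scores in long format; with raters treated as fixed,
# the two-way mixed-effects single-rater estimate (the "ICC3" row) applies.
long_df = pd.DataFrame({
    "review": list(range(15)) * 2,
    "rater": ["TG"] * 15 + ["CM"] * 15,
    "total": totals_tg + totals_cm,
})
icc = pg.intraclass_corr(data=long_df, targets="review",
                         raters="rater", ratings="total")
print(icc.loc[icc["Type"] == "ICC3", ["ICC", "CI95%"]])
```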

Agreement was interpreted using the criteria set by Landis and Koch, where a κ-value of less than 0 indicates less than chance agreement, 0.00 to 0.20 indicates slight agreement, 0.21 to 0.40 indicates fair agreement, 0.41 to 0.60 indicates moderate agreement, 0.61 to 0.80 indicates substantial agreement, and 0.81 to 1.00 indicates almost perfect or perfect agreement [12]. Overall, high interrater agreement indicates that the tool is easy to use and interpret consistently across different observers, while low agreement suggests that the tool or its items may require clarification or modification.
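
For readers who want to apply these bands programmatically, a minimal helper is sketched below; it simply encodes the Landis and Koch thresholds quoted above and is not part of the published analysis.

```python
# A small helper (not part of the published analysis) encoding the
# Landis and Koch bands quoted above, for labelling kappa values.
def landis_koch(kappa: float) -> str:
    """Map a kappa value to its Landis-Koch agreement category."""
    if kappa < 0.0:
        return "less than chance agreement"
    if kappa <= 0.20:
        return "slight agreement"
    if kappa <= 0.40:
        return "fair agreement"
    if kappa <= 0.60:
        return "moderate agreement"
    if kappa <= 0.80:
        return "substantial agreement"
    return "almost perfect or perfect agreement"

print(landis_koch(0.77))  # substantial agreement (highest QATSM-RWS item)
print(landis_koch(0.44))  # moderate agreement (lowest QATSM-RWS item)
```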

To compare agreement graphically, the Bland–Altman limits of agreement method was employed [13]. The level of significance was set at 0.05, and all analyses were conducted using IBM SPSS version 29.0 (IBM Corp., Armonk, NY). This approach aimed to ensure robust and reliable assessments of interrater agreement for the quality assessment tools used in the study. The interobserver agreement of the QATSM-RWS, the NOS, and the nonsummative four-point system is presented in Table S2. The mean agreement scores for the QATSM-RWS, the NOS, and the nonsummative four-point system were 0.781 (95% CI: 0.328, 0.927), 0.759 (95% CI: 0.274, 0.919), and 0.588 (95% CI: 0.098, 0.856), respectively.
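
The Bland–Altman computation itself is simple: the bias is the mean of the paired score differences, and the 95% limits of agreement lie 1.96 sample standard deviations either side of it. A hedged sketch, with hypothetical paired totals (the published plot is Figure S1):

```python
# A minimal sketch of the Bland-Altman limits-of-agreement computation,
# using hypothetical paired total scores from the two raters.
import numpy as np

scores_tg = np.array([11, 12, 9, 13, 10, 12, 11, 14, 10, 12, 13, 9, 11, 12, 10])
scores_cm = np.array([11, 13, 9, 12, 10, 12, 12, 14, 10, 11, 13, 9, 11, 12, 11])

diff = scores_tg - scores_cm
bias = diff.mean()                     # mean difference between raters
half_width = 1.96 * diff.std(ddof=1)   # 1.96 sample SDs of the differences
print(f"bias = {bias:.2f}, 95% limits of agreement: "
      f"[{bias - half_width:.2f}, {bias + half_width:.2f}]")
```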

Table 1 presents the interobserver agreement for the individual items of the QATSM-RWS. The highest and lowest mean kappa values were reported for “description of key findings” (0.77; 95% CI: 0.27, 0.99) and “description of inclusion and exclusion criteria” (0.44; 95% CI: 0.2, 0.99), respectively. The kappa values of all items in the QATSM-RWS indicate moderate to perfect agreement between the two observers. The items that showed moderate agreement were: study sample description and definition; description of inclusion and exclusion criteria; description and appropriate choice of end point for the study; and inclusion of any funding sources that may affect the authors' interpretation of the results. The items with substantial or perfect agreement were: inclusion of research questions/objectives; inclusion of the scientific background and rationale for the investigation being reported; description of the data sources; description of study design and data analysis; inclusion of adequate sample size; description of an appropriate follow-up period or last update to the major endpoints; description of methods in sufficient detail to enable them to be repeated; description of key findings; and inclusion of potential conflicts of interest of the study researcher(s) and funder(s). The only item with perfect agreement between the two raters was “justification of the discussions and conclusions by the key findings of the study.”

The interobserver ICCs for the total scores were excellent for all instruments: QATSM-RWS, 0.87 (95% CI: 0.65, 0.97); NOS, 0.76 (95% CI: 0.54, 0.89); and the nonsummative four-point system, 0.72 (95% CI: 0.63, 0.91). Each instrument showed strong reliability, with ICC values ranging from 0.72 to 0.87. These results emphasize the high level of agreement between observers for all scoring methods.

For the QATSM-RWS total score, the mean difference between the two researchers' scores was 0.00 (95% CI: -0.9466, 0.9466). The Bland–Altman limits of agreement plot (Figure S1) indicates no proportional bias between the two raters.

Real-world data is essential for improving evidence-based practice. This is the first study to evaluate the validity of the QATSM-RWS. In comparison to the Newcastle-Ottawa Scale (NOS) and the nonsummative four-point system, which are commonly employed in the literature, the QATSM-RWS demonstrates superior performance regarding agreement and reliability. These preliminary findings suggest that the QATSM-RWS tool may offer a more consistent and robust framework for assessing the quality of evidence in real-world studies.

The interrater reliability of the 14 items in the QATSM-RWS tool ranged from moderate to perfect agreement, suggesting that the instrument demonstrates a satisfactory degree of consistency across raters. This level of agreement aligns with established benchmarks for acceptable interrater reliability in health research tools [14] and supports the preliminary assertion that the items are clearly defined and interpretable.

The findings indicate that only minimal disagreements occurred between raters, suggesting that the QATSM-RWS tool exhibits a generally high level of interrater reliability. This consistency across users, despite differences in background or experience, reinforces the tool's potential for standardized application in assessing the methodological quality of systematic reviews and meta-analyses of real-world evidence (RWE) studies. Such reliability is critical for tools intended to inform evidence-based practice, as consistency in quality assessment directly influences the credibility of synthesized evidence [15]. Like well-established tools such as AMSTAR, which has demonstrated robust psychometric properties and has been widely adopted in systematic review methodology, the QATSM-RWS shows promise in fulfilling a comparable role in the emerging and complex field of RWE.

It is important to note that summary scores from quality assessment scales can sometimes mask the strengths or weaknesses of specific methodological components [16]. Additionally, certain elements of a quality assessment tool may hold greater significance than others depending on the context. Despite this, the authors assert that the QATSM-RWS tool is both valid and user-friendly for decision-makers and researchers engaged in systematic reviews and meta-analyses of real-world studies. Consequently, the overall score derived from the various domains of quality within QATSM-RWS remains meaningful and informative for evaluating the methodological rigor of included studies.

This study presents several strengths and limitations. One notable strength is the careful attention given to the wording in the development of the QATSM-RWS tool, which enhances its clarity and usability. However, it is important to recognize that judgments regarding the quality of included studies are inherently subjective. Providing more detailed descriptions of the assessment items could potentially improve the kappa values between the two observers. The inclusion of specific items in the QATSM-RWS tool, such as “description of data sources,” “conflict of interest,” and “funding source,” contributes to its comprehensiveness compared to the NOS and nonsummative four-point system. This is particularly relevant given evidence suggesting that funding sources can influence research outcomes and quality [17].

The QATSM-RWS tool shows promise as a potentially useful instrument for policymakers, HTA bodies, researchers, and clinicians involved in systematic reviews and meta-analyses of real-world evidence (RWE) studies. As this is the first study to evaluate the tool, the findings should be considered preliminary. Further research is needed to confirm its psychometric properties, including its validity and reliability across diverse contexts and user groups. Until such validation is completed, we recommend cautious, exploratory use of the QATSM-RWS tool, with ongoing evaluation to support its refinement and to determine its suitability for broader adoption in policy and practice.
