SANRA – a scale for the quality assessment of narrative review articles.
Christopher Baethge, Sandra Goldbeck-Wood, Stephan Mertens
{"title":"sanra -记叙性评论文章的质量评估量表。","authors":"Christopher Baethge, Sandra Goldbeck-Wood, Stephan Mertens","doi":"10.1186/s41073-019-0064-8","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Narrative reviews are the commonest type of articles in the medical literature. However, unlike systematic reviews and randomized controlled trials (RCT) articles, for which formal instruments exist to evaluate quality, there is currently no instrument available to assess the quality of narrative reviews. In response to this gap, we developed SANRA, the Scale for the Assessment of Narrative Review Articles.</p><p><strong>Methods: </strong>A team of three experienced journal editors modified or deleted items in an earlier SANRA version based on face validity, item-total correlations, and reliability scores from previous tests. We deleted an item which addressed a manuscript's writing and accessibility due to poor inter-rater reliability. The six items which form the revised scale are rated from 0 (low standard) to 2 (high standard) and cover the following topics: explanation of (1) the importance and (2) the aims of the review, (3) literature search and (4) referencing and presentation of (5) evidence level and (6) relevant endpoint data. For all items, we developed anchor definitions and examples to guide users in filling out the form. The revised scale was tested by the same editors (blinded to each other's ratings) in a group of 30 consecutive non-systematic review manuscripts submitted to a general medical journal.</p><p><strong>Results: </strong>Raters confirmed that completing the scale is feasible in everyday editorial work. The mean sum score across all 30 manuscripts was 6.0 out of 12 possible points (SD 2.6, range 1-12). Corrected item-total correlations ranged from 0.33 (item 3) to 0.58 (item 6), and Cronbach's alpha was 0.68 (internal consistency). The intra-class correlation coefficient (average measure) was 0.77 [95% CI 0.57, 0.88] (inter-rater reliability). Raters often disagreed on items 1 and 4.</p><p><strong>Conclusions: </strong>SANRA's feasibility, inter-rater reliability, homogeneity of items, and internal consistency are sufficient for a scale of six items. Further field testing, particularly of validity, is desirable. We recommend rater training based on the \"explanations and instructions\" document provided with SANRA. In editorial decision-making, SANRA may complement journal-specific evaluation of manuscripts-pertaining to, e.g., audience, originality or difficulty-and may contribute to improving the standard of non-systematic reviews.</p>","PeriodicalId":74682,"journal":{"name":"Research integrity and peer review","volume":null,"pages":null},"PeriodicalIF":7.2000,"publicationDate":"2019-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/s41073-019-0064-8","citationCount":"581","resultStr":"{\"title\":\"SANRA-a scale for the quality assessment of narrative review articles.\",\"authors\":\"Christopher Baethge, Sandra Goldbeck-Wood, Stephan Mertens\",\"doi\":\"10.1186/s41073-019-0064-8\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Narrative reviews are the commonest type of articles in the medical literature. However, unlike systematic reviews and randomized controlled trials (RCT) articles, for which formal instruments exist to evaluate quality, there is currently no instrument available to assess the quality of narrative reviews. 
In response to this gap, we developed SANRA, the Scale for the Assessment of Narrative Review Articles.</p><p><strong>Methods: </strong>A team of three experienced journal editors modified or deleted items in an earlier SANRA version based on face validity, item-total correlations, and reliability scores from previous tests. We deleted an item which addressed a manuscript's writing and accessibility due to poor inter-rater reliability. The six items which form the revised scale are rated from 0 (low standard) to 2 (high standard) and cover the following topics: explanation of (1) the importance and (2) the aims of the review, (3) literature search and (4) referencing and presentation of (5) evidence level and (6) relevant endpoint data. For all items, we developed anchor definitions and examples to guide users in filling out the form. The revised scale was tested by the same editors (blinded to each other's ratings) in a group of 30 consecutive non-systematic review manuscripts submitted to a general medical journal.</p><p><strong>Results: </strong>Raters confirmed that completing the scale is feasible in everyday editorial work. The mean sum score across all 30 manuscripts was 6.0 out of 12 possible points (SD 2.6, range 1-12). Corrected item-total correlations ranged from 0.33 (item 3) to 0.58 (item 6), and Cronbach's alpha was 0.68 (internal consistency). The intra-class correlation coefficient (average measure) was 0.77 [95% CI 0.57, 0.88] (inter-rater reliability). Raters often disagreed on items 1 and 4.</p><p><strong>Conclusions: </strong>SANRA's feasibility, inter-rater reliability, homogeneity of items, and internal consistency are sufficient for a scale of six items. Further field testing, particularly of validity, is desirable. We recommend rater training based on the \\\"explanations and instructions\\\" document provided with SANRA. In editorial decision-making, SANRA may complement journal-specific evaluation of manuscripts-pertaining to, e.g., audience, originality or difficulty-and may contribute to improving the standard of non-systematic reviews.</p>\",\"PeriodicalId\":74682,\"journal\":{\"name\":\"Research integrity and peer review\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":7.2000,\"publicationDate\":\"2019-03-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://sci-hub-pdf.com/10.1186/s41073-019-0064-8\",\"citationCount\":\"581\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Research integrity and peer review\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1186/s41073-019-0064-8\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2019/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q1\",\"JCRName\":\"ETHICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Research integrity and peer review","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1186/s41073-019-0064-8","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2019/1/1 0:00:00","PubModel":"eCollection","JCR":"Q1","JCRName":"ETHICS","Score":null,"Total":0}
Citations: 581
Abstract
Background: Narrative reviews are the most common type of article in the medical literature. However, unlike systematic reviews and randomized controlled trial (RCT) reports, for which formal quality-assessment instruments exist, there is currently no instrument available for assessing the quality of narrative reviews. In response to this gap, we developed SANRA, the Scale for the Assessment of Narrative Review Articles.
Methods: A team of three experienced journal editors modified or deleted items in an earlier SANRA version based on face validity, item-total correlations, and reliability scores from previous tests. We deleted an item addressing a manuscript's writing and accessibility because of its poor inter-rater reliability. The six items that form the revised scale are each rated from 0 (low standard) to 2 (high standard) and cover the following topics: explanation of (1) the importance and (2) the aims of the review, (3) the literature search, (4) referencing, and presentation of (5) the level of evidence and (6) relevant endpoint data. For all items, we developed anchor definitions and examples to guide users in filling out the form. The revised scale was tested by the same editors (blinded to each other's ratings) on a set of 30 consecutive non-systematic review manuscripts submitted to a general medical journal.
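To make the scoring scheme concrete, here is a minimal sketch in Python of the six-item, 0-to-2 rating and its 0-to-12 sum score. The item labels follow the abstract, but the class, field, and function names are illustrative assumptions and not part of SANRA itself.

```python
from dataclasses import dataclass, fields

@dataclass
class SanraRating:
    """One rater's SANRA scores for a single manuscript.

    Each item is rated 0 (low standard), 1, or 2 (high standard).
    Field names are illustrative, not taken from the paper.
    """
    importance: int         # item 1: explanation of the review's importance
    aims: int               # item 2: statement of the review's aims
    literature_search: int  # item 3: description of the literature search
    referencing: int        # item 4: referencing of key statements
    evidence_level: int     # item 5: presentation of the level of evidence
    endpoint_data: int      # item 6: presentation of relevant endpoint data

    def sum_score(self) -> int:
        """Total SANRA score, from 0 (lowest) to 12 (highest)."""
        values = [getattr(self, f.name) for f in fields(self)]
        if any(v not in (0, 1, 2) for v in values):
            raise ValueError("each item must be rated 0, 1, or 2")
        return sum(values)

# Example: a manuscript scoring 6 of 12 points, the mean reported in the study
rating = SanraRating(2, 1, 1, 1, 0, 1)
print(rating.sum_score())  # -> 6
```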
Results: Raters confirmed that completing the scale is feasible in everyday editorial work. The mean sum score across all 30 manuscripts was 6.0 out of 12 possible points (SD 2.6, range 1-12). Corrected item-total correlations ranged from 0.33 (item 3) to 0.58 (item 6), and Cronbach's alpha was 0.68 (internal consistency). The intra-class correlation coefficient (average measure) was 0.77 [95% CI 0.57, 0.88] (inter-rater reliability). Raters often disagreed on items 1 and 4.
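For readers who want to reproduce these psychometrics on their own rating data, the sketch below computes Cronbach's alpha and corrected item-total correlations from a manuscripts-by-items score matrix using the standard formulas. The input here is randomly generated for illustration only and is not the study's data; the average-measure ICC is omitted because it additionally requires a two-way ANOVA decomposition across raters.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for an (n_manuscripts, n_items) score matrix."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the sum score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def corrected_item_total(scores: np.ndarray) -> np.ndarray:
    """Correlation of each item with the sum of the remaining items."""
    rest = scores.sum(axis=1, keepdims=True) - scores  # total minus the item
    return np.array([
        np.corrcoef(scores[:, i], rest[:, i])[0, 1]
        for i in range(scores.shape[1])
    ])

# Synthetic example: 30 manuscripts rated 0-2 on 6 items (random data,
# for illustration only)
rng = np.random.default_rng(0)
scores = rng.integers(0, 3, size=(30, 6)).astype(float)
print(f"alpha = {cronbach_alpha(scores):.2f}")
print("corrected item-total r:", corrected_item_total(scores).round(2))
```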
Conclusions: SANRA's feasibility, inter-rater reliability, homogeneity of items, and internal consistency are sufficient for a scale of six items. Further field testing, particularly of validity, is desirable. We recommend rater training based on the "explanations and instructions" document provided with SANRA. In editorial decision-making, SANRA may complement journal-specific evaluation of manuscripts (pertaining to, e.g., audience, originality, or difficulty) and may contribute to improving the standard of non-systematic reviews.