SANRA-a scale for the quality assessment of narrative review articles
Christopher Baethge, Sandra Goldbeck-Wood, Stephan Mertens
Research Integrity and Peer Review, volume 4, article 5. Published 2019-03-26. DOI: https://doi.org/10.1186/s41073-019-0064-8
Citation count: 581
Abstract
Background: Narrative reviews are the most common type of article in the medical literature. However, unlike systematic reviews and randomized controlled trial (RCT) articles, for which formal instruments exist to evaluate quality, there is currently no instrument available to assess the quality of narrative reviews. In response to this gap, we developed SANRA, the Scale for the Assessment of Narrative Review Articles.
Methods: A team of three experienced journal editors modified or deleted items in an earlier SANRA version based on face validity, item-total correlations, and reliability scores from previous tests. We deleted an item that addressed a manuscript's writing and accessibility because of its poor inter-rater reliability. The six items that form the revised scale are each rated from 0 (low standard) to 2 (high standard) and cover the following topics: explanation of (1) the importance and (2) the aims of the review, (3) literature search, (4) referencing, and presentation of (5) evidence level and (6) relevant endpoint data. For all items, we developed anchor definitions and examples to guide users in filling out the form. The revised scale was tested by the same editors (blinded to each other's ratings) on a set of 30 consecutive non-systematic review manuscripts submitted to a general medical journal.
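As a minimal sketch of the scoring described above, the sum score can be computed by adding the six item ratings, giving a range of 0 to 12. The function name `sanra_sum_score`, the restriction to the integer ratings 0, 1, and 2, and the example ratings are assumptions made for illustration; they are not taken from the SANRA form itself.

```python
from typing import Sequence

N_ITEMS = 6               # the revised scale has six items
ALLOWED = (0, 1, 2)       # assumption: ratings are the integers 0, 1, or 2


def sanra_sum_score(ratings: Sequence[int]) -> int:
    """Return the sum score (0 to 12) for one manuscript rated by one rater."""
    if len(ratings) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item ratings, got {len(ratings)}")
    for position, rating in enumerate(ratings, start=1):
        if rating not in ALLOWED:
            raise ValueError(f"item {position}: rating {rating!r} is not 0, 1, or 2")
    return sum(ratings)


# Hypothetical ratings for items 1 to 6 of a single manuscript.
example_ratings = [2, 1, 0, 1, 1, 1]
print(sanra_sum_score(example_ratings))  # prints 6
```

Validating the number of items and the rating range before summing keeps a mis-keyed form from silently producing a plausible-looking score.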
Results: Raters confirmed that completing the scale is feasible in everyday editorial work. The mean sum score across all 30 manuscripts was 6.0 out of 12 possible points (SD 2.6, range 1-12). Corrected item-total correlations ranged from 0.33 (item 3) to 0.58 (item 6), and Cronbach's alpha was 0.68 (internal consistency). The intra-class correlation coefficient (average measure) was 0.77 [95% CI 0.57, 0.88] (inter-rater reliability). Raters often disagreed on items 1 and 4.
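The internal-consistency statistics reported above can be illustrated with a short sketch. It uses the standard textbook formulas for Cronbach's alpha and for the corrected item-total correlation (each item correlated with the sum of the remaining items). The rating matrix is hypothetical, the function names are invented for this example, and the abstract does not state how the three raters' scores were combined before these statistics were computed, so this is not a reproduction of the authors' analysis.

```python
from math import sqrt
from statistics import variance
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]  # rows = manuscripts, columns = items


def cronbach_alpha(rows: Matrix) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of sum scores)."""
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def corrected_item_total(rows: Matrix, item: int) -> float:
    """Correlate one item with the sum of all *other* items (0-based index)."""
    item_scores = [row[item] for row in rows]
    rest_scores = [sum(row) - row[item] for row in rows]
    return pearson(item_scores, rest_scores)


# Hypothetical ratings: five manuscripts, six items each (not study data).
ratings: List[List[int]] = [
    [2, 2, 1, 1, 1, 2],
    [1, 0, 0, 1, 0, 1],
    [2, 1, 1, 2, 1, 1],
    [0, 1, 0, 0, 0, 0],
    [1, 2, 2, 1, 2, 2],
]

print("Cronbach's alpha:", round(cronbach_alpha(ratings), 2))
for i in range(6):
    print(f"item {i + 1}:", round(corrected_item_total(ratings, i), 2))
```

The "corrected" variant leaves the item out of the total it is correlated with, which avoids inflating the correlation by counting the item twice.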
Conclusions: SANRA's feasibility, inter-rater reliability, homogeneity of items, and internal consistency are sufficient for a scale of six items. Further field testing, particularly of validity, is desirable. We recommend rater training based on the "explanations and instructions" document provided with SANRA. In editorial decision-making, SANRA may complement journal-specific evaluation of manuscripts (pertaining to, e.g., audience, originality, or difficulty) and may contribute to improving the standard of non-systematic reviews.