Can journal reviewers dependably assess rigour, significance, and originality in theoretical papers? Evidence from physics

IF 2.5 | Zone 4, Management | Q1 INFORMATION SCIENCE & LIBRARY SCIENCE

Research Evaluation, Pub Date: 2023-06-30, DOI: 10.1093/reseval/rvad018

M. Thelwall, J. Hołyst


Citations: 0

Abstract

Peer review is a key gatekeeper for academic journals, attempting to block inadequate submissions or correct them to a publishable standard, as well as to improve those that are already satisfactory. The three key aspects of research quality are rigour, significance, and originality, but no prior study has assessed whether journal reviewers are ever able to judge these effectively. In response, this article compares reviewer scores for these aspects for theoretical articles in the journal SciPost Physics. It also compares them with physics reviewer agreement scores from the Italian research assessment exercise. SciPost Physics theoretical articles give a nearly ideal case: a theoretical aspect of a mature science, for which suitable reviewers might comprehend the entire paper. Nevertheless, intraclass correlations between the first two reviewers for the three core quality scores were similar and moderate, 0.36 (originality), 0.39 (significance), and 0.40 (rigour), so there is no aspect on which different reviewers are consistent. Differences tended to be small, with 86% of scores agreeing or differing by 1 on a 6-point scale. Individual reviewers were most likely to give similar scores for significance and originality (Spearman 0.63), and least likely to do so for originality and validity (Spearman 0.38). Whilst a lack of norm referencing is probably the biggest reason for differences between reviewers, others include differing background knowledge, understanding, and beliefs about valid assumptions. The moderate agreement between reviewers on the core aspects of scientific quality, including rigour, in a nearly ideal case is concerning for the security of the wider academic record.
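The two agreement measures quoted in the abstract can be illustrated with a minimal sketch. It assumes a one-way random-effects intraclass correlation, ICC(1,1), for two reviewers per paper; the abstract does not say which ICC variant the authors used, so this choice is an assumption. The function names (`icc_oneway`, `share_within_one`) and the score pairs are hypothetical and are not data from the study.

```python
from statistics import mean


def icc_oneway(pairs):
    """One-way random-effects ICC(1,1) for papers each scored by two reviewers.

    Assumption: this variant is chosen for illustration only; the
    abstract does not state which ICC form was used.
    """
    n = len(pairs)   # number of papers
    k = 2            # reviewers per paper
    grand = mean(score for pair in pairs for score in pair)
    paper_means = [mean(pair) for pair in pairs]
    # Between-paper mean square: spread of paper means around the grand mean.
    msb = k * sum((m - grand) ** 2 for m in paper_means) / (n - 1)
    # Within-paper mean square: disagreement between reviewers of the same paper.
    msw = sum(
        (score - m) ** 2
        for pair, m in zip(pairs, paper_means)
        for score in pair
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


def share_within_one(pairs):
    """Fraction of papers whose two scores agree or differ by exactly 1."""
    return sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)


# Hypothetical (reviewer 1, reviewer 2) scores on a 6-point scale.
scores = [(5, 4), (3, 3), (6, 4), (2, 3), (4, 4), (5, 5), (3, 5), (4, 3)]

print("ICC(1,1):", round(icc_oneway(scores), 2))
print("Share within 1 point:", share_within_one(scores))
```

The sketch shows why the two figures can coexist: most pairs can differ by at most one point while the ICC stays moderate, because the ICC compares reviewer disagreement with the spread of scores across papers, and that spread is narrow when most scores cluster in a few categories.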




Source journal

Research Evaluation (INFORMATION SCIENCE & LIBRARY SCIENCE)

CiteScore

6.00

Self-citation rate

18.20%

Articles published

About the journal: Research Evaluation is a peer-reviewed, international journal. It ranges from the individual research project up to inter-country comparisons of research performance. Research projects, researchers, research centres, and the types of research output are all relevant. It includes public and private sectors, natural and social sciences. The term "evaluation" applies to all stages from priorities and proposals, through the monitoring of on-going projects and programmes, to the use of the results of research.