Predicting article quality scores with machine learning: The U.K. Research Excellence Framework

Impact factor: 3.5; JCR quartile: Q1; category: Information Science & Library Science

Quantitative Science Studies. Publication date: 2022-12-11. DOI: 10.1162/qss_a_00258

M. Thelwall, K. Kousha, Mahshid Abdoli, E. Stuart, Meiko Makita, Paul Wilson, Jonathan M. Levitt, Petr Knoth, M. Cancellieri

Quantitative Science Studies, volume 4 1 (as listed in the source record), pages 547-573. Publication type: journal article.

Citations: 4

Abstract

National research evaluation initiatives and incentive schemes choose between simplistic quantitative indicators and time-consuming peer/expert review, sometimes supported by bibliometrics. Here we assess whether machine learning could provide a third alternative, estimating article quality from multiple bibliometric and metadata inputs. We investigated this using provisional three-level REF2021 peer review scores for 84,966 articles submitted to the U.K. Research Excellence Framework 2021 that matched a Scopus record from 2014–18 and had a substantial abstract. We found that accuracy is highest in the medical and physical sciences Units of Assessment (UoAs) and economics, reaching 42% above the baseline (72% overall) in the best case. This is based on 1,000 bibliometric inputs, with half of the articles in each UoA used for training. Prediction accuracies above the baseline for the social science, mathematics, engineering, arts, and humanities UoAs were much lower or close to zero. The Random Forest Classifier (standard or ordinal) and Extreme Gradient Boosting Classifier algorithms performed best of the 32 tested. Accuracy was lower if UoAs were merged or replaced by Scopus broad categories. We increased accuracy with an active learning strategy and by selecting articles with higher prediction probabilities, but this substantially reduced the number of scores predicted.
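The two evaluation ideas in the abstract, accuracy above a baseline and restricting predictions to high-probability articles, can be illustrated with a small sketch. The abstract does not define the baseline or the formula, so the code below makes two assumptions: the baseline is the accuracy of always predicting the most common training label, and "above the baseline" is the normalised form (accuracy - baseline) / (1 - baseline). The three-level labels, predictions, and probabilities are invented toy values, not REF2021 data, and all function names are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(train_labels, y_true):
    """Assumed baseline: always predict the most common training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return accuracy(y_true, [majority] * len(y_true))


def accuracy_above_baseline(acc, baseline):
    """Assumed normalisation: share of the baseline's errors that the model removes."""
    return (acc - baseline) / (1 - baseline)


def select_confident(y_true, y_pred, probs, threshold):
    """Keep only articles whose predicted-class probability meets the threshold."""
    kept = [(t, p) for t, p, pr in zip(y_true, y_pred, probs) if pr >= threshold]
    return [t for t, _ in kept], [p for _, p in kept]


# Toy three-level quality scores (1, 2, 3); purely illustrative values.
train_labels = [3, 3, 2, 3, 1, 3, 2, 3]
y_true = [3, 3, 3, 3, 3, 2, 2, 2, 1, 2]
y_pred = [3, 3, 3, 3, 2, 2, 2, 3, 1, 2]
probs = [0.9, 0.8, 0.95, 0.7, 0.4, 0.85, 0.6, 0.45, 0.9, 0.75]

acc = accuracy(y_true, y_pred)
base = majority_baseline(train_labels, y_true)
above = accuracy_above_baseline(acc, base)

# Restrict to confident predictions: accuracy can rise while coverage falls.
conf_true, conf_pred = select_confident(y_true, y_pred, probs, threshold=0.7)
conf_acc = accuracy(conf_true, conf_pred)
coverage = len(conf_true) / len(y_true)

print(f"accuracy={acc:.2f} baseline={base:.2f} above_baseline={above:.2f}")
print(f"confident subset: accuracy={conf_acc:.2f} coverage={coverage:.2f}")
```

In this toy data the confident subset is scored more accurately than the full set but covers fewer articles, which mirrors the trade-off the abstract reports: higher accuracy at the cost of substantially fewer predicted scores.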

Source journal