PaperEval: A universal, quantitative, and explainable paper evaluation method powered by a multi-agent system

IF 7.4 | CAS Tier 1, Management Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS
Shengzhi Huang, Qicong Wang, Wei Lu, Lingyu Liu, Zhenzhen Xu, Yong Huang
{"title":"论文评估:一种通用的、定量的、可解释的论文评估方法,由多代理系统提供动力","authors":"Shengzhi Huang ,&nbsp;Qicong Wang ,&nbsp;Wei Lu ,&nbsp;Lingyu Liu ,&nbsp;Zhenzhen Xu ,&nbsp;Yong Huang","doi":"10.1016/j.ipm.2025.104225","DOIUrl":null,"url":null,"abstract":"<div><div>The immediate and efficient evaluation of scientific papers is crucial for advancing scientific progress. However, traditional peer review faces numerous challenges, including reviewer bias, limited expertise, and an overwhelming volume of publications. Recent advancements in large language models (LLMs) suggest their potential as promising evaluators, capable of approximating human cognition and understanding both ordinary and scientific language. In this study, we propose a novel AI-empowered paper evaluation method, PaperEval (PE), which utilizes a multi-agent system powered by LLMs to design evaluation criteria, assess paper quality along different dimensions, and generate explainable scores. We also introduce two variants of PE, Multi-round PaperEval (MPE) and Self-correcting PaperEval (SPE), which produce comparable scores and iteratively refine the evaluation criteria, respectively. To test our methods, we conducted a comprehensive analysis of three curated datasets, encompassing about 66,000 target papers of varying quality across the fields of mathematics, physics, chemistry, and medicine. The results show that our methods can effectively discern between high- and low-quality papers based on scores derived in four dimensions: Question, Method, Result, and Conclusion. Moreover, the results highlight the evaluation’s stability over time, the impact of comparative papers, the advantages of the multi-round evaluation strategy, and the varying correlation between AI ratings and scientific impact across different disciplines. Our method can seamlessly integrate into the existing scientific evaluation system, offering valuable insights for the development of AI-driven scientific evaluation.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 6","pages":"Article 104225"},"PeriodicalIF":7.4000,"publicationDate":"2025-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"PaperEval: A universal, quantitative, and explainable paper evaluation method powered by a multi-agent system\",\"authors\":\"Shengzhi Huang ,&nbsp;Qicong Wang ,&nbsp;Wei Lu ,&nbsp;Lingyu Liu ,&nbsp;Zhenzhen Xu ,&nbsp;Yong Huang\",\"doi\":\"10.1016/j.ipm.2025.104225\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>The immediate and efficient evaluation of scientific papers is crucial for advancing scientific progress. However, traditional peer review faces numerous challenges, including reviewer bias, limited expertise, and an overwhelming volume of publications. Recent advancements in large language models (LLMs) suggest their potential as promising evaluators, capable of approximating human cognition and understanding both ordinary and scientific language. In this study, we propose a novel AI-empowered paper evaluation method, PaperEval (PE), which utilizes a multi-agent system powered by LLMs to design evaluation criteria, assess paper quality along different dimensions, and generate explainable scores. We also introduce two variants of PE, Multi-round PaperEval (MPE) and Self-correcting PaperEval (SPE), which produce comparable scores and iteratively refine the evaluation criteria, respectively. 
To test our methods, we conducted a comprehensive analysis of three curated datasets, encompassing about 66,000 target papers of varying quality across the fields of mathematics, physics, chemistry, and medicine. The results show that our methods can effectively discern between high- and low-quality papers based on scores derived in four dimensions: Question, Method, Result, and Conclusion. Moreover, the results highlight the evaluation’s stability over time, the impact of comparative papers, the advantages of the multi-round evaluation strategy, and the varying correlation between AI ratings and scientific impact across different disciplines. Our method can seamlessly integrate into the existing scientific evaluation system, offering valuable insights for the development of AI-driven scientific evaluation.</div></div>\",\"PeriodicalId\":50365,\"journal\":{\"name\":\"Information Processing & Management\",\"volume\":\"62 6\",\"pages\":\"Article 104225\"},\"PeriodicalIF\":7.4000,\"publicationDate\":\"2025-05-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Information Processing & Management\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0306457325001669\",\"RegionNum\":1,\"RegionCategory\":\"管理学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Processing & Management","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0306457325001669","RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

The immediate and efficient evaluation of scientific papers is crucial for advancing scientific progress. However, traditional peer review faces numerous challenges, including reviewer bias, limited expertise, and an overwhelming volume of publications. Recent advancements in large language models (LLMs) suggest their potential as promising evaluators, capable of approximating human cognition and understanding both ordinary and scientific language. In this study, we propose a novel AI-empowered paper evaluation method, PaperEval (PE), which utilizes a multi-agent system powered by LLMs to design evaluation criteria, assess paper quality along different dimensions, and generate explainable scores. We also introduce two variants of PE, Multi-round PaperEval (MPE) and Self-correcting PaperEval (SPE), which produce comparable scores and iteratively refine the evaluation criteria, respectively. To test our methods, we conducted a comprehensive analysis of three curated datasets, encompassing about 66,000 target papers of varying quality across the fields of mathematics, physics, chemistry, and medicine. The results show that our methods can effectively discern between high- and low-quality papers based on scores derived in four dimensions: Question, Method, Result, and Conclusion. Moreover, the results highlight the evaluation’s stability over time, the impact of comparative papers, the advantages of the multi-round evaluation strategy, and the varying correlation between AI ratings and scientific impact across different disciplines. Our method can seamlessly integrate into the existing scientific evaluation system, offering valuable insights for the development of AI-driven scientific evaluation.
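The abstract describes the pipeline only at a high level: agents design evaluation criteria, score a paper along the Question, Method, Result, and Conclusion dimensions, and attach rationales that make the scores explainable. The Python sketch below illustrates one minimal way such a two-stage agent loop could be wired together. It is an assumption-laden illustration, not the authors' implementation: the function names (`design_criteria`, `score_dimension`), the prompt wording, the 1-10 scale, and the `llm` callable are all hypothetical, and the MPE and SPE variants (comparative scoring and iterative criteria refinement) are not shown.

```python
# Minimal sketch of a two-stage multi-agent paper evaluation loop.
# Hypothetical illustration in the spirit of PaperEval; the real system's
# agents, prompts, and aggregation rules are not reproduced here.

from dataclasses import dataclass
from typing import Callable, Dict

DIMENSIONS = ["Question", "Method", "Result", "Conclusion"]


@dataclass
class DimensionScore:
    dimension: str
    score: float      # e.g. a 1-10 rating returned by the evaluator agent
    rationale: str    # free-text explanation that makes the score auditable


def design_criteria(llm: Callable[[str], str], field: str) -> Dict[str, str]:
    """Criteria-design agent: asks the LLM for a field-specific rubric per dimension."""
    return {
        dim: llm(f"Write a concise rubric for judging the '{dim}' of a {field} paper.")
        for dim in DIMENSIONS
    }


def score_dimension(llm: Callable[[str], str], paper_text: str,
                    dimension: str, rubric: str) -> DimensionScore:
    """Evaluator agent: rates one dimension against its rubric and explains the rating."""
    reply = llm(
        f"Rubric:\n{rubric}\n\nPaper:\n{paper_text}\n\n"
        f"Rate the paper's {dimension} from 1 to 10, then justify briefly.\n"
        "Answer as '<score>|<rationale>'."
    )
    score_str, _, rationale = reply.partition("|")
    return DimensionScore(dimension, float(score_str.strip()), rationale.strip())


def evaluate_paper(llm: Callable[[str], str], paper_text: str,
                   field: str) -> Dict[str, DimensionScore]:
    """Run the full loop: design criteria, then score every dimension."""
    rubrics = design_criteria(llm, field)
    return {dim: score_dimension(llm, paper_text, dim, rubrics[dim]) for dim in DIMENSIONS}


if __name__ == "__main__":
    # Stub LLM so the sketch runs without any API; replace with a real model call.
    def fake_llm(prompt: str) -> str:
        return "Rubric placeholder" if prompt.startswith("Write") else "7|Clear and well supported."

    for dim, result in evaluate_paper(fake_llm, "paper full text goes here", "physics").items():
        print(dim, result.score, "-", result.rationale)
```

A real deployment would replace `fake_llm` with calls to an actual model, and would add the comparative reference papers and multi-round or self-correcting refinement steps that the MPE and SPE variants introduce.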
Source journal
Information Processing & Management (Engineering & Technology - Computer Science: Information Systems)
CiteScore: 17.00
Self-citation rate: 11.60%
Publication volume: 276
Review time: 39 days
Journal introduction: Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing. We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.