Shengzhi Huang , Wei Lu , Zhenzhen Xu , Qikai Cheng , Jinqing Yang , Yong Huang
{"title":"通过基于比较权力的大型模型识别潜在的破坏性研究","authors":"Shengzhi Huang , Wei Lu , Zhenzhen Xu , Qikai Cheng , Jinqing Yang , Yong Huang","doi":"10.1016/j.ipm.2025.104207","DOIUrl":null,"url":null,"abstract":"<div><div>Timely identification of potentially disruptive research is a significant research issue, since disruptive innovation in science transforms the existing paradigm and/or opens a new paradigm. This study proposes a comparative power-based large model that can promptly and accurately identify potentially disruptive research via comparative analysis of semantically-related papers. To this end, a self-constructed dataset was built by treating accumulated disruptive and consolidating citations as crowdsourced annotation data. We employed a range of machine learning models (MLs), deep learning models (DLs), and large language models (LLMs) to build classifiers. Our optimal model, Mistral-7B<sup>+*</sup>, attains an impressive F1 score of 0.8210 and outperforms the best-performing ML and DL models by approximately 27.05 % and 14.03 %, respectively. Testing on 275 recently published biomedical papers further verifies its effectiveness. Additionally, we conduct comprehensive experiments to scrutinize the comparative power of the large model as well as the impact of the number and quality of comparative papers and distinct functional paragraphs within abstracts on identification performance. Our findings show that an appropriate number and quality of comparative papers can promote identification performance. Moreover, result-based paragraphs are the most important for identifying disruptive research, while method-based paragraphs are least important.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 6","pages":"Article 104207"},"PeriodicalIF":6.9000,"publicationDate":"2025-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Identifying potentially disruptive research via a comparative power-based large model\",\"authors\":\"Shengzhi Huang , Wei Lu , Zhenzhen Xu , Qikai Cheng , Jinqing Yang , Yong Huang\",\"doi\":\"10.1016/j.ipm.2025.104207\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Timely identification of potentially disruptive research is a significant research issue, since disruptive innovation in science transforms the existing paradigm and/or opens a new paradigm. This study proposes a comparative power-based large model that can promptly and accurately identify potentially disruptive research via comparative analysis of semantically-related papers. To this end, a self-constructed dataset was built by treating accumulated disruptive and consolidating citations as crowdsourced annotation data. We employed a range of machine learning models (MLs), deep learning models (DLs), and large language models (LLMs) to build classifiers. Our optimal model, Mistral-7B<sup>+*</sup>, attains an impressive F1 score of 0.8210 and outperforms the best-performing ML and DL models by approximately 27.05 % and 14.03 %, respectively. Testing on 275 recently published biomedical papers further verifies its effectiveness. Additionally, we conduct comprehensive experiments to scrutinize the comparative power of the large model as well as the impact of the number and quality of comparative papers and distinct functional paragraphs within abstracts on identification performance. 
Our findings show that an appropriate number and quality of comparative papers can promote identification performance. Moreover, result-based paragraphs are the most important for identifying disruptive research, while method-based paragraphs are least important.</div></div>\",\"PeriodicalId\":50365,\"journal\":{\"name\":\"Information Processing & Management\",\"volume\":\"62 6\",\"pages\":\"Article 104207\"},\"PeriodicalIF\":6.9000,\"publicationDate\":\"2025-06-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Information Processing & Management\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0306457325001487\",\"RegionNum\":1,\"RegionCategory\":\"管理学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Processing & Management","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0306457325001487","RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Identifying potentially disruptive research via a comparative power-based large model
Timely identification of potentially disruptive research is a significant research issue, since disruptive innovation in science transforms the existing paradigm and/or opens a new one. This study proposes a comparative power-based large model that promptly and accurately identifies potentially disruptive research through comparative analysis of semantically related papers. To this end, we built a self-constructed dataset by treating accumulated disruptive and consolidating citations as crowdsourced annotation data. We employed a range of machine learning (ML) models, deep learning (DL) models, and large language models (LLMs) to build classifiers. Our optimal model, Mistral-7B+*, attains an F1 score of 0.8210, outperforming the best-performing ML and DL models by approximately 27.05% and 14.03%, respectively. Testing on 275 recently published biomedical papers further verifies its effectiveness. Additionally, we conduct comprehensive experiments to scrutinize the comparative power of the large model, as well as the impact on identification performance of the number and quality of comparative papers and of the distinct functional paragraphs within abstracts. Our findings show that an appropriate number and quality of comparative papers improve identification performance. Moreover, result-based paragraphs are the most important for identifying disruptive research, while method-based paragraphs are the least important.
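To make the comparative setup concrete, here is a minimal, hypothetical Python sketch of the general idea: retrieve the k most related papers for a focal abstract and frame them as a comparative classification query. TF-IDF cosine similarity is used here as a lightweight stand-in for the semantic relatedness in the paper, and the function name, prompt wording, and k are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: TF-IDF similarity stands in for semantic
# embeddings, and the prompt format is an assumption, not the paper's.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def build_comparative_prompt(focal_abstract, candidate_abstracts, k=3):
    """Select the k candidates most similar to the focal abstract and
    assemble a comparative classification prompt for an LLM classifier."""
    vectorizer = TfidfVectorizer(stop_words="english")
    # Row 0 is the focal abstract; the rest are candidate comparative papers.
    matrix = vectorizer.fit_transform([focal_abstract] + candidate_abstracts)
    sims = cosine_similarity(matrix[0:1], matrix[1:]).ravel()
    top_k = sims.argsort()[::-1][:k]
    comparisons = "\n\n".join(
        f"Comparative paper {i + 1}: {candidate_abstracts[j]}"
        for i, j in enumerate(top_k)
    )
    return (
        "Focal paper: " + focal_abstract + "\n\n" + comparisons +
        "\n\nRelative to the comparative papers, is the focal paper "
        "disruptive or consolidating? Answer with one word."
    )

# The returned prompt would then be passed to a fine-tuned instruction
# model (e.g., a Mistral-7B-class classifier), whose one-word output is
# mapped to a binary disruptive/consolidating label.
```

Under this reading, varying k and the similarity threshold corresponds to the paper's experiments on the number and quality of comparative papers.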
Journal overview:
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology, marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.