A hybrid graph and LLM approach for measuring scientific novelty via knowledge recombination and propagation

IF 7.5 | Region 1: Computer Science | JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

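Expert Systems with Applications, Volume 298, Article 129794 | Pub Date: 2025-09-24 | DOI: 10.1016/j.eswa.2025.129794 | URL: https://www.sciencedirect.com/science/article/pii/S0957417425034098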

Zhongyi Wang, Zeren Wang, Guangzhao Zhang, Jiangping Chen, Markus Luczak-Roesch, Haihua Chen


Citations: 0

Abstract


A hybrid graph and LLM approach for measuring scientific novelty via knowledge recombination and propagation

Scientific novelty constitutes a fundamental catalyst for both disciplinary innovation and interdisciplinary progress. Nevertheless, prevailing approaches to novelty assessment predominantly emphasize a single analytical dimension: either the semantic content of the focal paper or its cited references. Content-based methodologies frequently fail to incorporate the foundational knowledge cited by the target publication, whereas reference-based strategies tend to disregard the intrinsic conceptual contributions of the focal work itself. To address this limitation, the present study introduces a hybrid graph and large language model approach to jointly capture and integrate knowledge embedded in both the focal paper and its cited literature. The proposed method, which integrates knowledge recombination and propagation, is structured into four primary stages. First, prompt-based extraction techniques using general LLMs are applied to extract knowledge. Second, a Reference Knowledge Combination Network (RKCN) is constructed to model the knowledge referenced by the focal paper. Third, the RKCN is initialized with representations generated by SciDeBERTa(CS), and a graph attention network is employed to propagate knowledge across the network. Finally, the novelty of the focal paper is quantified by aggregating the novelty scores of all internal knowledge combinations based on the propagated representations. Experimental evaluation in the domain of artificial intelligence (AI) demonstrates that the proposed method significantly outperforms existing baseline approaches in quantifying scientific novelty. Additional ablation studies further validate the contribution of the knowledge propagation module. A case study illustrates the interpretability of our approach, and a cross-field validation in the Biomedical Engineering (BME) domain highlights its robustness and cross-domain generalizability.
A multi-dimensional comparative analysis between award-winning and non-award papers further reveals that the former generally incorporate a larger volume of knowledge and exhibit greater diversity in knowledge combinations. Moreover, while both groups encompass knowledge combinations spanning a wide range of novelty, award-winning papers display a stronger concentration at higher novelty levels, in contrast to the more uniform distribution observed in non-award papers. Data, code, and more detailed results are publicly available at: https://github.com/haihua0913/graphLLM4ScientificNovelty.
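
The four stages named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (their code is in the repository linked above), and every concrete choice here is an assumption: the entity lists are typed in by hand where the paper uses LLM extraction, two-dimensional toy vectors stand in for SciDeBERTa(CS) representations, one round of untrained dot-product softmax averaging stands in for a trained graph attention network, and both the pair score (1 minus cosine similarity) and the mean aggregation are illustrative guesses. The function names `build_rkcn`, `propagate`, and `paper_novelty` are hypothetical.

```python
import math
from itertools import combinations


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0


def build_rkcn(reference_entities):
    """Link every pair of entities that co-occur in the same cited reference."""
    graph = {}
    for entities in reference_entities:
        unique = sorted(set(entities))
        for entity in unique:
            graph.setdefault(entity, set())
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def propagate(graph, emb):
    """One round of softmax-weighted averaging over each node and its neighbours.

    Assumption: an untrained stand-in for a graph attention layer.
    """
    out = dict(emb)  # entities outside the graph keep their initial vector
    for node, neighbours in graph.items():
        group = [node] + sorted(neighbours)  # self-loop plus neighbours
        scores = [dot(emb[node], emb[n]) for n in group]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out[node] = [
            sum(w * emb[n][i] for w, n in zip(weights, group)) / total
            for i in range(len(emb[node]))
        ]
    return out


def paper_novelty(focal_entities, emb):
    """Score each entity pair as 1 - cosine similarity, then average (assumed)."""
    pairs = combinations(sorted(set(focal_entities)), 2)
    scores = {(a, b): 1.0 - cosine(emb[a], emb[b]) for a, b in pairs}
    mean = sum(scores.values()) / len(scores) if scores else 0.0
    return mean, scores


# Stage 1 (assumed done): entities extracted from each cited reference.
references = [
    ["graph attention network", "node classification"],
    ["graph attention network", "knowledge graph"],
    ["large language model", "prompting"],
]
# Toy initial vectors in place of SciDeBERTa(CS) representations.
initial = {
    "graph attention network": [1.0, 0.0],
    "node classification": [0.9, 0.1],
    "knowledge graph": [0.8, 0.2],
    "large language model": [0.0, 1.0],
    "prompting": [0.1, 0.9],
}
focal = ["graph attention network", "knowledge graph", "large language model"]

graph = build_rkcn(references)              # stage 2
propagated = propagate(graph, initial)      # stage 3
novelty, pair_scores = paper_novelty(focal, propagated)  # stage 4
print(round(novelty, 3), pair_scores)
```

In this toy setup, a pair of entities already combined in a cited reference ends up with closer vectors after propagation and therefore a lower pair score than a pair drawn from unconnected parts of the network, which is the intuition behind scoring knowledge combinations on propagated representations.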


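Source journal

Expert Systems with Applications (Engineering and Technology: Engineering, Electrical and Electronic)

CiteScore: 13.80

Self-citation rate: 10.60%

Articles published: 2045

Review time: 8.7 months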

About the journal: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.