A hybrid graph and LLM approach for measuring scientific novelty via knowledge recombination and propagation
Zhongyi Wang, Zeren Wang, Guangzhao Zhang, Jiangping Chen, Markus Luczak-Roesch, Haihua Chen
Expert Systems with Applications, Volume 298, Article 129794, published 2025-09-24. DOI: 10.1016/j.eswa.2025.129794
Scientific novelty constitutes a fundamental catalyst for both disciplinary innovation and interdisciplinary progress. Nevertheless, prevailing approaches to novelty assessment predominantly emphasize a single analytical dimension: either the semantic content of the focal paper or its cited references. Content-based methodologies frequently fail to incorporate the foundational knowledge cited by the focal paper, whereas reference-based strategies tend to disregard the intrinsic conceptual contributions of the focal work itself. To address this limitation, the present study introduces a hybrid graph and large language model (LLM) approach that jointly captures and integrates the knowledge embedded in both the focal paper and its cited literature. The proposed method, which integrates knowledge recombination and propagation, is structured into four stages. First, prompt-based extraction with general-purpose LLMs is applied to extract knowledge. Second, a Reference Knowledge Combination Network (RKCN) is constructed to model the knowledge referenced by the focal paper. Third, the RKCN is initialized with representations generated by SciDeBERTa(CS), and a graph attention network is employed to propagate knowledge across the network. Finally, the novelty of the focal paper is quantified by aggregating the novelty scores of all its internal knowledge combinations, based on the propagated representations. Experimental evaluation in the domain of artificial intelligence (AI) demonstrates that the proposed method significantly outperforms existing baseline approaches in quantifying scientific novelty. Additional ablation studies further validate the contribution of the knowledge propagation module. A case study illustrates the interpretability of our approach, and a cross-field validation in the Biomedical Engineering (BME) domain highlights its robustness and cross-domain generalizability.
A multi-dimensional comparative analysis between award-winning and non-award papers further reveals that the former generally incorporate a larger volume of knowledge and exhibit greater diversity in knowledge combinations. Moreover, while both groups encompass knowledge combinations spanning a wide range of novelty, award-winning papers display a stronger concentration at higher novelty levels, in contrast to the more uniform distribution observed in non-award papers. Data, code, and more detailed results are publicly available at: https://github.com/haihua0913/graphLLM4ScientificNovelty.
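The second through fourth stages can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation (their code is in the repository linked above); every detail here is an assumption for illustration. It assumes knowledge elements have already been extracted as plain strings (stage one is not shown), that the RKCN links elements co-occurring in the same cited reference, that toy 2-d vectors stand in for SciDeBERTa(CS) embeddings, that a single untrained dot-product attention round replaces a learned graph attention network, and that a combination's novelty is one minus the cosine similarity of its two propagated vectors, averaged over all element pairs in the focal paper. The function names `build_rkcn`, `propagate`, and `paper_novelty` are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_rkcn(reference_knowledge):
    """Hypothetical RKCN: link every pair of elements that co-occur in one cited reference."""
    adj = {}
    for elements in reference_knowledge:
        unique = sorted(set(elements))
        for k in unique:
            adj.setdefault(k, set())
        for a, b in combinations(unique, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def propagate(adj, emb):
    """One round of attention-weighted aggregation over each node and its neighbours.

    Simplified, untrained stand-in for a graph attention layer: weights are a softmax
    over dot-product scores, not a learned attention function.
    """
    out = {}
    for node, vec in emb.items():
        # Attend over the node itself plus its RKCN neighbours that have embeddings.
        neigh = [node] + sorted(n for n in adj.get(node, ()) if n in emb)
        scores = [sum(a * b for a, b in zip(vec, emb[n])) for n in neigh]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out[node] = [
            sum(w * emb[n][d] for w, n in zip(weights, neigh)) / total
            for d in range(len(vec))
        ]
    return out


def paper_novelty(focal_elements, adj, emb):
    """Hypothetical score: mean of (1 - cosine) over all pairs of focal-paper elements.

    Every focal element is assumed to have an entry in `emb`.
    """
    h = propagate(adj, emb)
    pairs = list(combinations(sorted(set(focal_elements)), 2))
    if not pairs:
        return 0.0  # a single element forms no combination
    return sum(1.0 - cosine(h[a], h[b]) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy data: knowledge elements per cited reference, and 2-d stand-in embeddings.
    refs = [["transformer", "attention"], ["attention", "graph"], ["graph", "citation"]]
    emb = {
        "transformer": [1.0, 0.0],
        "attention": [0.9, 0.1],
        "graph": [0.0, 1.0],
        "citation": [0.1, 0.9],
    }
    rkcn = build_rkcn(refs)
    print(paper_novelty(["transformer", "attention"], rkcn, emb))
    print(paper_novelty(["transformer", "citation"], rkcn, emb))
```

Under these toy assumptions, the pair drawn from distant regions of the network scores higher than the closely related pair. In the method described above, representations and attention are learned, so real scores would differ from this sketch.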
About the journal:
Expert Systems with Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.