Fine-tuning and Prompt Engineering with Cognitive Knowledge Graphs for Scholarly Knowledge Organization
Gollam Rabby, Sören Auer, Jennifer D'Souza, Allard Oelen
arXiv - CS - Digital Libraries, arXiv:2409.06433 (2024-09-10)
Abstract
The number of scholarly articles published each year now exceeds 2.5 million, making it increasingly difficult for researchers to follow scientific progress. Integrating the contributions of scholarly articles into a novel type of cognitive knowledge graph (CKG) will be a crucial element for accessing and organizing scholarly knowledge, surpassing the insights provided by titles and abstracts alone. This research focuses on effectively conveying structured scholarly knowledge by utilizing large language models (LLMs) to categorize scholarly articles and describe their contributions in a structured and comparable manner. While previous studies explored language models within specific research domains, the extensive domain-independent knowledge captured by LLMs offers a substantial opportunity for generating structured contribution descriptions as CKGs. Additionally, LLMs offer customizable pathways through prompt engineering or fine-tuning, making it feasible to leverage smaller LLMs known for their efficiency, cost-effectiveness, and lower environmental footprint. Our methodology harnesses LLM knowledge and complements it with domain-expert-verified scholarly data sourced from a CKG. This strategic fusion significantly enhances LLM performance, especially in tasks such as scholarly article categorization and predicate recommendation. Our method fine-tunes LLMs with CKG knowledge and additionally injects CKG knowledge through a novel prompting technique, significantly increasing the accuracy of scholarly knowledge extraction. We integrated our approach into the Open Research Knowledge Graph (ORKG), enabling precise access to organized scholarly knowledge and benefiting domain-independent scholarly knowledge exchange and dissemination among policymakers, industrial practitioners, and the general public.
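To make the knowledge-injection idea concrete, below is a minimal sketch of how CKG-sourced predicates might be folded into a prompt for a chat-style LLM. Everything in it is illustrative: the `fetch_orkg_predicates` stub, the example predicate lists, and the prompt wording are hypothetical stand-ins, not the paper's actual implementation, which queries expert-verified data from the ORKG.

```python
# Hypothetical sketch: injecting CKG-verified predicates into an LLM prompt
# so the model describes a paper's contribution with comparable properties.

def fetch_orkg_predicates(research_field: str) -> list[str]:
    """Stand-in for a CKG lookup. A real system would query the ORKG for
    domain-expert-verified predicates of the given research field."""
    examples = {
        "Natural Language Processing": [
            "has research problem", "uses dataset", "has evaluation metric",
            "has model architecture", "achieves result",
        ],
    }
    return examples.get(research_field, ["has research problem"])

def build_prompt(title: str, abstract: str, research_field: str) -> str:
    """Compose a prompt that constrains the LLM to CKG-sourced predicates,
    yielding structured, comparable contribution descriptions."""
    predicates = fetch_orkg_predicates(research_field)
    predicate_block = "\n".join(f"- {p}" for p in predicates)
    return (
        "You are extracting structured scholarly knowledge.\n"
        f"Research field: {research_field}\n"
        "Describe the paper's contribution using ONLY these predicates:\n"
        f"{predicate_block}\n\n"
        f"Title: {title}\n"
        f"Abstract: {abstract}\n"
        "Return one 'predicate: value' pair per line."
    )

if __name__ == "__main__":
    prompt = build_prompt(
        title="Fine-tuning and Prompt Engineering with Cognitive Knowledge "
              "Graphs for Scholarly Knowledge Organization",
        abstract="We combine LLMs with CKG-verified scholarly data to "
                 "categorize articles and structure their contributions.",
        research_field="Natural Language Processing",
    )
    print(prompt)  # feed this to any chat-completion LLM
```

The design point the sketch illustrates is that the LLM is not asked to invent a schema: the allowable predicates come from the CKG, so its output slots directly into the graph and remains comparable across papers in the same field.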