{"title":"Large language models for scholarly ontology generation: An extensive analysis in the engineering field","authors":"Tanay Aggarwal , Angelo Salatino , Francesco Osborne , Enrico Motta","doi":"10.1016/j.ipm.2025.104262","DOIUrl":null,"url":null,"abstract":"<div><div>Ontologies of research topics are crucial for structuring scientific knowledge, enabling scientists to navigate vast amounts of research, and forming the backbone of intelligent systems such as search engines and recommendation systems. However, manual creation of these ontologies is expensive, slow, and often results in outdated and overly general representations. As a solution, researchers have been investigating ways to automate or semi-automate the process of generating these ontologies. One of the key challenges in this domain is accurately assessing the semantic relationships between pairs of research topics. This paper presents an analysis of the capabilities of large language models (LLMs) in identifying such relationships, with a specific focus on the field of engineering. To this end, we introduce a novel benchmark based on the IEEE Thesaurus for evaluating the task of identifying three types of semantic relations between pairs of topics: <em>broader</em>, <em>narrower</em>, and <em>same-as</em>. Our study evaluates the performance of seventeen LLMs, which differ in scale, accessibility (open vs. proprietary), and model type (full vs. quantised), while also assessing four zero-shot reasoning strategies. Several models with varying architectures and sizes have achieved excellent results on this task, including Mixtral-8<span><math><mo>×</mo></math></span> 7B, Dolphin-Mistral-7B, and Claude 3 Sonnet, with F1-scores of 0.847, 0.920, and 0.967, respectively. Furthermore, our findings demonstrate that smaller, quantised models, when optimised through prompt engineering, can achieve strong performance while requiring very limited computational resources.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"63 1","pages":"Article 104262"},"PeriodicalIF":6.9000,"publicationDate":"2025-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Processing & Management","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0306457325002031","RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
Ontologies of research topics are crucial for structuring scientific knowledge, enabling scientists to navigate vast amounts of research, and forming the backbone of intelligent systems such as search engines and recommendation systems. However, manual creation of these ontologies is expensive, slow, and often results in outdated and overly general representations. As a solution, researchers have been investigating ways to automate or semi-automate the process of generating these ontologies. One of the key challenges in this domain is accurately assessing the semantic relationships between pairs of research topics. This paper presents an analysis of the capabilities of large language models (LLMs) in identifying such relationships, with a specific focus on the field of engineering. To this end, we introduce a novel benchmark based on the IEEE Thesaurus for evaluating the task of identifying three types of semantic relations between pairs of topics: broader, narrower, and same-as. Our study evaluates the performance of seventeen LLMs, which differ in scale, accessibility (open vs. proprietary), and model type (full vs. quantised), while also assessing four zero-shot reasoning strategies. Several models with varying architectures and sizes have achieved excellent results on this task, including Mixtral-8×7B, Dolphin-Mistral-7B, and Claude 3 Sonnet, with F1-scores of 0.847, 0.920, and 0.967, respectively. Furthermore, our findings demonstrate that smaller, quantised models, when optimised through prompt engineering, can achieve strong performance while requiring very limited computational resources.
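To make the task described in the abstract concrete, below is a minimal illustrative sketch of how a zero-shot relation-classification query between two research topics might look. The prompt wording, the extra "other" fallback label, and the call_llm stub are assumptions for illustration only; they are not the authors' actual prompts, models, or evaluation pipeline.

```python
# Illustrative zero-shot classification of the relation between two research
# topics (broader / narrower / same-as), in the spirit of the task the paper
# evaluates. Everything here is a hypothetical sketch, not the paper's method.

LABELS = ("broader", "narrower", "same-as", "other")

PROMPT_TEMPLATE = (
    "You are an expert in engineering research topics.\n"
    "Decide how Topic A relates to Topic B.\n"
    "Answer with exactly one word: broader, narrower, same-as, or other.\n\n"
    "Topic A: {topic_a}\n"
    "Topic B: {topic_b}\n"
    "Answer:"
)

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to a hosted or local LLM."""
    return "broader"  # fixed reply so the sketch runs without any API access

def classify_relation(topic_a: str, topic_b: str) -> str:
    """Build the zero-shot prompt and normalise the model's reply to a label."""
    reply = call_llm(PROMPT_TEMPLATE.format(topic_a=topic_a, topic_b=topic_b))
    answer = reply.strip().lower()
    return answer if answer in LABELS else "other"

if __name__ == "__main__":
    # Example pair: "machine learning" is a broader topic than "deep learning".
    print(classify_relation("machine learning", "deep learning"))
```

In a benchmark setting, such predictions would be compared against gold labels derived from a thesaurus (here, the IEEE Thesaurus) to compute per-relation and overall F1-scores.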
Journal introduction:
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology, marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.