{"title":"Large language models for scholarly ontology generation: An extensive analysis in the engineering field","authors":"Tanay Aggarwal , Angelo Salatino , Francesco Osborne , Enrico Motta","doi":"10.1016/j.ipm.2025.104262","DOIUrl":null,"url":null,"abstract":"<div><div>Ontologies of research topics are crucial for structuring scientific knowledge, enabling scientists to navigate vast amounts of research, and forming the backbone of intelligent systems such as search engines and recommendation systems. However, manual creation of these ontologies is expensive, slow, and often results in outdated and overly general representations. As a solution, researchers have been investigating ways to automate or semi-automate the process of generating these ontologies. One of the key challenges in this domain is accurately assessing the semantic relationships between pairs of research topics. This paper presents an analysis of the capabilities of large language models (LLMs) in identifying such relationships, with a specific focus on the field of engineering. To this end, we introduce a novel benchmark based on the IEEE Thesaurus for evaluating the task of identifying three types of semantic relations between pairs of topics: <em>broader</em>, <em>narrower</em>, and <em>same-as</em>. Our study evaluates the performance of seventeen LLMs, which differ in scale, accessibility (open vs. proprietary), and model type (full vs. quantised), while also assessing four zero-shot reasoning strategies. Several models with varying architectures and sizes have achieved excellent results on this task, including Mixtral-8<span><math><mo>×</mo></math></span> 7B, Dolphin-Mistral-7B, and Claude 3 Sonnet, with F1-scores of 0.847, 0.920, and 0.967, respectively. Furthermore, our findings demonstrate that smaller, quantised models, when optimised through prompt engineering, can achieve strong performance while requiring very limited computational resources.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"63 1","pages":"Article 104262"},"PeriodicalIF":6.9000,"publicationDate":"2025-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Processing & Management","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0306457325002031","RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
Ontologies of research topics are crucial for structuring scientific knowledge, enabling scientists to navigate vast amounts of research, and forming the backbone of intelligent systems such as search engines and recommendation systems. However, manual creation of these ontologies is expensive, slow, and often results in outdated and overly general representations. As a solution, researchers have been investigating ways to automate or semi-automate the process of generating these ontologies. One of the key challenges in this domain is accurately assessing the semantic relationships between pairs of research topics. This paper presents an analysis of the capabilities of large language models (LLMs) in identifying such relationships, with a specific focus on the field of engineering. To this end, we introduce a novel benchmark based on the IEEE Thesaurus for evaluating the task of identifying three types of semantic relations between pairs of topics: broader, narrower, and same-as. Our study evaluates the performance of seventeen LLMs, which differ in scale, accessibility (open vs. proprietary), and model type (full vs. quantised), while also assessing four zero-shot reasoning strategies. Several models with varying architectures and sizes have achieved excellent results on this task, including Mixtral-8×7B, Dolphin-Mistral-7B, and Claude 3 Sonnet, with F1-scores of 0.847, 0.920, and 0.967, respectively. Furthermore, our findings demonstrate that smaller, quantised models, when optimised through prompt engineering, can achieve strong performance while requiring very limited computational resources.
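To make the task described in the abstract concrete, below is a minimal illustrative sketch of how a zero-shot relation-classification query between two research topics might look. The prompt wording, the extra "other" fallback label, and the call_llm stub are assumptions for illustration only; they are not the authors' actual prompts, models, or evaluation pipeline.

```python
# Illustrative zero-shot classification of the relation between two research
# topics (broader / narrower / same-as), in the spirit of the task the paper
# evaluates. Everything here is a hypothetical sketch, not the paper's method.

LABELS = ("broader", "narrower", "same-as", "other")

PROMPT_TEMPLATE = (
    "You are an expert in engineering research topics.\n"
    "Decide how Topic A relates to Topic B.\n"
    "Answer with exactly one word: broader, narrower, same-as, or other.\n\n"
    "Topic A: {topic_a}\n"
    "Topic B: {topic_b}\n"
    "Answer:"
)

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to a hosted or local LLM."""
    return "broader"  # fixed reply so the sketch runs without any API access

def classify_relation(topic_a: str, topic_b: str) -> str:
    """Build the zero-shot prompt and normalise the model's reply to a label."""
    reply = call_llm(PROMPT_TEMPLATE.format(topic_a=topic_a, topic_b=topic_b))
    answer = reply.strip().lower()
    return answer if answer in LABELS else "other"

if __name__ == "__main__":
    # Example pair: "machine learning" is a broader topic than "deep learning".
    print(classify_relation("machine learning", "deep learning"))
```

In a benchmark setting, such predictions would be compared against gold labels derived from a thesaurus (here, the IEEE Thesaurus) to compute per-relation and overall F1-scores.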
Journal introduction:
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology, marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.