Enhancing semantic search using ontologies: A hybrid information retrieval approach for industrial text

Impact factor 10.4; Zone 1 (Computer Science); JCR Q1 (Computer Science, Interdisciplinary Applications)
Syed Meesam Raza Naqvi, Mohammad Ghufran, Christophe Varnier, Jean-Marc Nicod, Noureddine Zerhouni
Journal of Industrial Information Integration, vol. 45, Article 100835. Published 2025-03-22. DOI: 10.1016/j.jii.2025.100835
https://www.sciencedirect.com/science/article/pii/S2452414X25000597
Citations: 0

Abstract

Despite the increased focus on data in Industry 4.0, textual data has received little attention in the production and engineering management literature. Data sources such as maintenance records and machine documentation are rarely used to support maintenance decision-making. Available studies mainly focus on categorizing maintenance records or extracting metadata such as time of failure and maintenance cost. One of the main reasons for this underutilization is the complexity and unstructured nature of industrial text. In this study, we propose a novel hybrid information retrieval approach for industrial text using multi-modal learning. Maintenance operators can use the proposed system to query maintenance records and find solutions to problems similar to a given one. The proposed system uses heterogeneous (multi-modal) data, combining maintenance records with a machine ontology, to enhance semantic search results. For textual similarity, we used a state-of-the-art large language model (LLM), BERT (Bidirectional Encoder Representations from Transformers). For similarity among ontology labels, we used a modified version of Wu-Palmer similarity. A hybrid weighted similarity is proposed that incorporates both text and ontology similarities to enhance semantic search results. The proposed approach was validated on an open-source dataset of real maintenance records from excavators, collected over ten years at different mining sites. Retrieval using text only is compared with retrieval using multi-modal data to estimate the proposed system's effectiveness. Quantitative and qualitative analysis of the results indicates a performance improvement of 8% for the proposed hybrid similarity approach over text-only retrieval. To the best of our knowledge, this is the first study to combine LLMs and a machine ontology for semantic search in maintenance records.
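The hybrid weighted similarity described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy ontology, the two-dimensional vectors standing in for BERT embeddings, the record IDs, the weight `alpha`, and the linear form of the combination. The abstract mentions a modified Wu-Palmer similarity without giving its definition, so the sketch uses the standard Wu-Palmer formula instead.

```python
import math

# Hypothetical toy machine ontology: child -> parent (None marks the root).
PARENT = {
    "excavator": None,
    "hydraulic_system": "excavator",
    "engine": "excavator",
    "hydraulic_pump": "hydraulic_system",
    "hydraulic_hose": "hydraulic_system",
    "fuel_filter": "engine",
}

def path_to_root(node):
    """Return [node, parent, ..., root]."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))

def wu_palmer(a, b):
    """Standard Wu-Palmer: 2 * depth(LCS) / (depth(a) + depth(b))."""
    ancestors_a = set(path_to_root(a))
    # Walking up from b, the first node shared with a is the deepest common ancestor.
    lcs = next(n for n in path_to_root(b) if n in ancestors_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm

def hybrid_similarity(text_sim, onto_sim, alpha=0.5):
    """Assumed linear weighting; alpha=1.0 reduces to text-only retrieval."""
    return alpha * text_sim + (1 - alpha) * onto_sim

# Hypothetical records: (id, ontology label, stand-in embedding vector).
RECORDS = [
    ("r1", "fuel_filter", [0.9, 0.1]),
    ("r2", "hydraulic_hose", [0.8, 0.6]),
]
QUERY = ("hydraulic_pump", [1.0, 0.0])

def rank(query, records, alpha):
    """Return record ids sorted by descending hybrid similarity to the query."""
    q_label, q_vec = query
    scored = [
        (hybrid_similarity(cosine(q_vec, vec), wu_palmer(q_label, label), alpha), rid)
        for rid, label, vec in records
    ]
    return [rid for _, rid in sorted(scored, reverse=True)]

print(rank(QUERY, RECORDS, alpha=1.0))  # text-only ranking
print(rank(QUERY, RECORDS, alpha=0.5))  # hybrid ranking
```

In this toy setup the text-only ranking puts `r1` first because its vector is closest to the query, while the hybrid ranking promotes `r2`, whose ontology label shares the `hydraulic_system` parent with the query label. In practice the vectors would come from a BERT encoder and the weight would be tuned on the evaluation data.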
Source journal: Journal of Industrial Information Integration
Subject area: Decision Sciences, Information Systems and Management
CiteScore: 22.30
Self-citation rate: 13.40%
Articles published: 100
About the journal: The Journal of Industrial Information Integration focuses on the industry's transition towards industrial integration and informatization, covering not only hardware and software but also information integration. It serves as a platform for promoting advances in industrial information integration, addressing challenges, issues, and solutions in an interdisciplinary forum for researchers, practitioners, and policy makers. The journal welcomes papers on foundational, technical, and practical aspects of industrial information integration, emphasizing the complex and cross-disciplinary topics that arise in industrial integration. Techniques from mathematical science, computer science, computer engineering, electrical and electronic engineering, manufacturing engineering, and engineering management are crucial in this context.