Knowledge Enhanced Industrial Question-Answering Using Large Language Models

IF 11.6 · Zone 1, Engineering Technology · Q1 ENGINEERING, MULTIDISCIPLINARY

Engineering Pub Date : 2025-08-12 DOI:10.1016/j.eng.2025.07.035

Ronghui Liu, Hao Ren, Haojie Ren, Wu Rui, Wei Cui, Xiaojun Liang, Chunhua Yang, Weihua Gui

Citations: 0

Abstract

Modern industrial systems have grown increasingly extensive, complex, and hierarchical, with operations relying on numerous knowledge-based queries. These queries necessitate considerable human resources while also requiring high levels of accuracy, subjectivity, and consistency, all of which critically influence operational efficiency. To overcome these challenges, this study proposes an industrial retrieval-augmented generation (RAG) method designed to enhance large language models (LLMs) using domain-specific knowledge, thereby improving the precision of question answering. A comprehensive industrial knowledge base was constructed from diverse sources, including journal articles, theses, books, and patents. A text classification model based on bidirectional encoder representations from transformers (BERT) was trained to accurately classify incoming queries. Furthermore, the general text embedding–dense passage retrieval (GTE–DPR) model was employed to perform word embedding and vector similarity retrieval, facilitating the alignment of query vectors with relevant entries in the knowledge base to obtain initial responses. These initial results were subsequently refined by LLMs to produce accurate final answers. Experimental evaluations confirm the effectiveness of the proposed approach. In particular, when applied to ChatGLM2-6B, the RAG method increased the ROUGE-L score from 32.52% to 55.04% and improved accuracy from 50.52% to 73.92%. Comparable improvements were also observed with LLaMA2-7B, underscoring the RAG framework’s capability to significantly enhance the accuracy and relevance of industrial question-answering (QA) systems.
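
The classify, retrieve, and refine flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The knowledge-base entries, the keyword-based `classify` function (a stand-in for the trained BERT classifier), and the bag-of-words `embed` function (a stand-in for the GTE–DPR encoder) are all hypothetical and chosen only so the example runs without external models.

```python
import math
from collections import Counter

# Hypothetical toy knowledge base (assumption); the paper's corpus is built
# from journal articles, theses, books, and patents.
KNOWLEDGE_BASE = [
    {"category": "process",
     "text": "Alumina is extracted from bauxite using the Bayer process."},
    {"category": "process",
     "text": "Froth flotation separates minerals by surface hydrophobicity."},
    {"category": "equipment",
     "text": "A rotary kiln calcines material at high temperature."},
]


def embed(text):
    """Toy bag-of-words vector; a stand-in for a learned dense encoder."""
    cleaned = text.lower().replace("?", " ").replace(".", " ")
    return Counter(cleaned.split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(query):
    """Keyword rule; a stand-in for the trained BERT text classifier."""
    return "equipment" if "kiln" in query.lower() else "process"


def retrieve(query, top_k=1):
    """Filter entries by predicted category, then rank by vector similarity."""
    category = classify(query)
    query_vec = embed(query)
    candidates = [e for e in KNOWLEDGE_BASE if e["category"] == category]
    ranked = sorted(candidates,
                    key=lambda e: cosine(query_vec, embed(e["text"])),
                    reverse=True)
    return [e["text"] for e in ranked[:top_k]]


def build_prompt(query):
    """Assemble retrieved context and the question into an LLM prompt."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


print(build_prompt("How is alumina extracted from bauxite?"))
```

In a real system the prompt returned by `build_prompt` would be passed to an LLM such as ChatGLM2-6B or LLaMA2-7B for the refinement step; that call is omitted here because its interface is not specified in the abstract.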

Source journal: Engineering Environmental Science-Environmental Engineering

Self-citation rate: 1.60%

Articles published: 335

Review time: 35 days

About the journal: Engineering, an international open-access journal initiated by the Chinese Academy of Engineering (CAE) in 2015, serves as a distinguished platform for disseminating cutting-edge advancements in engineering R&D, sharing major research outputs, and highlighting key achievements worldwide. The journal's objectives encompass reporting progress in engineering science, fostering discussions on hot topics, addressing areas of interest, challenges, and prospects in engineering development, while considering human and environmental well-being and ethics in engineering. It aims to inspire breakthroughs and innovations with profound economic and social significance, propelling them to advanced international standards and transforming them into a new productive force. Ultimately, this endeavor seeks to bring about positive changes globally, benefit humanity, and shape a new future.