Unlocking Large Language Model Power in Industry: Privacy-Preserving Collaborative Creation of Knowledge Graph

Impact factor: 5.7 · Zone 3 (Computer Science) · JCR Q1 (COMPUTER SCIENCE, INFORMATION SYSTEMS)

IEEE Transactions on Big Data Pub Date : 2024-12-26 DOI:10.1109/TBDATA.2024.3522814

Liqiao Xia;Junming Fan;Ajith Parlikad;Xiao Huang;Pai Zheng

IEEE Transactions on Big Data, vol. 11, no. 4, pp. 2046-2060. Full text: https://ieeexplore.ieee.org/document/10816465/

Citations: 0

Abstract

Semantic expertise remains a reliable foundation for industrial decision-making. Large Language Models (LLMs) can augment the often limited empirical knowledge by generating domain-specific insights, though the quality of this generated knowledge is uncertain. Integrating LLMs with the collective wisdom of multiple stakeholders could enhance the quality and scale of knowledge, yet this integration might inadvertently raise privacy concerns for stakeholders. In response to this challenge, Federated Learning (FL) is harnessed to improve knowledge base quality by leveraging other stakeholders’ knowledge in encrypted form, where the knowledge base is represented as a Knowledge Graph (KG). First, a multi-field hyperbolic (MFH) graph embedding method vectorizes entities, furnishing mathematical representations rather than solely semantic meanings. The FL framework then identifies and fuses common entities under encryption, so that the updated embeddings of those entities can refine the embeddings of other, private entities locally, thus enhancing overall KG quality. Finally, the KG complement method refines and clarifies triplets to further improve the KG. An experiment assesses the proposed approach across different industrial KGs, confirming its effectiveness as a viable solution for collaborative KG creation while maintaining data security.
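The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' protocol: the abstract does not specify the encryption scheme or the fusion rule, so this example assumes a keyed hash (HMAC-SHA256 with a key shared among clients) as a stand-in for encrypted entity identification, and a plain coordinate-wise mean as a stand-in for fusing embeddings (a real hyperbolic embedding would need a hyperbolic-aware average). The names `blind` and `fuse_common_entities` are hypothetical.

```python
import hashlib
import hmac


def blind(entity: str, key: bytes) -> str:
    """Keyed hash of an entity name, so the raw name is never shared.

    Assumption: HMAC-SHA256 stands in for the paper's (unspecified) scheme.
    """
    return hmac.new(key, entity.encode("utf-8"), hashlib.sha256).hexdigest()


def fuse_common_entities(clients, key):
    """Fuse embeddings of entities that appear in two or more client KGs.

    clients: list of dicts mapping entity name -> embedding (list of floats).
    Returns a new list of dicts; private (unshared) entities are unchanged.
    """
    # Client side: expose only blinded identifiers with their embeddings.
    uploads = [{blind(name, key): vec for name, vec in kg.items()} for kg in clients]

    # Server side: group embeddings by blinded identifier.
    pooled = {}
    for upload in uploads:
        for token, vec in upload.items():
            pooled.setdefault(token, []).append(vec)

    # Assumption: a coordinate-wise mean is used as the fusion rule.
    fused = {
        token: [sum(coords) / len(vecs) for coords in zip(*vecs)]
        for token, vecs in pooled.items()
        if len(vecs) > 1
    }

    # Client side: overwrite shared entities with the fused embedding.
    return [
        {name: fused.get(blind(name, key), vec) for name, vec in kg.items()}
        for kg in clients
    ]


if __name__ == "__main__":
    factory_a = {"pump": [0.25, 0.5], "valve": [1.0, 1.0]}
    factory_b = {"pump": [0.75, 0.0], "bearing": [2.0, 2.0]}
    for kg in fuse_common_entities([factory_a, factory_b], b"shared-secret"):
        print(kg)
```

In this sketch only "pump" is held by both parties, so only its embedding changes; the local refinement of private entities that the abstract mentions is left out.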


Source journal

IEEE Transactions on Big Data Multiple-

CiteScore: 11.80

Self-citation rate: 2.80%

Articles published: 114

About the journal: The IEEE Transactions on Big Data publishes peer-reviewed articles focusing on big data. These articles present innovative research ideas and application results across disciplines, including novel theories, algorithms, and applications. Research areas cover a wide range, such as big data analytics, visualization, curation, management, semantics, infrastructure, standards, performance analysis, intelligence extraction, scientific discovery, security, privacy, and legal issues specific to big data. The journal also prioritizes applications of big data in fields generating massive datasets.