{"title":"Unlocking Large Language Model Power in Industry: Privacy-Preserving Collaborative Creation of Knowledge Graph","authors":"Liqiao Xia;Junming Fan;Ajith Parlikad;Xiao Huang;Pai Zheng","doi":"10.1109/TBDATA.2024.3522814","DOIUrl":null,"url":null,"abstract":"Semantic expertise remains a reliable foundation for industrial decision-making, while Large Language Models (LLMs) can augment the often limited empirical knowledge by generating domain-specific insights, though the quality of this generative knowledge is uncertain. Integrating LLMs with the collective wisdom of multiple stakeholders could enhance the quality and scale of knowledge, yet this integration might inadvertently raise privacy concerns for stakeholders. In response to this challenge, Federated Learning (FL) is harnessed to improve the knowledge base quality by cryptically leveraging other stakeholders’ knowledge, where knowledge base is represented in Knowledge Graph (KG) form. Initially, a multi-field hyperbolic (MFH) graph embedding method vectorizes entities, furnishing mathematical representations in lieu of solely semantic meanings. The FL framework subsequently encrypted identifies and fuses common entities, whereby the updated entities’ embedding can refine other private entities’ embedding locally, thus enhancing the overall KG quality. Finally, the KG complement method refines and clarifies triplets to improve the overall quality of the KG. An experiment assesses the proposed approach across different industrial KGs, confirming its effectiveness as a viable solution for collaborative KG creation, all while maintaining data security.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 4","pages":"2046-2060"},"PeriodicalIF":5.7000,"publicationDate":"2024-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Big Data","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10816465/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
Semantic expertise remains a reliable foundation for industrial decision-making, while Large Language Models (LLMs) can augment the often limited empirical knowledge by generating domain-specific insights, though the quality of this generative knowledge is uncertain. Integrating LLMs with the collective wisdom of multiple stakeholders could enhance the quality and scale of knowledge, yet this integration might inadvertently raise privacy concerns for stakeholders. In response to this challenge, Federated Learning (FL) is harnessed to improve the knowledge base quality by cryptically leveraging other stakeholders’ knowledge, where knowledge base is represented in Knowledge Graph (KG) form. Initially, a multi-field hyperbolic (MFH) graph embedding method vectorizes entities, furnishing mathematical representations in lieu of solely semantic meanings. The FL framework subsequently encrypted identifies and fuses common entities, whereby the updated entities’ embedding can refine other private entities’ embedding locally, thus enhancing the overall KG quality. Finally, the KG complement method refines and clarifies triplets to improve the overall quality of the KG. An experiment assesses the proposed approach across different industrial KGs, confirming its effectiveness as a viable solution for collaborative KG creation, all while maintaining data security.
期刊介绍:
The IEEE Transactions on Big Data publishes peer-reviewed articles focusing on big data. These articles present innovative research ideas and application results across disciplines, including novel theories, algorithms, and applications. Research areas cover a wide range, such as big data analytics, visualization, curation, management, semantics, infrastructure, standards, performance analysis, intelligence extraction, scientific discovery, security, privacy, and legal issues specific to big data. The journal also prioritizes applications of big data in fields generating massive datasets.