Automated taxonomy alignment via large language models: bridging the gap between knowledge domains

IF 3.5 · CAS Zone 3 (Management Science) · JCR Q2 · COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Scientometrics · Pub Date: 2024-07-26 · DOI: 10.1007/s11192-024-05111-2

Wentao Cui, Meng Xiao, Ludi Wang, Xuezhi Wang, Yi Du, Yuanchun Zhou


Citations: 0

Abstract

Taxonomy alignment is essential for integrating knowledge across diverse domains and languages, facilitating information retrieval and data integration. Traditional methods, which rely heavily on domain experts, are time-consuming and resource-intensive. To address this challenge, this paper proposes an automated taxonomy alignment approach leveraging large language models (LLMs). We introduce a method that embeds taxonomy nodes into a continuous low-dimensional vector space, utilizing hierarchical relationships within category concepts to enhance alignment accuracy. Our approach capitalizes on the contextual understanding and semantic information capabilities of LLMs, offering a promising solution to the challenges of taxonomy alignment. We conducted experiments on two pairs of real-world taxonomies and demonstrated that our method is comparable in accuracy to manual alignment, while significantly reducing the time, operational, and maintenance costs associated with taxonomy alignment. Our case study showcases the effectiveness of our approach by visualizing the taxonomy alignment results. This automated alignment framework addresses the increasing demand for accurate and efficient alignment processes across diverse knowledge domains.
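
The general idea of embedding taxonomy nodes together with their hierarchical context and matching them by vector similarity can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words `embed` function is a toy stand-in for an LLM embedding, and the function names, the parent-pointer taxonomy format, and the example taxonomies are all assumptions made for illustration.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an LLM embedding (assumption): a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def node_text(taxonomy, node):
    """Join a node's label with its ancestors' labels, root first.

    `taxonomy` maps each label to its parent label (None for a root);
    this representation is hypothetical.
    """
    path = []
    while node is not None:
        path.append(node)
        node = taxonomy[node]
    return " ".join(reversed(path))

def align(source, target):
    """Map each source node to the most similar target node and its score."""
    target_vecs = {t: embed(node_text(target, t)) for t in target}
    result = {}
    for s in source:
        vec = embed(node_text(source, s))
        best = max(target_vecs, key=lambda t: cosine(vec, target_vecs[t]))
        result[s] = (best, cosine(vec, target_vecs[best]))
    return result

# Hypothetical example taxonomies (label -> parent label).
source = {"computer science": None, "networks": "computer science"}
target = {
    "computer science": None,
    "computer networks": "computer science",
    "biology": None,
    "neural networks": "biology",
}

alignment = align(source, target)
for node, (match, score) in alignment.items():
    print(f"{node} -> {match} ({score:.3f})")
```

In this toy case the label "networks" alone is equally similar to "computer networks" and "neural networks"; including the ancestor path resolves the tie in favor of "computer networks", which is the kind of benefit the abstract attributes to hierarchical relationships. A real system would replace `embed` with embeddings from a language model.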


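Source journal

Scientometrics (Management Science; Computer Science, Interdisciplinary Applications)

CiteScore: 7.20

Self-citation rate: 17.90%

Articles published: 351

Review time: 1.5 months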

Journal description: Scientometrics aims at publishing original studies, short communications, preliminary reports, review papers, letters to the editor and book reviews on scientometrics. The topics covered are results of research concerned with the quantitative features and characteristics of science. Emphasis is placed on investigations in which the development and mechanism of science are studied by means of (statistical) mathematical methods. The Journal also provides the reader with important up-to-date information about international meetings and events in scientometrics and related fields. Appropriate bibliographic compilations are published as a separate section. Due to its fully interdisciplinary character, Scientometrics is indispensable to research workers and research administrators throughout the world. It provides valuable assistance to librarians and documentalists in central scientific agencies, ministries, research institutes and laboratories. Scientometrics includes the Journal of Research Communication Studies. Consequently its aims and scope cover that of the latter, namely, to bring the results of research investigations together in one place, in such a form that they will be of use not only to the investigators themselves but also to the entrepreneurs and research workers who form the object of these studies.