Automated taxonomy alignment via large language models: bridging the gap between knowledge domains
Wentao Cui, Meng Xiao, Ludi Wang, Xuezhi Wang, Yi Du, Yuanchun Zhou
Scientometrics (journal article), published 2024-07-26. DOI: 10.1007/s11192-024-05111-2
Citations: 0
Abstract
Taxonomy alignment is essential for integrating knowledge across diverse domains and languages, facilitating information retrieval and data integration. Traditional methods, which rely heavily on domain experts, are time-consuming and resource-intensive. To address this challenge, this paper proposes an automated taxonomy alignment approach leveraging large language models (LLMs). We introduce a method that embeds taxonomy nodes into a continuous low-dimensional vector space and uses the hierarchical relationships among category concepts to improve alignment accuracy. The approach exploits the contextual understanding and semantic knowledge of LLMs, offering a promising solution to the challenges of taxonomy alignment. Experiments on two pairs of real-world taxonomies show that our method achieves accuracy comparable to manual alignment while substantially reducing the time, operational, and maintenance costs of taxonomy alignment. A case study that visualizes the alignment results further illustrates the effectiveness of the approach. This automated alignment framework addresses the growing demand for accurate and efficient alignment across diverse knowledge domains.
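The abstract describes embedding taxonomy nodes into a vector space, using hierarchical context, and matching nodes across taxonomies. The sketch below illustrates that general idea under stated assumptions; it is not the authors' implementation. The `embed` function is a hypothetical bag-of-words stand-in for an LLM embedding model, the ancestor weighting (`ancestor_weight`) and similarity `threshold` are illustrative choices, and "pick the most cosine-similar target node" is just one simple alignment rule. Each taxonomy is assumed to be a dict mapping a node ID to `(label, parent_id)`.

```python
from __future__ import annotations

import math
import re


# Hypothetical stand-in for an LLM embedding model: a bag-of-words vector.
# A real pipeline would call an embedding model here instead.
def embed(text: str) -> dict[str, float]:
    vec: dict[str, float] = {}
    for tok in re.findall(r"[a-z]+", text.lower()):
        vec[tok] = vec.get(tok, 0.0) + 1.0
    return vec


def node_vector(node_id: str, taxonomy: dict[str, tuple[str, str | None]],
                ancestor_weight: float = 0.5) -> dict[str, float]:
    """Embed a node's label plus down-weighted labels of all its ancestors."""
    label, parent = taxonomy[node_id]
    vec = embed(label)
    while parent is not None:  # walk up to the root (assumes the taxonomy is a tree)
        parent_label, parent = taxonomy[parent]
        for tok, w in embed(parent_label).items():
            vec[tok] = vec.get(tok, 0.0) + ancestor_weight * w
    return vec


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def align(source, target, threshold: float = 0.5) -> dict[str, str]:
    """Map each source node to its most similar target node, if similar enough."""
    if not target:
        return {}
    target_vecs = {t: node_vector(t, target) for t in target}
    mapping: dict[str, str] = {}
    for s in source:
        sv = node_vector(s, source)
        best, score = max(((t, cosine(sv, tv)) for t, tv in target_vecs.items()),
                          key=lambda pair: pair[1])
        if score >= threshold:
            mapping[s] = best
    return mapping


# Two toy taxonomies; "Networks" appears under different parents in each.
source = {
    "s1": ("Computer Science", None),
    "s2": ("Networks", "s1"),
    "s3": ("Biology", None),
    "s4": ("Networks", "s3"),
}
target = {
    "t1": ("Biology", None),
    "t2": ("Networks", "t1"),
    "t3": ("Computer Science", None),
    "t4": ("Networks", "t3"),
}
print(align(source, target))
# {'s1': 't3', 's2': 't4', 's3': 't1', 's4': 't2'}
```

In this toy example the two "Networks" nodes have identical labels, so a label-only embedding could not tell them apart; adding the parent's label to each node vector is what separates the computing sense from the biology sense. This mirrors, in a very simplified form, the abstract's point that hierarchical relationships improve alignment accuracy.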
About the journal:
Scientometrics aims at publishing original studies, short communications, preliminary reports, review papers, letters to the editor and book reviews on scientometrics. The topics covered are results of research concerned with the quantitative features and characteristics of science. Emphasis is placed on investigations in which the development and mechanism of science are studied by means of (statistical) mathematical methods.
The Journal also provides the reader with important up-to-date information about international meetings and events in scientometrics and related fields. Appropriate bibliographic compilations are published as a separate section. Due to its fully interdisciplinary character, Scientometrics is indispensable to research workers and research administrators throughout the world. It provides valuable assistance to librarians and documentalists in central scientific agencies, ministries, research institutes and laboratories.
Scientometrics includes the Journal of Research Communication Studies. Consequently, its aims and scope cover those of the latter, namely, to bring the results of research investigations together in one place, in such a form that they will be of use not only to the investigators themselves but also to the entrepreneurs and research workers who form the object of these studies.