Jose L. Mellina-Andreu , Alejandro Cisterna-García , Juan A. Botía
{"title":"基于参考知识图的嵌入语言模型中维度的数据驱动解释","authors":"Jose L. Mellina-Andreu , Alejandro Cisterna-García , Juan A. Botía","doi":"10.1016/j.knosys.2025.114507","DOIUrl":null,"url":null,"abstract":"<div><div>As language models are used in more applications, a key problem has become clear: their numerical embeddings are hard to interpret because it is unclear how each part of the vector relates to real-world meanings in specific fields. The prevailing embedding methods are inadequate in their current state, as they are unable to effectively bridge the gap between mathematical representations and human-understandable knowledge structures. The present study proposes a novel framework that explicitly links ontology classes to specific embedding dimensions through a dual-component architecture combining a text encoder that produces the target embedding dimensions with domain knowledge graphs. The Area Under the Interpretability Curve (AUIC) metric is introduced as a means to systematically evaluate model-alignment with ontological concepts. The analysis reveals that targeted dimensional mapping enables direct interpretation of individual vector components through ontological terms. The practical applications of this framework are illustrated through case studies in biomedical contexts, demonstrating enhanced model transparency without compromising performance. This approach establishes a measurable pathway for reconciling statistical language representations with structured domain knowledge, particularly benefiting fields requiring precise concept alignment like biomedicine. The implementation is publicly available at: <span><span>https://github.com/Mellandd/DEIBO</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"330 ","pages":"Article 114507"},"PeriodicalIF":7.6000,"publicationDate":"2025-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Data-driven interpretation of dimensions in an embedding language model based on a reference knowledge graph\",\"authors\":\"Jose L. Mellina-Andreu , Alejandro Cisterna-García , Juan A. Botía\",\"doi\":\"10.1016/j.knosys.2025.114507\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>As language models are used in more applications, a key problem has become clear: their numerical embeddings are hard to interpret because it is unclear how each part of the vector relates to real-world meanings in specific fields. The prevailing embedding methods are inadequate in their current state, as they are unable to effectively bridge the gap between mathematical representations and human-understandable knowledge structures. The present study proposes a novel framework that explicitly links ontology classes to specific embedding dimensions through a dual-component architecture combining a text encoder that produces the target embedding dimensions with domain knowledge graphs. The Area Under the Interpretability Curve (AUIC) metric is introduced as a means to systematically evaluate model-alignment with ontological concepts. The analysis reveals that targeted dimensional mapping enables direct interpretation of individual vector components through ontological terms. The practical applications of this framework are illustrated through case studies in biomedical contexts, demonstrating enhanced model transparency without compromising performance. This approach establishes a measurable pathway for reconciling statistical language representations with structured domain knowledge, particularly benefiting fields requiring precise concept alignment like biomedicine. The implementation is publicly available at: <span><span>https://github.com/Mellandd/DEIBO</span><svg><path></path></svg></span>.</div></div>\",\"PeriodicalId\":49939,\"journal\":{\"name\":\"Knowledge-Based Systems\",\"volume\":\"330 \",\"pages\":\"Article 114507\"},\"PeriodicalIF\":7.6000,\"publicationDate\":\"2025-09-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Knowledge-Based Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0950705125015461\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Knowledge-Based Systems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0950705125015461","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Data-driven interpretation of dimensions in an embedding language model based on a reference knowledge graph
As language models are used in more applications, a key problem has become clear: their numerical embeddings are hard to interpret because it is unclear how each part of the vector relates to real-world meanings in specific fields. The prevailing embedding methods are inadequate in their current state, as they are unable to effectively bridge the gap between mathematical representations and human-understandable knowledge structures. The present study proposes a novel framework that explicitly links ontology classes to specific embedding dimensions through a dual-component architecture combining a text encoder that produces the target embedding dimensions with domain knowledge graphs. The Area Under the Interpretability Curve (AUIC) metric is introduced as a means to systematically evaluate model-alignment with ontological concepts. The analysis reveals that targeted dimensional mapping enables direct interpretation of individual vector components through ontological terms. The practical applications of this framework are illustrated through case studies in biomedical contexts, demonstrating enhanced model transparency without compromising performance. This approach establishes a measurable pathway for reconciling statistical language representations with structured domain knowledge, particularly benefiting fields requiring precise concept alignment like biomedicine. The implementation is publicly available at: https://github.com/Mellandd/DEIBO.
期刊介绍:
Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on knowledge-based and other artificial intelligence techniques-based systems. The journal aims to support human prediction and decision-making through data science and computation techniques, provide a balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.