用于扩展本体的框嵌入：一种数据驱动和可解释的方法

IF 5.7 2区化学 Q1 CHEMISTRY, MULTIDISCIPLINARY

Journal of Cheminformatics Pub Date : 2025-09-01 DOI:10.1186/s13321-025-01086-1

Adel Memariani, Martin Glauer, Simon Flügel, Fabian Neuhaus, Janna Hastings, Till Mossakowski

{"title":"用于扩展本体的框嵌入：一种数据驱动和可解释的方法","authors":"Adel Memariani, Martin Glauer, Simon Flügel, Fabian Neuhaus, Janna Hastings, Till Mossakowski","doi":"10.1186/s13321-025-01086-1","DOIUrl":null,"url":null,"abstract":"Deriving symbolic knowledge from trained deep learning models is challenging due to the lack of transparency in such models. A promising approach to address this issue is to couple a semantic structure with the model outputs and thereby make the model interpretable. In prediction tasks such as multi-label classification, labels tend to form hierarchical relationships. Therefore, we propose enforcing a taxonomical structure on the model’s outputs throughout the training phase. In vector space, a taxonomy can be represented using axis-aligned hyper-rectangles, or boxes, which may overlap or nest within one another. The boundaries of a box determine the extent of a particular category. Thus, we used box-shaped embeddings of ontology classes to learn and transparently represent logical relationships that are only implicit in multi-label datasets. We assessed our model by measuring its ability to approximate the full set of inferred subclass relations in the ChEBI ontology, which is an important knowledge base in the field of life science. We demonstrate that our model captures implicit hierarchical relationships among labels, ensuring consistency with the underlying ontological conceptualization, while also achieving state-of-the-art performance in multi-label classification. Notably, this is accomplished without requiring an explicit taxonomy during the training process.Our proposed approach advances chemical classification by enabling\ninterpretable outputs through a structured and geometrically\nexpressive representation of molecules and their classes.","PeriodicalId":617,"journal":{"name":"Journal of Cheminformatics","volume":"17 1","pages":""},"PeriodicalIF":5.7000,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01086-1","citationCount":"0","resultStr":"{\"title\":\"Box embeddings for extending ontologies: a data-driven and interpretable approach\",\"authors\":\"Adel Memariani, Martin Glauer, Simon Flügel, Fabian Neuhaus, Janna Hastings, Till Mossakowski\",\"doi\":\"10.1186/s13321-025-01086-1\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deriving symbolic knowledge from trained deep learning models is challenging due to the lack of transparency in such models. A promising approach to address this issue is to couple a semantic structure with the model outputs and thereby make the model interpretable. In prediction tasks such as multi-label classification, labels tend to form hierarchical relationships. Therefore, we propose enforcing a taxonomical structure on the model’s outputs throughout the training phase. In vector space, a taxonomy can be represented using axis-aligned hyper-rectangles, or boxes, which may overlap or nest within one another. The boundaries of a box determine the extent of a particular category. Thus, we used box-shaped embeddings of ontology classes to learn and transparently represent logical relationships that are only implicit in multi-label datasets. We assessed our model by measuring its ability to approximate the full set of inferred subclass relations in the ChEBI ontology, which is an important knowledge base in the field of life science. We demonstrate that our model captures implicit hierarchical relationships among labels, ensuring consistency with the underlying ontological conceptualization, while also achieving state-of-the-art performance in multi-label classification. Notably, this is accomplished without requiring an explicit taxonomy during the training process.Our proposed approach advances chemical classification by enabling\\ninterpretable outputs through a structured and geometrically\\nexpressive representation of molecules and their classes.\",\"PeriodicalId\":617,\"journal\":{\"name\":\"Journal of Cheminformatics\",\"volume\":\"17 1\",\"pages\":\"\"},\"PeriodicalIF\":5.7000,\"publicationDate\":\"2025-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01086-1\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Cheminformatics\",\"FirstCategoryId\":\"92\",\"ListUrlMain\":\"https://link.springer.com/article/10.1186/s13321-025-01086-1\",\"RegionNum\":2,\"RegionCategory\":\"化学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"CHEMISTRY, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Cheminformatics","FirstCategoryId":"92","ListUrlMain":"https://link.springer.com/article/10.1186/s13321-025-01086-1","RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"CHEMISTRY, MULTIDISCIPLINARY","Score":null,"Total":0}

引用次数: 0

摘要

由于这些模型缺乏透明度，从经过训练的深度学习模型中获得符号知识是具有挑战性的。解决这个问题的一个很有前途的方法是将语义结构与模型输出相耦合，从而使模型可解释。在多标签分类等预测任务中，标签倾向于形成层次关系。因此，我们建议在整个训练阶段对模型的输出实施分类结构。在向量空间中，分类法可以使用与轴对齐的超矩形或框来表示，它们可以相互重叠或嵌套。框的边界决定了特定类别的范围。因此，我们使用本体类的盒形嵌入来学习和透明地表示仅在多标签数据集中隐含的逻辑关系。我们通过测量其近似ChEBI本体中所有推断子类关系的能力来评估我们的模型，ChEBI本体是生命科学领域的一个重要知识库。我们证明了我们的模型捕获了标签之间的隐式层次关系，确保了与底层本体概念化的一致性，同时在多标签分类中也实现了最先进的性能。值得注意的是，这在训练过程中不需要显式分类法就可以完成。我们提出的方法通过分子及其类的结构化和几何表达表示实现可解释的输出，从而推进化学分类。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Box embeddings for extending ontologies: a data-driven and interpretable approach

Deriving symbolic knowledge from trained deep learning models is challenging due to the lack of transparency in such models. A promising approach to address this issue is to couple a semantic structure with the model outputs and thereby make the model interpretable. In prediction tasks such as multi-label classification, labels tend to form hierarchical relationships. Therefore, we propose enforcing a taxonomical structure on the model’s outputs throughout the training phase. In vector space, a taxonomy can be represented using axis-aligned hyper-rectangles, or boxes, which may overlap or nest within one another. The boundaries of a box determine the extent of a particular category. Thus, we used box-shaped embeddings of ontology classes to learn and transparently represent logical relationships that are only implicit in multi-label datasets. We assessed our model by measuring its ability to approximate the full set of inferred subclass relations in the ChEBI ontology, which is an important knowledge base in the field of life science. We demonstrate that our model captures implicit hierarchical relationships among labels, ensuring consistency with the underlying ontological conceptualization, while also achieving state-of-the-art performance in multi-label classification. Notably, this is accomplished without requiring an explicit taxonomy during the training process.

Our proposed approach advances chemical classification by enabling interpretable outputs through a structured and geometrically expressive representation of molecules and their classes.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Journal of Cheminformatics CHEMISTRY, MULTIDISCIPLINARY-COMPUTER SCIENCE, INFORMATION SYSTEMS

CiteScore

14.10

自引率

7.00%

发文量

审稿时长

3 months

期刊介绍： Journal of Cheminformatics is an open access journal publishing original peer-reviewed research in all aspects of cheminformatics and molecular modelling. Coverage includes, but is not limited to: chemical information systems, software and databases, and molecular modelling, chemical structure representations and their use in structure, substructure, and similarity searching of chemical substance and chemical reaction databases, computer and molecular graphics, computer-aided molecular design, expert systems, QSAR, and data mining techniques.