Learning Causal Representations Based on a GAE Embedded Autoencoder

IF 8.9, Zone 2 (Computer Science), JCR Q1 (Computer Science, Artificial Intelligence)

IEEE Transactions on Knowledge and Data Engineering, Pub Date: 2025-02-28, DOI: 10.1109/TKDE.2025.3546607

Kuang Zhou; Ming Jiang; Bogdan Gabrys; Yong Xu

Volume 37, Issue 6, pages 3472-3484, https://ieeexplore.ieee.org/document/10908047/

Citations: 0

Abstract



Traditional machine-learning approaches face limitations when confronted with insufficient data. Transfer learning addresses this by leveraging knowledge from closely related domains. The key in transfer learning is to find a transferable feature representation to enhance cross-domain classification models. However, in some scenarios, some features correlated with samples in the source domain may not be relevant to those in the target. Causal inference enables us to uncover the underlying patterns and mechanisms within the data, mitigating the impact of confounding factors. Nevertheless, most existing causal inference algorithms have limitations when applied to high-dimensional datasets with nonlinear causal relationships. In this work, a new causal representation method based on a Graph autoencoder embedded AutoEncoder, named GeAE, is introduced to learn invariant representations across domains. The proposed approach employs a causal structure learning module, similar to a graph autoencoder, to account for nonlinear causal relationships present in the data. Moreover, the cross-entropy loss as well as the causal structure learning loss and the reconstruction loss are incorporated in the objective function designed in a united autoencoder. This method allows for the handling of high-dimensional data and can provide effective representations for cross-domain classification tasks. Experimental results on generated and real-world datasets demonstrate the effectiveness of GeAE compared with the state-of-the-art methods.
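The abstract says only that the objective combines a cross-entropy loss, a causal structure learning loss, and a reconstruction loss; it does not give the formula. The NumPy sketch below shows one way three such terms could be summed. Everything specific in it is an assumption made for illustration and not the authors' formulation: the weights `lam_struct`, `lam_rec`, and `rho`, the linear structural fit `z @ adj` (the paper's module is described as nonlinear), and the polynomial acyclicity penalty tr((I + A∘A/d)^d) - d, a known differentiable DAG constraint that the abstract does not mention.

```python
import numpy as np


def acyclicity_penalty(adj):
    """Polynomial DAG penalty tr((I + A*A/d)^d) - d.

    It is zero exactly when the weighted adjacency matrix has no cycles.
    Assumption: the abstract does not say which acyclicity constraint is used.
    """
    d = adj.shape[0]
    m = np.eye(d) + (adj * adj) / d
    return np.trace(np.linalg.matrix_power(m, d)) - d


def cross_entropy(probs, labels):
    """Mean negative log-likelihood of integer labels under row-wise class probabilities."""
    eps = 1e-12  # guards against log(0)
    picked = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(picked + eps))


def combined_objective(x, x_hat, z, adj, probs, labels,
                       lam_struct=1.0, lam_rec=1.0, rho=1.0):
    """Hypothetical three-term objective: classification + structure + reconstruction.

    x, x_hat : inputs and their autoencoder reconstructions, shape (n, p)
    z        : latent representations, shape (n, d)
    adj      : weighted adjacency over latent variables, shape (d, d);
               adj[i, j] is the weight of the edge i -> j
    probs    : predicted class probabilities, shape (n, c)
    labels   : integer class labels, shape (n,)
    """
    # Reconstruction loss of the outer autoencoder.
    rec = np.mean((x - x_hat) ** 2)
    # Structure loss: each latent variable is fitted from its parents
    # (a linear stand-in for a nonlinear module), plus the acyclicity penalty.
    struct = np.mean((z - z @ adj) ** 2) + rho * acyclicity_penalty(adj)
    # Classification loss on the labelled samples.
    ce = cross_entropy(probs, labels)
    return ce + lam_struct * struct + lam_rec * rec
```

In a trainable version these terms would be written in an automatic-differentiation framework and minimized jointly over the encoder, decoder, classifier, and `adj`; the sketch only fixes how the three losses are added together.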


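Source journal

IEEE Transactions on Knowledge and Data Engineering (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 11.70

Self-citation rate: 3.40%

Articles published: 515

Review time: 6 months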

About the journal: The IEEE Transactions on Knowledge and Data Engineering covers knowledge and data engineering within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems; AI techniques for knowledge and data management, tools, and methodologies; distributed processing; real-time systems; architectures; data management practices; database design; query languages; security; fault tolerance; statistical databases; algorithms; performance evaluation; and applications.