Jinlong Tian , Shixuan Liu , Ruochun Jin , Mengmeng Li , Yanfang Zhou , Xinhai Xu , Yuhua Tang
Information Processing & Management, Volume 63, Issue 1, Article 104298. DOI: 10.1016/j.ipm.2025.104298. Published 2025-08-01.
Efficient Table Embeddings via Self-Supervised Structural-Semantic Graph Autoencoder
Representing tabular data effectively is difficult due to its structural complexity and semantic nuances. Existing models either inadequately capture these features or suffer from computational inefficiency. This paper presents TEA (Table Embedding Autoencoder), a self-supervised learning framework for tabular data embedding. TEA uses a Contextual Tabular Graph representation that incorporates crucial table relationships, together with a specialized Table Graph Autoencoder (TGAE) that performs multi-facet reconstruction of features, edges, and node degrees. This design enables efficient learning of embeddings that capture both structure and semantics. On eight benchmark datasets, TEA surpasses state-of-the-art tabular models and LLMs, achieving an average F-measure improvement of 14 on entity resolution. Crucially, TEA is four times more computationally efficient than the state-of-the-art model, facilitating large-scale data processing.
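The abstract does not give implementation details, but the idea of a multi-facet reconstruction objective (feature/edge/degree) for a graph autoencoder can be illustrated in a few lines. The sketch below is an assumption-laden toy, not the paper's actual formulation: the function name, the linear feature decoder, the inner-product edge decoder, and the loss weights are all illustrative choices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def multi_facet_loss(Z, W_feat, X, A, w=(1.0, 1.0, 1.0)):
    """Toy multi-facet reconstruction loss for node embeddings Z.

    Z: (n, d) node embeddings; X: (n, f) node features;
    A: (n, n) binary adjacency; W_feat: (d, f) linear feature decoder.
    Returns a weighted sum of feature, edge, and degree reconstruction errors.
    (Illustrative only -- not the TGAE objective from the paper.)
    """
    # Facet 1: reconstruct node features with a linear decoder (MSE).
    feat_err = np.mean((Z @ W_feat - X) ** 2)
    # Facet 2: reconstruct edges with an inner-product decoder
    # (binary cross-entropy against the adjacency matrix).
    A_hat = sigmoid(Z @ Z.T)
    edge_err = -np.mean(A * np.log(A_hat + 1e-9)
                        + (1 - A) * np.log(1 - A_hat + 1e-9))
    # Facet 3: reconstruct node degrees from the predicted adjacency (MSE).
    deg_err = np.mean((A_hat.sum(axis=1) - A.sum(axis=1)) ** 2)
    return w[0] * feat_err + w[1] * edge_err + w[2] * deg_err
```

In practice each facet would be weighted and minimized jointly by gradient descent over the encoder producing `Z`; the sketch only shows how the three reconstruction targets combine into one objective.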
About the journal:
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology, marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.