ATJ-Net: Auto-Table-Join Network for Automatic Learning on Relational Databases

Jinze Bai, Jialin Wang, Zhao Li, Donghui Ding, Ji Zhang, Jun Gao
{"title":"ATJ-Net: Auto-Table-Join Network for Automatic Learning on Relational Databases","authors":"Jinze Bai, Jialin Wang, Zhao Li, Donghui Ding, Ji Zhang, Jun Gao","doi":"10.1145/3442381.3449980","DOIUrl":null,"url":null,"abstract":"A relational database, consisting of multiple tables, provides heterogeneous information across various entities, widely used in real-world services. This paper studies the supervised learning task on multiple tables, aiming to predict one label column with the help of multiple-tabular data. However, classical ML techniques mainly focus on single-tabular data. Multiple-tabular data refers to many-to-many mapping among joinable attributes and n-ary relations, which cannot be utilized directly by classical ML techniques. Besides, current graph techniques, like heterogeneous information network (HIN) and graph neural networks (GNN), are infeasible to be deployed directly and automatically in a multi-table environment, which limits the learning on databases. For automatic learning on relational databases, we propose an auto-table-join network (ATJ-Net). Multiple tables with relationships are considered as a hypergraph, where vertices are joinable attributes and hyperedges are tuples of tables. Then, ATJ-Net builds a graph neural network on the heterogeneous hypergraph, which samples and aggregates the vertices and hyperedges on n-hop sub-graphs as the receptive field. In order to enable ATJ-Net to be automatically deployed to different datasets and avoid the ”no free lunch” dilemma, we use random architecture search to select optimal aggregators and prune redundant paths in the network. For verifying the effectiveness of our methods across various tasks and schema, we conduct extensive experiments on 4 tasks, 8 various schemas, and 19 sub-datasets w.r.t. citing prediction, review classification, recommendation, and task-blind challenge. ATJ-Net achieves the best performance over state-of-the-art approaches on three tasks and is competitive with KddCup Winner solution on task-blind challenge.","PeriodicalId":106672,"journal":{"name":"Proceedings of the Web Conference 2021","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Web Conference 2021","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3442381.3449980","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

A relational database, consisting of multiple tables, provides heterogeneous information across various entities, widely used in real-world services. This paper studies the supervised learning task on multiple tables, aiming to predict one label column with the help of multiple-tabular data. However, classical ML techniques mainly focus on single-tabular data. Multiple-tabular data refers to many-to-many mapping among joinable attributes and n-ary relations, which cannot be utilized directly by classical ML techniques. Besides, current graph techniques, like heterogeneous information network (HIN) and graph neural networks (GNN), are infeasible to be deployed directly and automatically in a multi-table environment, which limits the learning on databases. For automatic learning on relational databases, we propose an auto-table-join network (ATJ-Net). Multiple tables with relationships are considered as a hypergraph, where vertices are joinable attributes and hyperedges are tuples of tables. Then, ATJ-Net builds a graph neural network on the heterogeneous hypergraph, which samples and aggregates the vertices and hyperedges on n-hop sub-graphs as the receptive field. In order to enable ATJ-Net to be automatically deployed to different datasets and avoid the ”no free lunch” dilemma, we use random architecture search to select optimal aggregators and prune redundant paths in the network. For verifying the effectiveness of our methods across various tasks and schema, we conduct extensive experiments on 4 tasks, 8 various schemas, and 19 sub-datasets w.r.t. citing prediction, review classification, recommendation, and task-blind challenge. ATJ-Net achieves the best performance over state-of-the-art approaches on three tasks and is competitive with KddCup Winner solution on task-blind challenge.
关系型数据库自动学习的自动表连接网络
关系数据库由多个表组成,提供跨各种实体的异构信息,广泛用于实际服务中。本文研究了多表的监督学习任务,旨在利用多表数据预测一个标签列。然而,经典的ML技术主要关注单表数据。多表数据是指可连接属性和n元关系之间的多对多映射,这是经典ML技术无法直接利用的。此外,目前的图技术,如异构信息网络(HIN)和图神经网络(GNN),都无法在多表环境下直接自动部署,这限制了对数据库的学习。对于关系型数据库的自动学习,我们提出了一个自动表连接网络(ATJ-Net)。具有关系的多个表被视为一个超图,其中顶点是可连接的属性,超边是表的元组。然后,ATJ-Net在异构超图上构建图神经网络,对n跳子图上的顶点和超边进行采样和聚合,作为接收场。为了使ATJ-Net能够自动部署到不同的数据集上,避免“没有免费的午餐”的困境,我们使用随机架构搜索来选择最优聚合器,并修剪网络中的冗余路径。为了验证我们的方法在不同任务和模式下的有效性,我们在4个任务、8种不同模式和19个子数据集上进行了广泛的实验,引用了预测、评论分类、推荐和任务盲挑战。ATJ-Net在三个任务上实现了最先进的性能,在任务盲挑战上与KddCup Winner解决方案竞争。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信