ATJ-Net: Auto-Table-Join Network for Automatic Learning on Relational Databases

Proceedings of the Web Conference 2021. Pub Date: 2021-04-19. DOI: 10.1145/3442381.3449980

Jinze Bai, Jialin Wang, Zhao Li, Donghui Ding, Ji Zhang, Jun Gao


Citations: 4

Abstract

A relational database consists of multiple tables and provides heterogeneous information across various entities; such databases are widely used in real-world services. This paper studies supervised learning on multiple tables, aiming to predict one label column with the help of multiple-tabular data. Classical ML techniques, however, mainly focus on single-tabular data. Multiple-tabular data involves many-to-many mappings among joinable attributes and n-ary relations, which classical ML techniques cannot use directly. In addition, current graph techniques, such as heterogeneous information networks (HIN) and graph neural networks (GNN), cannot be deployed directly and automatically in a multi-table environment, which limits learning on databases. For automatic learning on relational databases, we propose an auto-table-join network (ATJ-Net). Multiple tables with relationships are treated as a hypergraph, where vertices are joinable attributes and hyperedges are tuples of tables. ATJ-Net then builds a graph neural network on the heterogeneous hypergraph, which samples and aggregates the vertices and hyperedges on n-hop sub-graphs as the receptive field. To allow ATJ-Net to be deployed automatically on different datasets and to avoid the "no free lunch" dilemma, we use random architecture search to select optimal aggregators and prune redundant paths in the network. To verify the effectiveness of our method across various tasks and schemas, we conduct extensive experiments on 4 tasks, 8 schemas, and 19 sub-datasets, covering citing prediction, review classification, recommendation, and a task-blind challenge. ATJ-Net achieves the best performance over state-of-the-art approaches on three tasks and is competitive with the KddCup Winner solution on the task-blind challenge.
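
The hypergraph view described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the toy tables, the `JOINABLE` column-to-domain mapping, and the helper names `build_hypergraph` and `n_hop_field` are all hypothetical, chosen only to show joinable attribute values as vertices, table tuples as hyperedges, and an n-hop receptive field around a start vertex. For brevity it expands every neighbor rather than sampling, and it omits the learned aggregators and the architecture search.

```python
from collections import defaultdict

# Hypothetical toy database: each table is a list of rows (tuples as dicts).
TABLES = {
    "papers": [
        {"paper_id": "p1", "author_id": "a1", "year": 2019},
        {"paper_id": "p2", "author_id": "a1", "year": 2020},
        {"paper_id": "p3", "author_id": "a2", "year": 2020},
    ],
    "cites": [
        {"src": "p2", "dst": "p1"},
        {"src": "p3", "dst": "p2"},
    ],
}

# Assumption for this example: joinable columns are declared by hand and
# mapped to a shared key domain, so "src", "dst" and "paper_id" all refer
# to the same kind of entity.
JOINABLE = {
    "papers": {"paper_id": "paper", "author_id": "author"},
    "cites": {"src": "paper", "dst": "paper"},
}


def build_hypergraph(tables, joinable):
    """Build a hypergraph from tables.

    A vertex is a (domain, value) pair for a joinable attribute value.
    A hyperedge is one tuple (row), identified by (table_name, row_index),
    and connects all joinable values appearing in that row.
    """
    hyperedges = {}
    incidence = defaultdict(set)  # vertex -> ids of hyperedges touching it
    for name, rows in tables.items():
        for i, row in enumerate(rows):
            edge_id = (name, i)
            verts = {(dom, row[col]) for col, dom in joinable[name].items()}
            hyperedges[edge_id] = verts
            for v in verts:
                incidence[v].add(edge_id)
    return hyperedges, dict(incidence)


def n_hop_field(start, hyperedges, incidence, n_hops):
    """Collect the vertices and hyperedges within n_hops of `start`.

    One hop goes vertex -> incident hyperedge -> the other vertices of
    that hyperedge. All neighbors are expanded here; a sampling step
    would instead pick a bounded subset of hyperedges per vertex.
    """
    seen_v, seen_e = {start}, set()
    frontier = {start}
    for _ in range(n_hops):
        next_frontier = set()
        for v in frontier:
            for e in incidence.get(v, ()):
                if e in seen_e:
                    continue
                seen_e.add(e)
                next_frontier |= hyperedges[e] - seen_v
        seen_v |= next_frontier
        frontier = next_frontier
    return seen_v, seen_e


hyperedges, incidence = build_hypergraph(TABLES, JOINABLE)
field_vertices, field_edges = n_hop_field(("paper", "p1"), hyperedges, incidence, 2)
```

In this toy setup, the 2-hop field around paper `p1` reaches `p2` and author `a1` in the first hop and `p3` in the second, without materializing any explicit join between `papers` and `cites`. A model in the spirit of the abstract would then aggregate features of the collected rows and vertices, which this sketch leaves out.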
