Auto-Tables: Synthesizing Multi-Step Transformations to Relationalize Tables without Using Examples

Proc. VLDB Endow., pp. 3391-3403. Pub Date: 2023-07-01. DOI: 10.48550/arXiv.2307.14565

Peng Li, Yeye He, Cong Yan, Yue Wang, Surajit Chaudhuri



Abstract

Relational tables, where each row corresponds to an entity and each column corresponds to an attribute, have been the standard for tables in relational databases. However, such a standard cannot be taken for granted when dealing with tables "in the wild". Our survey of real spreadsheet-tables and web-tables shows that over 30% of such tables do not conform to the relational standard, for which complex table-restructuring transformations are needed before these tables can be queried easily using SQL-based tools. Unfortunately, the required transformations are non-trivial to program, which has become a substantial pain point for technical and non-technical users alike, as evidenced by large numbers of forum questions in places like StackOverflow and Excel/Tableau forums.

We develop an Auto-Tables system that can automatically synthesize pipelines with multi-step transformations (in Python or other languages), to transform non-relational tables into standard relational forms for downstream analytics, obviating the need for users to manually program transformations. We compile an extensive benchmark for this new task, by collecting 244 real test cases from user spreadsheets and online forums. Our evaluation suggests that Auto-Tables can successfully synthesize transformations for over 70% of test cases at interactive speeds, without requiring any input from users, making this an effective tool for both technical and non-technical users to prepare data for analytics.
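To make "multi-step transformation" concrete, the following is a minimal hand-written sketch of the kind of pipeline a user would otherwise have to program. It is not Auto-Tables' synthesized output or its API. The table contents, the function names `forward_fill` and `unpivot`, and the choice of these two steps are all assumptions made for illustration. The input mimics a spreadsheet with a blank (merged) region cell and one column per year; the output has one row per (region, product, year) observation.

```python
# Hypothetical input: a spreadsheet-style table that is not relational.
# The "region" cell is blank where the spreadsheet had a merged cell,
# and each year is its own column rather than a value in a "year" column.
header = ["region", "product", "2021", "2022"]
rows = [
    ["East", "A", 10, 12],
    ["", "B", 7, 9],
    ["West", "A", 4, 6],
]


def forward_fill(rows, col):
    """Step 1: copy the last non-empty value of column `col` into blank cells."""
    filled, last = [], None
    for row in rows:
        row = list(row)  # copy so the caller's data is not modified
        if row[col] in ("", None):
            row[col] = last
        else:
            last = row[col]
        filled.append(row)
    return filled


def unpivot(header, rows, id_cols, var_name, value_name):
    """Step 2: turn every non-id column into its own (variable, value) row."""
    id_idx = [header.index(c) for c in id_cols]
    value_idx = [i for i in range(len(header)) if i not in id_idx]
    long_header = list(id_cols) + [var_name, value_name]
    long_rows = []
    for row in rows:
        ids = [row[i] for i in id_idx]
        for i in value_idx:
            long_rows.append(ids + [header[i], row[i]])
    return long_header, long_rows


# The two-step pipeline: fill the blank key cells, then unpivot the year columns.
relational_header, relational_rows = unpivot(
    header, forward_fill(rows, 0), ["region", "product"], "year", "sales"
)

for row in [relational_header] + relational_rows:
    print(row)
```

The result can be loaded into a database and queried with ordinary SQL (for example, grouping by `year`), which the original wide layout does not allow without first rewriting it. Choosing the steps and their parameters by hand, as done here, is the manual work the abstract says Auto-Tables aims to remove.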
