FlashRelate: extracting relational data from semi-structured spreadsheets using examples

Daniel W. Barowy, Sumit Gulwani, Ted Hart, B. Zorn
{"title":"FlashRelate: extracting relational data from semi-structured spreadsheets using examples","authors":"Daniel W. Barowy, Sumit Gulwani, Ted Hart, B. Zorn","doi":"10.1145/2737924.2737952","DOIUrl":null,"url":null,"abstract":"With hundreds of millions of users, spreadsheets are one of the most important end-user applications. Spreadsheets are easy to use and allow users great flexibility in storing data. This flexibility comes at a price: users often treat spreadsheets as a poor man's database, leading to creative solutions for storing high-dimensional data. The trouble arises when users need to answer queries with their data. Data manipulation tools make strong assumptions about data layouts and cannot read these ad-hoc databases. Converting data into the appropriate layout requires programming skills or a major investment in manual reformatting. The effect is that a vast amount of real-world data is \"locked-in\" to a proliferation of one-off formats. We introduce FlashRelate, a synthesis engine that lets ordinary users extract structured relational data from spreadsheets without programming. Instead, users extract data by supplying examples of output relational tuples. FlashRelate uses these examples to synthesize a program in Flare. Flare is a novel extraction language that extends regular expressions with geometric constructs. An interactive user interface on top of FlashRelate lets end users extract data by point-and-click. We demonstrate that correct Flare programs can be synthesized in seconds from a small set of examples for 43 real-world scenarios. Finally, our case study demonstrates FlashRelate's usefulness addressing the widespread problem of data trapped in corporate and government formats.","PeriodicalId":104101,"journal":{"name":"Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"123","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2737924.2737952","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 123

Abstract

With hundreds of millions of users, spreadsheets are one of the most important end-user applications. Spreadsheets are easy to use and allow users great flexibility in storing data. This flexibility comes at a price: users often treat spreadsheets as a poor man's database, leading to creative solutions for storing high-dimensional data. The trouble arises when users need to answer queries with their data. Data manipulation tools make strong assumptions about data layouts and cannot read these ad-hoc databases. Converting data into the appropriate layout requires programming skills or a major investment in manual reformatting. The effect is that a vast amount of real-world data is "locked-in" to a proliferation of one-off formats. We introduce FlashRelate, a synthesis engine that lets ordinary users extract structured relational data from spreadsheets without programming. Instead, users extract data by supplying examples of output relational tuples. FlashRelate uses these examples to synthesize a program in Flare. Flare is a novel extraction language that extends regular expressions with geometric constructs. An interactive user interface on top of FlashRelate lets end users extract data by point-and-click. We demonstrate that correct Flare programs can be synthesized in seconds from a small set of examples for 43 real-world scenarios. Finally, our case study demonstrates FlashRelate's usefulness addressing the widespread problem of data trapped in corporate and government formats.
FlashRelate:使用示例从半结构化电子表格中提取关系数据
拥有数亿用户的电子表格是最重要的终端用户应用程序之一。电子表格易于使用,并允许用户在存储数据方面具有很大的灵活性。这种灵活性是有代价的:用户通常将电子表格视为穷人的数据库,从而导致存储高维数据的创造性解决方案。当用户需要用他们的数据回答查询时,问题就出现了。数据操作工具对数据布局有很强的假设,不能读取这些临时数据库。将数据转换为适当的布局需要编程技能,或者在手动重新格式化方面进行大量投资。其结果是,大量真实世界的数据被“锁定”为大量的一次性格式。我们介绍FlashRelate,这是一个合成引擎,普通用户无需编程即可从电子表格中提取结构化关系数据。相反,用户通过提供输出关系元组的示例来提取数据。FlashRelate使用这些例子在Flare中合成一个程序。Flare是一种新颖的提取语言,它用几何结构扩展正则表达式。FlashRelate之上的交互式用户界面允许最终用户通过点击来提取数据。我们证明了正确的耀斑程序可以从43个真实场景的一小组示例中在几秒钟内合成。最后,我们的案例研究展示了FlashRelate在解决企业和政府格式中数据被困的普遍问题方面的有用性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信