覆盖电子表格

Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics. Workshop on Human-In-the-Loop Data Analytics (2nd : 2017 : Chicago, Ill.) Pub Date : 2023-06-18 DOI:10.1145/3597465.3605220

Oliver Kennedy, Boris Glavic, Mike Brachmann

{"title":"覆盖电子表格","authors":"Oliver Kennedy, Boris Glavic, Mike Brachmann","doi":"10.1145/3597465.3605220","DOIUrl":null,"url":null,"abstract":"Efforts to scale spreadsheets either follow a 'virtual' strategy that layers a spreadsheet interface on top of an existing database engine or a 'materialized' strategy based on re-engineering a spreadsheet engine. Because databases are not optimized for spreadsheet access patterns, the materialized approach has better performance. However, the virtual approach offers several advantages that can not be easily replicated in the materialized approach, including the ability to re-apply user interactions to an updated input dataset. We propose the overlay update model, a hybrid approach that overlays user updates on an existing dataset (as in the virtual approach) and indexes user updates (as in the materialized approach). A key feature of our approach is storing updates generated by bulk operations (e.g., copy/paste) as compact \"patterns\" that can be leveraged to reduce execution costs. We implement an overlay spreadsheet over Apache Spark and demonstrate that, compared to DataSpread (a materialized spreadsheet), it can significantly reduce execution costs.","PeriodicalId":92279,"journal":{"name":"Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics. Workshop on Human-In-the-Loop Data Analytics (2nd : 2017 : Chicago, Ill.)","volume":"83 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Overlay Spreadsheets\",\"authors\":\"Oliver Kennedy, Boris Glavic, Mike Brachmann\",\"doi\":\"10.1145/3597465.3605220\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Efforts to scale spreadsheets either follow a 'virtual' strategy that layers a spreadsheet interface on top of an existing database engine or a 'materialized' strategy based on re-engineering a spreadsheet engine. Because databases are not optimized for spreadsheet access patterns, the materialized approach has better performance. However, the virtual approach offers several advantages that can not be easily replicated in the materialized approach, including the ability to re-apply user interactions to an updated input dataset. We propose the overlay update model, a hybrid approach that overlays user updates on an existing dataset (as in the virtual approach) and indexes user updates (as in the materialized approach). A key feature of our approach is storing updates generated by bulk operations (e.g., copy/paste) as compact \\\"patterns\\\" that can be leveraged to reduce execution costs. We implement an overlay spreadsheet over Apache Spark and demonstrate that, compared to DataSpread (a materialized spreadsheet), it can significantly reduce execution costs.\",\"PeriodicalId\":92279,\"journal\":{\"name\":\"Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics. Workshop on Human-In-the-Loop Data Analytics (2nd : 2017 : Chicago, Ill.)\",\"volume\":\"83 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-06-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics. Workshop on Human-In-the-Loop Data Analytics (2nd : 2017 : Chicago, Ill.)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3597465.3605220\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics. Workshop on Human-In-the-Loop Data Analytics (2nd : 2017 : Chicago, Ill.)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3597465.3605220","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

扩展电子表格的努力要么遵循一种“虚拟”策略，即在现有数据库引擎上分层电子表格界面，要么遵循一种基于重新设计电子表格引擎的“物化”策略。由于数据库没有针对电子表格访问模式进行优化，因此物化方法具有更好的性能。然而，虚拟方法提供了一些物化方法无法轻易复制的优势，包括将用户交互重新应用于更新的输入数据集的能力。我们提出了覆盖更新模型，这是一种混合方法，将用户更新覆盖在现有数据集上(如虚拟方法)并索引用户更新(如物化方法)。我们方法的一个关键特性是将批量操作(例如，复制/粘贴)生成的更新存储为紧凑的“模式”，可以利用它来降低执行成本。我们在Apache Spark上实现了一个覆盖电子表格，并证明，与DataSpread(物化电子表格)相比，它可以显著降低执行成本。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Overlay Spreadsheets

Efforts to scale spreadsheets either follow a 'virtual' strategy that layers a spreadsheet interface on top of an existing database engine or a 'materialized' strategy based on re-engineering a spreadsheet engine. Because databases are not optimized for spreadsheet access patterns, the materialized approach has better performance. However, the virtual approach offers several advantages that can not be easily replicated in the materialized approach, including the ability to re-apply user interactions to an updated input dataset. We propose the overlay update model, a hybrid approach that overlays user updates on an existing dataset (as in the virtual approach) and indexes user updates (as in the materialized approach). A key feature of our approach is storing updates generated by bulk operations (e.g., copy/paste) as compact "patterns" that can be leveraged to reduce execution costs. We implement an overlay spreadsheet over Apache Spark and demonstrate that, compared to DataSpread (a materialized spreadsheet), it can significantly reduce execution costs.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics. Workshop on Human-In-the-Loop Data Analytics (2nd : 2017 : Chicago, Ill.)

自引率

0.00%

发文量