Expandable Group Identification in Spreadsheets

Wensheng Dou, Shi Han, Liang Xu, D. Zhang, Jun Wei
{"title":"Expandable Group Identification in Spreadsheets","authors":"Wensheng Dou, Shi Han, Liang Xu, D. Zhang, Jun Wei","doi":"10.1145/3238147.3238222","DOIUrl":null,"url":null,"abstract":"Spreadsheets are widely used in various business tasks. Spreadsheet users may put similar data and computations by repeating a block of cells (a unit) in their spreadsheets. We name the unit and all its expanding ones as an expandable group. All units in an expandable group share the same or similar formats and semantics. As a data storage and management tool, expandable groups represent the fundamental structure in spreadsheets. However, existing spreadsheet systems do not recognize any expandable groups. Therefore, other spreadsheet analysis tools, e.g., data integration and fault detection, cannot utilize this structure of expandable groups to perform precise analysis. In this paper, we propose ExpCheck to automatically extract expandable groups in spreadsheets. We observe that continuous units that share the similar formats and semantics are likely to be an expandable group. Inspired by this, we inspect the format of each cell and its corresponding semantics, and further classify them into expandable groups according to their similarity. We evaluate ExpCheck on 120 spreadsheets randomly sampled from the EUSES and VEnron corpora. The experimental results show that ExpCheck is effective. ExpCheck successfully detect expandable groups with F1-measure of 73.1%, significantly outperforming the state-of-the-art techniques (F1-measure of 13.3%).","PeriodicalId":6622,"journal":{"name":"2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"38 1","pages":"498-508"},"PeriodicalIF":0.0000,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3238147.3238222","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 20

Abstract

Spreadsheets are widely used in various business tasks. Spreadsheet users may put similar data and computations by repeating a block of cells (a unit) in their spreadsheets. We name the unit and all its expanding ones as an expandable group. All units in an expandable group share the same or similar formats and semantics. As a data storage and management tool, expandable groups represent the fundamental structure in spreadsheets. However, existing spreadsheet systems do not recognize any expandable groups. Therefore, other spreadsheet analysis tools, e.g., data integration and fault detection, cannot utilize this structure of expandable groups to perform precise analysis. In this paper, we propose ExpCheck to automatically extract expandable groups in spreadsheets. We observe that continuous units that share the similar formats and semantics are likely to be an expandable group. Inspired by this, we inspect the format of each cell and its corresponding semantics, and further classify them into expandable groups according to their similarity. We evaluate ExpCheck on 120 spreadsheets randomly sampled from the EUSES and VEnron corpora. The experimental results show that ExpCheck is effective. ExpCheck successfully detect expandable groups with F1-measure of 73.1%, significantly outperforming the state-of-the-art techniques (F1-measure of 13.3%).
电子表格中可扩展的组标识
电子表格广泛用于各种业务任务。电子表格用户可以通过在他们的电子表格中重复一个单元格块(一个单位)来放置类似的数据和计算。我们将该单元及其所有扩展单元命名为可扩展组。可扩展组中的所有单元共享相同或相似的格式和语义。作为一种数据存储和管理工具,可扩展组代表了电子表格中的基本结构。但是,现有的电子表格系统不能识别任何可扩展组。因此,其他电子表格分析工具,例如数据集成和故障检测,不能利用这种可扩展组的结构来执行精确的分析。在本文中,我们提出了ExpCheck来自动提取电子表格中的可扩展组。我们观察到,具有相似格式和语义的连续单元很可能是一个可扩展的群。受此启发,我们考察了每个单元格的格式及其对应的语义,并根据它们的相似性将它们进一步分类为可扩展的组。我们在从EUSES和VEnron语料库中随机抽取的120个电子表格上评估ExpCheck。实验结果表明,ExpCheck是有效的。ExpCheck以73.1%的f1测量值成功检测出可扩展组,显著优于目前最先进的技术(f1测量值为13.3%)。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信