保留结果的科学数据分析工作流输入减少

Anh Duc Vu, Timo Kehrer, Christos Tsigkanos
{"title":"保留结果的科学数据分析工作流输入减少","authors":"Anh Duc Vu, Timo Kehrer, Christos Tsigkanos","doi":"10.1145/3551349.3559558","DOIUrl":null,"url":null,"abstract":"Analysis of data is the foundation of multiple scientific disciplines, manifesting in complex and diverse scientific data analysis workflows often involving exploratory analyses. Such analyses represent a particular case for traditional data engineering workflows, as results may be hard to interpret and judge whether they are correct or not, and where experimentation is a central theme. Oftentimes, there are certain aspects of a result which are suspicious and which should be further investigated to increase the trustworthiness of the workflow’s outcome. To this end, we advocate a semi-automated approach to reducing a workflow’s input data while preserving a specified outcome of interest, facilitating irregularity localization by narrowing down the search space for spotting corrupted input data or wrong assumptions made about it. We outline our vision on building engineering support for outcome-preserving input reduction within data analysis workflows, and report on preliminary results obtained from applying an early research prototype on a computational notebook taken from an online community of data scientists and machine learning practitioners.","PeriodicalId":197939,"journal":{"name":"Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Outcome-Preserving Input Reduction for Scientific Data Analysis Workflows\",\"authors\":\"Anh Duc Vu, Timo Kehrer, Christos Tsigkanos\",\"doi\":\"10.1145/3551349.3559558\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Analysis of data is the foundation of multiple scientific disciplines, manifesting in complex and diverse scientific data analysis workflows often involving exploratory analyses. Such analyses represent a particular case for traditional data engineering workflows, as results may be hard to interpret and judge whether they are correct or not, and where experimentation is a central theme. Oftentimes, there are certain aspects of a result which are suspicious and which should be further investigated to increase the trustworthiness of the workflow’s outcome. To this end, we advocate a semi-automated approach to reducing a workflow’s input data while preserving a specified outcome of interest, facilitating irregularity localization by narrowing down the search space for spotting corrupted input data or wrong assumptions made about it. We outline our vision on building engineering support for outcome-preserving input reduction within data analysis workflows, and report on preliminary results obtained from applying an early research prototype on a computational notebook taken from an online community of data scientists and machine learning practitioners.\",\"PeriodicalId\":197939,\"journal\":{\"name\":\"Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering\",\"volume\":\"17 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-10-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3551349.3559558\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3551349.3559558","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

摘要

数据分析是多个科学学科的基础,表现在复杂多样的科学数据分析工作流程中,往往涉及探索性分析。这样的分析代表了传统数据工程工作流的一个特殊情况,因为结果可能很难解释和判断它们是否正确,并且实验是一个中心主题。通常,结果的某些方面是可疑的,应该进一步调查,以增加工作流结果的可信度。为此,我们提倡一种半自动化的方法来减少工作流的输入数据,同时保留指定的感兴趣的结果,通过缩小搜索空间来发现损坏的输入数据或对其做出的错误假设,从而促进不规则的本地化。我们概述了我们在数据分析工作流程中为结果保留输入减少建立工程支持的愿景,并报告了从数据科学家和机器学习从业者的在线社区中获取的计算笔记本上应用早期研究原型所获得的初步结果。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Outcome-Preserving Input Reduction for Scientific Data Analysis Workflows
Analysis of data is the foundation of multiple scientific disciplines, manifesting in complex and diverse scientific data analysis workflows often involving exploratory analyses. Such analyses represent a particular case for traditional data engineering workflows, as results may be hard to interpret and judge whether they are correct or not, and where experimentation is a central theme. Oftentimes, there are certain aspects of a result which are suspicious and which should be further investigated to increase the trustworthiness of the workflow’s outcome. To this end, we advocate a semi-automated approach to reducing a workflow’s input data while preserving a specified outcome of interest, facilitating irregularity localization by narrowing down the search space for spotting corrupted input data or wrong assumptions made about it. We outline our vision on building engineering support for outcome-preserving input reduction within data analysis workflows, and report on preliminary results obtained from applying an early research prototype on a computational notebook taken from an online community of data scientists and machine learning practitioners.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信