保留结果的科学数据分析工作流输入减少

Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering Pub Date : 2022-10-10 DOI:10.1145/3551349.3559558

Anh Duc Vu, Timo Kehrer, Christos Tsigkanos

{"title":"保留结果的科学数据分析工作流输入减少","authors":"Anh Duc Vu, Timo Kehrer, Christos Tsigkanos","doi":"10.1145/3551349.3559558","DOIUrl":null,"url":null,"abstract":"Analysis of data is the foundation of multiple scientific disciplines, manifesting in complex and diverse scientific data analysis workflows often involving exploratory analyses. Such analyses represent a particular case for traditional data engineering workflows, as results may be hard to interpret and judge whether they are correct or not, and where experimentation is a central theme. Oftentimes, there are certain aspects of a result which are suspicious and which should be further investigated to increase the trustworthiness of the workflow’s outcome. To this end, we advocate a semi-automated approach to reducing a workflow’s input data while preserving a specified outcome of interest, facilitating irregularity localization by narrowing down the search space for spotting corrupted input data or wrong assumptions made about it. We outline our vision on building engineering support for outcome-preserving input reduction within data analysis workflows, and report on preliminary results obtained from applying an early research prototype on a computational notebook taken from an online community of data scientists and machine learning practitioners.","PeriodicalId":197939,"journal":{"name":"Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Outcome-Preserving Input Reduction for Scientific Data Analysis Workflows\",\"authors\":\"Anh Duc Vu, Timo Kehrer, Christos Tsigkanos\",\"doi\":\"10.1145/3551349.3559558\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Analysis of data is the foundation of multiple scientific disciplines, manifesting in complex and diverse scientific data analysis workflows often involving exploratory analyses. Such analyses represent a particular case for traditional data engineering workflows, as results may be hard to interpret and judge whether they are correct or not, and where experimentation is a central theme. Oftentimes, there are certain aspects of a result which are suspicious and which should be further investigated to increase the trustworthiness of the workflow’s outcome. To this end, we advocate a semi-automated approach to reducing a workflow’s input data while preserving a specified outcome of interest, facilitating irregularity localization by narrowing down the search space for spotting corrupted input data or wrong assumptions made about it. We outline our vision on building engineering support for outcome-preserving input reduction within data analysis workflows, and report on preliminary results obtained from applying an early research prototype on a computational notebook taken from an online community of data scientists and machine learning practitioners.\",\"PeriodicalId\":197939,\"journal\":{\"name\":\"Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering\",\"volume\":\"17 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-10-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3551349.3559558\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3551349.3559558","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

摘要

数据分析是多个科学学科的基础，表现在复杂多样的科学数据分析工作流程中，往往涉及探索性分析。这样的分析代表了传统数据工程工作流的一个特殊情况，因为结果可能很难解释和判断它们是否正确，并且实验是一个中心主题。通常，结果的某些方面是可疑的，应该进一步调查，以增加工作流结果的可信度。为此，我们提倡一种半自动化的方法来减少工作流的输入数据，同时保留指定的感兴趣的结果，通过缩小搜索空间来发现损坏的输入数据或对其做出的错误假设，从而促进不规则的本地化。我们概述了我们在数据分析工作流程中为结果保留输入减少建立工程支持的愿景，并报告了从数据科学家和机器学习从业者的在线社区中获取的计算笔记本上应用早期研究原型所获得的初步结果。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Outcome-Preserving Input Reduction for Scientific Data Analysis Workflows

Analysis of data is the foundation of multiple scientific disciplines, manifesting in complex and diverse scientific data analysis workflows often involving exploratory analyses. Such analyses represent a particular case for traditional data engineering workflows, as results may be hard to interpret and judge whether they are correct or not, and where experimentation is a central theme. Oftentimes, there are certain aspects of a result which are suspicious and which should be further investigated to increase the trustworthiness of the workflow’s outcome. To this end, we advocate a semi-automated approach to reducing a workflow’s input data while preserving a specified outcome of interest, facilitating irregularity localization by narrowing down the search space for spotting corrupted input data or wrong assumptions made about it. We outline our vision on building engineering support for outcome-preserving input reduction within data analysis workflows, and report on preliminary results obtained from applying an early research prototype on a computational notebook taken from an online community of data scientists and machine learning practitioners.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering

自引率

0.00%

发文量