Error Identification Strategies for Python Jupyter Notebooks

2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC) | Pub Date: 2022-03-30 | DOI: 10.1145/3524610.3529156

Derek Robinson, Neil A. Ernst, Enrique Larios Vargas, M. Storey


Citations: 2

Abstract

Computational notebooks, such as Jupyter or Colab, combine text and data analysis code. They have become ubiquitous in the world of data science and exploratory data analysis. Since these notebooks present a different programming paradigm than conventional IDE-driven programming, it is plausible that debugging in computational notebooks might also be different. More specifically, since creating notebooks blends domain knowledge, statistical analysis, and programming, the ways in which notebook users find and fix errors in these different forms might be different. In this paper, we present an exploratory, observational study on how Python Jupyter notebook users find and understand potential errors in notebooks. Through a conceptual replication of a study design investigating the error identification strategies of R notebook users, we presented users with Python Jupyter notebooks pre-populated with common notebook errors: errors rooted in the statistical data analysis, in the knowledge of domain concepts, or in the programming. We then analyzed the strategies our study participants used to find these errors and determined how successful each strategy was at identifying errors. Our findings indicate that while the notebook programming environment is different from the environments used for traditional programming, debugging strategies remain quite similar. It is our hope that the insights presented in this paper will help both notebook tool designers and educators make changes that let data scientists discover errors more easily in the notebooks they write.

