Bug Analysis in Jupyter Notebook Projects: An Empirical Study

Impact factor 6.6 | Zone 2, Computer Science | JCR Q1: Computer Science, Software Engineering
Taijara Loiola de Santana, Paulo Anselmo da Mota Silveira Neto, Eduardo Santana de Almeida, Iftekhar Ahmed
ACM Transactions on Software Engineering and Methodology. Published 2024-01-22. DOI: https://doi.org/10.1145/3641539
Citation count: 0

Abstract

Computational notebooks, such as Jupyter, have been widely adopted by data scientists to write code for analyzing and visualizing data. Despite their growing adoption and popularity, few studies have examined Jupyter development challenges from the practitioners’ point of view. This paper presents a systematic study of the bugs and challenges that Jupyter practitioners face, based on a large-scale empirical investigation. We mined 14,740 commits from 105 open-source GitHub projects containing Jupyter notebook code. Next, we analyzed 30,416 Stack Overflow posts, which gave us insight into the bugs that practitioners encounter when developing Jupyter notebook projects. We then conducted nineteen interviews with data scientists to uncover more details about Jupyter bugs and to understand the challenges that Jupyter developers face. Finally, to validate the study results and the proposed taxonomy, we conducted a survey with 91 data scientists. We also highlight bug categories, their root causes, and the challenges that Jupyter practitioners face.

Source journal
ACM Transactions on Software Engineering and Methodology (Engineering and Technology: Computer Science, Software Engineering)
CiteScore: 6.30
Self-citation rate: 4.50%
Articles published: 164
Review time: >12 weeks
Journal description: Designing and building a large, complex software system is a tremendous challenge. ACM Transactions on Software Engineering and Methodology (TOSEM) publishes papers on all aspects of that challenge: specification, design, development and maintenance. It covers tools and methodologies, languages, data structures, and algorithms. TOSEM also reports on successful efforts, noting practical lessons that can be scaled and transferred to other projects, and often looks at applications of innovative technologies. The tone is scholarly but readable; the content is worthy of study; the presentation is effective.