AI support for data scientists: An empirical study on workflow and alternative code recommendations.

IF 3.6 · Zone 2, Computer Science · JCR Q1: COMPUTER SCIENCE, SOFTWARE ENGINEERING

Empirical Software Engineering Pub Date : 2025-01-01 Epub Date: 2025-07-04 DOI:10.1007/s10664-025-10622-4

Dhivyabharathi Ramasamy, Cristina Sarasua, Abraham Bernstein

Empirical Software Engineering 30(5), 133. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12227384/pdf/


Abstract

Despite the popularity of AI assistants for coding activities, there is limited empirical work on whether these coding assistants can help users complete data science tasks. Moreover, in data science programming, exploring alternative paths has been widely advocated, as such paths may lead to diverse understandings and conclusions (Gelman and Loken 2013; Kale et al. 2019). Whether existing AI-based coding assistants can support data scientists in exploring the relevant alternative paths remains unexplored. To fill this gap, we conducted a mixed-methods study to understand how data scientists solved different data science tasks with the help of an AI-based coding assistant that provides explicit alternatives as recommendations throughout the data science workflow. Specifically, we quantitatively investigated whether users accept the code recommendations made by the AI assistant, including alternative recommendations, and whether the recommendations are helpful when completing descriptive and predictive data science tasks. We also investigated whether stating in the prompt the data science step for which users seek recommendations (e.g., data exploration) leads to helpful recommendations. We found that including the data science step in a prompt led to a statistically significant improvement in the acceptance of recommendations, whereas the presence of alternatives did not lead to any significant differences. Our study also shows a statistically significant difference in the acceptance and usefulness of recommendations between descriptive and predictive tasks. Participants generally had positive sentiments regarding AI assistance and our proposed interface. We share further insights on the interactions that emerged during the study and the challenges that our users encountered while solving their data science tasks.

Supplementary information: The online version contains supplementary material available at 10.1007/s10664-025-10622-4.




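Source journal

Empirical Software Engineering (Engineering and Technology, Computer Science: Software Engineering)

CiteScore: 8.50

Self-citation rate: 12.20%

Articles published: 169

Review time: >12 weeks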

Journal description: Empirical Software Engineering provides a forum for applied software engineering research with a strong empirical component, and a venue for publishing empirical results relevant to both researchers and practitioners. Empirical studies presented here usually involve the collection and analysis of data and experience that can be used to characterize, evaluate and reveal relationships between software development deliverables, practices, and technologies. Over time, it is expected that such empirical results will form a body of knowledge leading to widely accepted and well-formed theories. The journal also offers industrial experience reports detailing the application of software technologies - processes, methods, or tools - and their effectiveness in industrial settings. Empirical Software Engineering promotes the publication of industry-relevant research, to address the significant gap between research and practice.