LLMs for science: Usage for code generation and data analysis

IF 1.8 · Zone 4, Computer Science · Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Journal of Software-Evolution and Process Pub Date : 2024-09-12 DOI:10.1002/smr.2723

Mohamed Nejjar, Luca Zacharias, Fabian Stiehle, Ingo Weber



Abstract

Large language models (LLMs) have been touted as a way to increase productivity in many areas of today's working life. Scientific research is no exception: the potential of LLM-based tools to assist scientists in their daily work has become a widely discussed topic across disciplines. However, the study of this subject is only at its very onset, and it is still unclear how the potential of LLMs will materialize in research practice. With this study, we give first empirical evidence on the use of LLMs in the research process. We investigated a set of use cases for LLM-based tools in scientific research and conducted a first study to assess the degree to which current tools are helpful. In this position paper, we report on the use cases related to software engineering, specifically generating application code and developing scripts for data analytics and visualization. Although the use cases we studied are seemingly simple, results differ significantly across tools. Our results highlight the promise of LLM-based tools in general, yet we also observe various issues, particularly regarding the integrity of the output these tools provide.

