{"title":"Pair programming conversations with agents vs. developers: challenges and opportunities for SE community","authors":"Peter Robe, S. Kuttal, J. AuBuchon, Jacob C. Hart","doi":"10.1145/3540250.3549127","DOIUrl":null,"url":null,"abstract":"Recent research has shown feasibility of an interactive pair-programming conversational agent, but implementing such an agent poses three challenges: a lack of benchmark datasets, absence of software engineering specific labels, and the need to understand developer conversations. To address these challenges, we conducted a Wizard of Oz study with 14 participants pair programming with a simulated agent and collected 4,443 developer-agent utterances. Based on this dataset, we created 26 software engineering labels using an open coding process to develop a hierarchical classification scheme. To understand labeled developer-agent conversations, we compared the accuracy of three state-of-the-art transformer-based language models, BERT, GPT-2, and XLNet, which performed interchangeably. In order to begin creating a developer-agent dataset, researchers and practitioners need to conduct resource intensive Wizard of Oz studies. Presently, there exists vast amounts of developer-developer conversations on video hosting websites. To investigate the feasibility of using developer-developer conversations, we labeled a publicly available developer-developer dataset (3,436 utterances) with our hierarchical classification scheme and found that a BERT model trained on developer-developer data performed ~10% worse than the BERT trained on developer-agent data, but when using transfer-learning, accuracy improved. Finally, our qualitative analysis revealed that developer-developer conversations are more implicit, neutral, and opinionated than developer-agent conversations. Our results have implications for software engineering researchers and practitioners developing conversational agents.","PeriodicalId":68155,"journal":{"name":"软件产业与工程","volume":"5 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"软件产业与工程","FirstCategoryId":"1089","ListUrlMain":"https://doi.org/10.1145/3540250.3549127","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 5
Abstract
Recent research has shown the feasibility of an interactive pair-programming conversational agent, but implementing such an agent poses three challenges: a lack of benchmark datasets, the absence of software-engineering-specific labels, and the need to understand developer conversations. To address these challenges, we conducted a Wizard of Oz study in which 14 participants pair programmed with a simulated agent, collecting 4,443 developer-agent utterances. Based on this dataset, we created 26 software engineering labels through an open coding process and developed a hierarchical classification scheme. To understand the labeled developer-agent conversations, we compared the accuracy of three state-of-the-art transformer-based language models, BERT, GPT-2, and XLNet, which performed comparably. To begin creating a developer-agent dataset, researchers and practitioners need to conduct resource-intensive Wizard of Oz studies; meanwhile, vast amounts of developer-developer conversations already exist on video-hosting websites. To investigate the feasibility of using developer-developer conversations instead, we labeled a publicly available developer-developer dataset (3,436 utterances) with our hierarchical classification scheme and found that a BERT model trained on developer-developer data performed ~10% worse than one trained on developer-agent data, although accuracy improved with transfer learning. Finally, our qualitative analysis revealed that developer-developer conversations are more implicit, neutral, and opinionated than developer-agent conversations. Our results have implications for software engineering researchers and practitioners developing conversational agents.
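The modeling step the abstract describes, fine-tuning a pretrained transformer to classify each utterance into one of the 26 software engineering labels, can be sketched as below. This is not the authors' code: the Hugging Face-style workflow, the example utterances, the placeholder label ids, and the hyperparameters are all illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (assumed workflow, not the paper's implementation) of
# fine-tuning BERT for utterance classification with the 26-label scheme.
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import BertTokenizerFast, BertForSequenceClassification

class UtteranceDataset(Dataset):
    """Pairs each utterance with a label id from the hierarchical scheme."""
    def __init__(self, utterances, label_ids, tokenizer, max_len=64):
        self.enc = tokenizer(utterances, truncation=True,
                             padding="max_length", max_length=max_len,
                             return_tensors="pt")
        self.labels = torch.tensor(label_ids)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.enc.items()}
        item["labels"] = self.labels[i]
        return item

NUM_LABELS = 26  # the paper's 26 software engineering labels

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=NUM_LABELS)

# Hypothetical developer-agent training examples; label ids are placeholders.
train_ds = UtteranceDataset(
    ["Can you rename this variable?", "Run the test suite again."],
    [3, 7],
    tokenizer)
loader = DataLoader(train_ds, batch_size=2, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(3):
    for batch in loader:
        optimizer.zero_grad()
        loss = model(**batch).loss  # cross-entropy over the 26 labels
        loss.backward()
        optimizer.step()
```

For the transfer-learning result mentioned above, one plausible reading is that the model keeps the weights fine-tuned on developer-agent utterances and continues the same training loop on a DataLoader built from the developer-developer dataset, rather than starting again from bert-base-uncased; the abstract does not spell out the exact procedure.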