Automatically Identifying the Quality of Developer Chats for Post Hoc Use

Preetha Chatterjee, Kostadin Damevski, Nicholas A. Kraft, L. Pollock
{"title":"Automatically Identifying the Quality of Developer Chats for Post Hoc Use","authors":"Preetha Chatterjee, Kostadin Damevski, Nicholas A. Kraft, L. Pollock","doi":"10.1145/3450503","DOIUrl":null,"url":null,"abstract":"Software engineers are crowdsourcing answers to their everyday challenges on Q&A forums (e.g., Stack Overflow) and more recently in public chat communities such as Slack, IRC, and Gitter. Many software-related chat conversations contain valuable expert knowledge that is useful for both mining to improve programming support tools and for readers who did not participate in the original chat conversations. However, most chat platforms and communities do not contain built-in quality indicators (e.g., accepted answers, vote counts). Therefore, it is difficult to identify conversations that contain useful information for mining or reading, i.e., conversations of post hoc quality. In this article, we investigate automatically detecting developer conversations of post hoc quality from public chat channels. We first describe an analysis of 400 developer conversations that indicate potential characteristics of post hoc quality, followed by a machine learning-based approach for automatically identifying conversations of post hoc quality. Our evaluation of 2,000 annotated Slack conversations in four programming communities (python, clojure, elm, and racket) indicates that our approach can achieve precision of 0.82, recall of 0.90, F-measure of 0.86, and MCC of 0.57. To our knowledge, this is the first automated technique for detecting developer conversations of post hoc quality.","PeriodicalId":7398,"journal":{"name":"ACM Transactions on Software Engineering and Methodology (TOSEM)","volume":"58 1","pages":"1 - 28"},"PeriodicalIF":0.0000,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Software Engineering and Methodology (TOSEM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3450503","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

Software engineers are crowdsourcing answers to their everyday challenges on Q&A forums (e.g., Stack Overflow) and more recently in public chat communities such as Slack, IRC, and Gitter. Many software-related chat conversations contain valuable expert knowledge that is useful for both mining to improve programming support tools and for readers who did not participate in the original chat conversations. However, most chat platforms and communities do not contain built-in quality indicators (e.g., accepted answers, vote counts). Therefore, it is difficult to identify conversations that contain useful information for mining or reading, i.e., conversations of post hoc quality. In this article, we investigate automatically detecting developer conversations of post hoc quality from public chat channels. We first describe an analysis of 400 developer conversations that indicate potential characteristics of post hoc quality, followed by a machine learning-based approach for automatically identifying conversations of post hoc quality. Our evaluation of 2,000 annotated Slack conversations in four programming communities (python, clojure, elm, and racket) indicates that our approach can achieve precision of 0.82, recall of 0.90, F-measure of 0.86, and MCC of 0.57. To our knowledge, this is the first automated technique for detecting developer conversations of post hoc quality.
自动识别开发人员聊天的质量用于事后使用
软件工程师在问答论坛(如Stack Overflow)和最近的公共聊天社区(如Slack、IRC和Gitter)上众包他们日常挑战的答案。许多与软件相关的聊天对话包含有价值的专家知识,这些知识对于挖掘改进编程支持工具和没有参与原始聊天对话的读者都很有用。然而,大多数聊天平台和社区不包含内置的质量指标(例如,接受的答案,投票计数)。因此,很难识别包含用于挖掘或阅读的有用信息的对话,即具有事后质量的对话。在本文中,我们研究了自动检测公共聊天频道中即时质量的开发人员对话。我们首先描述了对400个开发人员对话的分析,这些对话表明了事后质量的潜在特征,然后是基于机器学习的方法,用于自动识别事后质量的对话。我们对四个编程社区(python、clojure、elm和racket)中的2000个带注释的Slack会话进行了评估,结果表明,我们的方法可以达到0.82的精度、0.90的召回率、0.86的F-measure和0.57的MCC。据我们所知,这是第一个检测开发人员事后质量对话的自动化技术。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信