Analyzing and Detecting Information Types of Developer Live Chat Threads

IF 6.6 2区 计算机科学 Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING
Xiuwei Shang, Shuai Zhang, Yitong Zhang, Shikai Guo, Yulong Li, Rong Chen, Hui Li, Xiaochen Li, He Jiang
{"title":"Analyzing and Detecting Information Types of Developer Live Chat Threads","authors":"Xiuwei Shang, Shuai Zhang, Yitong Zhang, Shikai Guo, Yulong Li, Rong Chen, Hui Li, Xiaochen Li, He Jiang","doi":"10.1145/3643677","DOIUrl":null,"url":null,"abstract":"<p>Online chatrooms serve as vital platforms for information exchange among software developers. With multiple developers engaged in rapid communication and diverse conversation topics, the resulting chat messages often manifest complexity and lack structure. To enhance the efficiency of extracting information from chat <i>threads</i>, automatic mining techniques are introduced for thread classification. However, previous approaches still grapple with unsatisfactory classification accuracy, due to two primary challenges that they struggle to adequately capture long-distance dependencies within chat threads and address the issue of category imbalance in labeled datasets. To surmount these challenges, we present a topic classification approach for chat information types named EAEChat. Specifically, EAEChat comprises three core components: the text feature encoding component captures contextual text features using a multi-head self-attention mechanism-based text feature encoder, and a siamese network is employed to mitigate overfitting caused by limited data; the data augmentation component expands a small number of categories in the training dataset using a technique tailored to developer chat messages, effectively tackling the challenge of imbalanced category distribution; the non-text feature encoding component employs a feature fusion model to integrate deep text features with manually extracted non-text features. Evaluation across three real-world projects demonstrates that EAEChat respectively achieves an average precision, recall, and F1-score of 0.653, 0.651, and 0.644, and it marks a significant 7.60% improvement over the state-of-the-art approachs. These findings confirm the effectiveness of our method in proficiently classifying developer chat messages in online chatrooms.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"27 1","pages":""},"PeriodicalIF":6.6000,"publicationDate":"2024-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Software Engineering and Methodology","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3643677","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
引用次数: 0

Abstract

Online chatrooms serve as vital platforms for information exchange among software developers. With multiple developers engaged in rapid communication and diverse conversation topics, the resulting chat messages often manifest complexity and lack structure. To enhance the efficiency of extracting information from chat threads, automatic mining techniques are introduced for thread classification. However, previous approaches still grapple with unsatisfactory classification accuracy, due to two primary challenges that they struggle to adequately capture long-distance dependencies within chat threads and address the issue of category imbalance in labeled datasets. To surmount these challenges, we present a topic classification approach for chat information types named EAEChat. Specifically, EAEChat comprises three core components: the text feature encoding component captures contextual text features using a multi-head self-attention mechanism-based text feature encoder, and a siamese network is employed to mitigate overfitting caused by limited data; the data augmentation component expands a small number of categories in the training dataset using a technique tailored to developer chat messages, effectively tackling the challenge of imbalanced category distribution; the non-text feature encoding component employs a feature fusion model to integrate deep text features with manually extracted non-text features. Evaluation across three real-world projects demonstrates that EAEChat respectively achieves an average precision, recall, and F1-score of 0.653, 0.651, and 0.644, and it marks a significant 7.60% improvement over the state-of-the-art approachs. These findings confirm the effectiveness of our method in proficiently classifying developer chat messages in online chatrooms.

分析和检测开发人员即时聊天主题的信息类型
在线聊天室是软件开发人员进行信息交流的重要平台。由于多个开发人员进行快速交流,聊天主题也多种多样,因此产生的聊天信息往往表现出复杂性和缺乏结构性。为了提高从聊天线程中提取信息的效率,人们引入了线程分类自动挖掘技术。然而,以往的方法仍然无法达到令人满意的分类精度,这主要是由于它们难以充分捕捉聊天线程中的长距离依赖关系,以及无法解决标签数据集中类别不平衡的问题。为了克服这些挑战,我们提出了一种名为 EAEChat 的聊天信息类型主题分类方法。具体来说,EAEChat 由三个核心组件组成:文本特征编码组件使用基于多头自注意机制的文本特征编码器捕获上下文文本特征,并使用连体网络来减轻有限数据造成的过拟合;数据增强组件使用一种为开发者聊天信息量身定制的技术扩展训练数据集中的少量类别,从而有效解决类别分布不平衡的难题;非文本特征编码组件使用一种特征融合模型来整合深度文本特征和手动提取的非文本特征。对三个真实世界项目的评估表明,EAEChat 的平均精确度、召回率和 F1 分数分别达到了 0.653、0.651 和 0.644,比最先进的方法显著提高了 7.60%。这些发现证实了我们的方法在熟练分类在线聊天室中的开发人员聊天信息方面的有效性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
ACM Transactions on Software Engineering and Methodology
ACM Transactions on Software Engineering and Methodology 工程技术-计算机:软件工程
CiteScore
6.30
自引率
4.50%
发文量
164
审稿时长
>12 weeks
期刊介绍: Designing and building a large, complex software system is a tremendous challenge. ACM Transactions on Software Engineering and Methodology (TOSEM) publishes papers on all aspects of that challenge: specification, design, development and maintenance. It covers tools and methodologies, languages, data structures, and algorithms. TOSEM also reports on successful efforts, noting practical lessons that can be scaled and transferred to other projects, and often looks at applications of innovative technologies. The tone is scholarly but readable; the content is worthy of study; the presentation is effective.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信