‘Who built this crap?’ Developing a Software Engineering Domain Specific Toxicity Detector

Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering. Pub Date: 2022-10-10. DOI: 10.1145/3551349.3559508

Jaydeb Sarker

Citations: 1

Abstract

Since toxicity in developers' interactions in open source software (OSS) projects negatively affects relationships among developers, a toxicity detector for the Software Engineering (SE) domain is needed. However, prior studies found that contemporary toxicity detection tools perform poorly on SE texts. To address this challenge, I have developed ToxiCR, an SE-specific toxicity detector, and evaluated it on 19,571 manually labeled code review comments. I evaluate ToxiCR with different combinations of ten supervised learning models, five text vectorizers, and eight preprocessing techniques (two of which are SE domain-specific). After evaluating all possible combinations, I have found that ToxiCR significantly outperforms existing toxicity classifiers, with an accuracy of 95.8% and an F1 score of 88.9%.
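The evaluation described above reports both accuracy and F1, and sweeps over combinations of models, vectorizers, and preprocessing steps. The following is a minimal sketch of those two ideas using only the Python standard library. It is not ToxiCR's code: the function names, the placeholder component names, and the assumption that each preprocessing step is switched on or off independently are all hypothetical and are not taken from the abstract.

```python
from itertools import product


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1 of the positive class) for two equal-length label lists."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no positives at all
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


def enumerate_configurations(models, vectorizers, steps):
    """Yield (model, vectorizer, enabled_steps) for every combination.

    Assumption (hypothetical): each preprocessing step is an independent
    on/off choice, so n steps give 2**n preprocessing variants.
    """
    for model, vectorizer in product(models, vectorizers):
        for mask in product([False, True], repeat=len(steps)):
            enabled = tuple(s for s, on in zip(steps, mask) if on)
            yield model, vectorizer, enabled


# Placeholder names only; the abstract does not list the actual components.
MODELS = ["model_a", "model_b"]
VECTORIZERS = ["vectorizer_a", "vectorizer_b"]
STEPS = ["step_a", "step_b"]

if __name__ == "__main__":
    # Toy labels: 1 = toxic, 0 = non-toxic (illustrative data, not from the paper).
    y_true = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
    print(accuracy_and_f1(y_true, y_pred))
    print(len(list(enumerate_configurations(MODELS, VECTORIZERS, STEPS))))
```

On imbalanced data, where toxic comments are a minority, accuracy can stay high even when many toxic comments are missed, which is one common reason to report F1 for the toxic class alongside accuracy.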
