Identifying Sexism and Misogyny in Pull Request Comments

Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, Pub Date: 2022-10-10, DOI: 10.1145/3551349.3559515

Sayma Sultana

Citations: 0

Abstract

Because they are overwhelmingly dominated by men, software development organizations lack diversity. People from other groups often encounter sexist, misogynistic, and discriminatory (SMD) speech in their communications. To identify SMD content, I aim to build an automatic misogyny identification (AMI) tool for the software development domain. Toward this goal, I built a dataset of 10,138 pull request comments mined from GitHub using keyword-based selection followed by manual validation. Using ten-fold cross-validation, I evaluated ten machine learning algorithms for automatic identification. The best-performing model achieved 80% precision, 67.07% recall, a 72.5% F-score, and 95.96% accuracy.
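The evaluation pipeline described in the abstract (keyword-based selection, ten-fold cross-validation, and precision/recall/F-score/accuracy reporting) can be sketched roughly as follows. This is a minimal, standard-library illustration and not the author's tool: the function names `keyword_filter`, `k_fold_indices`, and `binary_metrics` are hypothetical, the keyword shown in the test is a neutral placeholder, and the abstract does not say whether the folds were stratified or how per-fold scores were aggregated.

```python
from typing import Dict, List, Sequence, Tuple


def keyword_filter(comments: Sequence[str], keywords: Sequence[str]) -> List[str]:
    """Keep comments containing at least one keyword (case-insensitive).

    Assumption: plain substring matching; the paper's actual keyword list
    and matching rules are not given in the abstract.
    """
    lowered = [k.lower() for k in keywords]
    return [c for c in comments if any(k in c.lower() for k in lowered)]


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous (train, test) index pairs.

    Assumption: unshuffled, unstratified folds, chosen only for simplicity.
    """
    # Spread the remainder over the first (n_samples % k) folds.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds: List[Tuple[List[int], List[int]]] = []
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute precision, recall, F1, and accuracy for the positive class (1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}
```

In a full experiment, a classifier would be trained on each `train` index set, scored with `binary_metrics` on the matching `test` set, and the ten per-fold results averaged.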

