A Simple NLP-Based Approach to Support Onboarding and Retention in Open Source Communities

Christoph Stanik, Lloyd Montgomery, Daniel Martens, D. Fucci, W. Maalej
{"title":"A Simple NLP-Based Approach to Support Onboarding and Retention in Open Source Communities","authors":"Christoph Stanik, Lloyd Montgomery, Daniel Martens, D. Fucci, W. Maalej","doi":"10.1109/ICSME.2018.00027","DOIUrl":null,"url":null,"abstract":"Successful open source communities are constantly looking for new members and helping them become active developers. A common approach for developer onboarding in open source projects is to let newcomers focus on relevant yet easy-to-solve issues to familiarize themselves with the code and the community. The goal of this research is twofold. First, we aim at automatically identifying issues that newcomers can resolve by analyzing the history of resolved issues by simply using the title and description of issues. Second, we aim at automatically identifying issues, that can be resolved by newcomers who later become active developers. We mined the issue trackers of three large open source projects and extracted natural language features from the title and description of resolved issues. In a series of experiments, we optimized and compared the accuracy of four supervised classifiers to address our research goals. Random Forest, achieved up to 91% precision (F1-score 72%) towards the first goal while for the second goal, Decision Tree achieved a precision of 92% (F1-score 91%). A qualitative evaluation gave insights on what information in the issue description is helpful for newcomers. Our approach can be used to automatically identify, label, and recommend issues for newcomers in open source software projects based only on the text of the issues.","PeriodicalId":6572,"journal":{"name":"2018 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"08 1","pages":"172-182"},"PeriodicalIF":0.0000,"publicationDate":"2018-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"28","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE International Conference on Software Maintenance and Evolution (ICSME)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSME.2018.00027","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 28

Abstract

Successful open source communities are constantly looking for new members and helping them become active developers. A common approach for developer onboarding in open source projects is to let newcomers focus on relevant yet easy-to-solve issues to familiarize themselves with the code and the community. The goal of this research is twofold. First, we aim at automatically identifying issues that newcomers can resolve by analyzing the history of resolved issues by simply using the title and description of issues. Second, we aim at automatically identifying issues, that can be resolved by newcomers who later become active developers. We mined the issue trackers of three large open source projects and extracted natural language features from the title and description of resolved issues. In a series of experiments, we optimized and compared the accuracy of four supervised classifiers to address our research goals. Random Forest, achieved up to 91% precision (F1-score 72%) towards the first goal while for the second goal, Decision Tree achieved a precision of 92% (F1-score 91%). A qualitative evaluation gave insights on what information in the issue description is helpful for newcomers. Our approach can be used to automatically identify, label, and recommend issues for newcomers in open source software projects based only on the text of the issues.
一个简单的基于nlp的方法来支持开源社区的入职和保留
成功的开源社区不断地寻找新成员,并帮助他们成为活跃的开发人员。在开源项目中,开发人员入职的一种常见方法是让新人专注于相关但易于解决的问题,以熟悉代码和社区。这项研究的目的是双重的。首先,我们的目标是通过简单地使用问题的标题和描述来分析解决问题的历史,从而自动识别新人可以解决的问题。其次,我们的目标是自动识别问题,这些问题可以由后来成为活跃开发人员的新人解决。我们挖掘了三个大型开源项目的问题跟踪器,并从已解决问题的标题和描述中提取了自然语言特征。在一系列实验中,我们对四种监督分类器的准确率进行了优化和比较,以实现我们的研究目标。对于第一个目标,随机森林实现了高达91%的精度(f1得分为72%),而对于第二个目标,决策树实现了92%的精度(f1得分为91%)。一个定性的评估给出了问题描述中哪些信息对新手有帮助的见解。我们的方法可以用于自动识别、标记问题,并根据问题的文本为开源软件项目中的新手推荐问题。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信