André Luis Schwerz, Rafael Liberato, I. Wiese, Igor Steinmacher, M. Gerosa, J. E. Ferreira
{"title":"开发人员参与开源项目问题的预测","authors":"André Luis Schwerz, Rafael Liberato, I. Wiese, Igor Steinmacher, M. Gerosa, J. E. Ferreira","doi":"10.1109/SBSC.2012.27","DOIUrl":null,"url":null,"abstract":"Developers of distributed open source projects use management and issues tracking tool to communicate. These tools provide a large volume of unstructured information that makes the triage of issues difficult, increasing developers' overhead. This problem is common to online communities based on volunteer participation. This paper shows the importance of the content of comments in an open source project to build a classifier to predict the participation for a developer in an issue. To design this prediction model, we used two machine learning algorithms called Naive Bayes and J48. We used the data of three Apache Hadoop subprojects to evaluate the use of the algorithms. By applying our approach to the most active developers of these subprojects we have achieved an accuracy ranging from 79% to 96%. The results indicate that the content of comments in issues of open source projects is a relevant factor to build a classifier of issues for developers.","PeriodicalId":257965,"journal":{"name":"2012 Brazilian Symposium on Collaborative Systems","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Prediction of Developer Participation in Issues of Open Source Projects\",\"authors\":\"André Luis Schwerz, Rafael Liberato, I. Wiese, Igor Steinmacher, M. Gerosa, J. E. Ferreira\",\"doi\":\"10.1109/SBSC.2012.27\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Developers of distributed open source projects use management and issues tracking tool to communicate. These tools provide a large volume of unstructured information that makes the triage of issues difficult, increasing developers' overhead. This problem is common to online communities based on volunteer participation. This paper shows the importance of the content of comments in an open source project to build a classifier to predict the participation for a developer in an issue. To design this prediction model, we used two machine learning algorithms called Naive Bayes and J48. We used the data of three Apache Hadoop subprojects to evaluate the use of the algorithms. By applying our approach to the most active developers of these subprojects we have achieved an accuracy ranging from 79% to 96%. The results indicate that the content of comments in issues of open source projects is a relevant factor to build a classifier of issues for developers.\",\"PeriodicalId\":257965,\"journal\":{\"name\":\"2012 Brazilian Symposium on Collaborative Systems\",\"volume\":\"27 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-10-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2012 Brazilian Symposium on Collaborative Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SBSC.2012.27\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 Brazilian Symposium on Collaborative Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SBSC.2012.27","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Prediction of Developer Participation in Issues of Open Source Projects
Developers of distributed open source projects use management and issues tracking tool to communicate. These tools provide a large volume of unstructured information that makes the triage of issues difficult, increasing developers' overhead. This problem is common to online communities based on volunteer participation. This paper shows the importance of the content of comments in an open source project to build a classifier to predict the participation for a developer in an issue. To design this prediction model, we used two machine learning algorithms called Naive Bayes and J48. We used the data of three Apache Hadoop subprojects to evaluate the use of the algorithms. By applying our approach to the most active developers of these subprojects we have achieved an accuracy ranging from 79% to 96%. The results indicate that the content of comments in issues of open source projects is a relevant factor to build a classifier of issues for developers.