Tell Me Who Are You Talking to and I Will Tell You What Issues Need Your Skills

Fábio Santos, Jacob Penney, J. F. Pimentel, I. Wiese, Igor Steinmacher, M. Gerosa
Published in: 2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR), 2023-05-01
DOI: 10.1109/MSR59073.2023.00087 (https://doi.org/10.1109/MSR59073.2023.00087)
Citations: 1

Abstract

Selecting an appropriate task is challenging for newcomers to Open Source Software (OSS) projects. To facilitate task selection, researchers and OSS projects have used machine learning techniques, historical information, and textual analysis to label tasks (a.k.a. issues) with information such as the issue type and domain. These approaches are still far from mainstream adoption, possibly because of a lack of good predictors. Inspired by previous research, we argue that label prediction might benefit from metrics derived from communication data and social network analysis (SNA) for issues in which social interaction occurs. Thus, we study how these "social metrics" can improve the automatic labeling of open issues with API domains (categories of APIs used in the source code that solves the issue), which the literature shows newcomers to the project consider relevant for task selection. We mined data from OSS projects' repositories and organized it in periods to reflect the seasonality of the contributors' project participation. We replicated metrics from previous work and added social metrics to the corpus to predict API-domain labels. Compared to using only the issue description text, social metrics improved the performance of the classifiers in terms of precision, recall, and F-measure. Precision (0.922) increased by 15.82% and F-measure (0.942) by 15.89% for a project with high social activity. These results indicate that social metrics can help capture the patterns of social interactions in a software project and improve the labeling of issues in an issue tracker.
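The abstract does not spell out which social metrics or classifiers the authors used, so the following is only a minimal sketch of the general idea, not the paper's implementation. It assumes a hypothetical mapping from issues to the contributors who commented on them, derives one simple SNA metric (normalized degree centrality in the co-participation graph), and shows how micro-averaged precision, recall, and F-measure could be computed for multi-label predictions. The function names, contributor names, and the API-domain labels ("UI", "DB", "Network") are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations


def degree_centrality(issue_participants):
    """Normalized degree centrality in an undirected co-participation graph.

    Two contributors are linked if they commented on the same issue.
    `issue_participants` is a hypothetical input: issue id -> list of names.
    """
    neighbors = defaultdict(set)
    for people in issue_participants.values():
        for person in people:
            neighbors[person]  # register contributors who talk to no one
        for a, b in combinations(sorted(set(people)), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    return {
        person: (len(adjacent) / (n - 1) if n > 1 else 0.0)
        for person, adjacent in neighbors.items()
    }


def micro_prf(true_labels, predicted_labels):
    """Micro-averaged precision, recall, and F-measure over label sets."""
    tp = fp = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        tp += len(truth & predicted)   # labels predicted and correct
        fp += len(predicted - truth)   # labels predicted but wrong
        fn += len(truth - predicted)   # labels missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = precision + recall
    f_measure = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f_measure


# Illustrative data only; not taken from the paper.
issues = {"#1": ["ana", "bo"], "#2": ["ana", "cy"], "#3": ["dee"]}
centrality = degree_centrality(issues)

truth = [{"UI", "DB"}, {"Network"}]
predicted = [{"UI"}, {"Network", "DB"}]
precision, recall, f_measure = micro_prf(truth, predicted)
```

In a setup like the one the abstract describes, a per-contributor value such as `centrality` would be aggregated per issue and appended to the text features before training the classifier; how the authors actually aggregate and combine these features is not stated in the abstract.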