CodeCV: Mining Expertise of GitHub Users from Coding Activities

Daniel Atzberger, Nico Scordialo, Tim Cech, W. Scheibel, Matthias Trapp, J. Döllner
Published in: 2022 IEEE 22nd International Working Conference on Source Code Analysis and Manipulation (SCAM), 2022-10-01
DOI: https://doi.org/10.1109/SCAM55253.2022.00021
Citations: 3

Abstract

The number of software projects developed collaboratively on social coding platforms is steadily increasing. One motivation for developers to participate in open-source software development is to make their development activities more easily accessible to potential employers, e.g., as a resume of their interests and skills. However, manually reviewing source code activities is time-consuming and requires detailed knowledge of the technologies used. Existing approaches are limited to a small subset of the actual source code activity and metadata and do not explain their results. In this work, we present CodeCV, an approach for analyzing the commit activities of a GitHub user with respect to the use of programming languages, software libraries, and higher-level concepts, e.g., Machine Learning or Cryptocurrency. Skills in using software libraries and programming languages are analyzed based on syntactic structures in the source code. Using Labeled Latent Dirichlet Allocation, the concept-specific vocabulary in identifier names and comments is learned from an automatically generated corpus of GitHub projects. This makes it possible to capture expertise on abstract concepts from a user's commit history. CodeCV further explains its results through links to the relevant commits in an interactive web dashboard. We tested our system on selected GitHub users who mainly contribute to popular projects to demonstrate that our approach captures developers' expertise effectively.
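As a rough illustration of analyzing library usage from syntactic structures while keeping links back to commits, the following minimal sketch parses Python sources and records which commits import which libraries. It is not the CodeCV implementation: the function names `imported_libraries` and `library_profile`, the commit ids, and the representation of a commit as a single source string are all assumptions made for this example.

```python
import ast
from collections import Counter
from typing import Dict, List


def imported_libraries(source: str) -> Counter:
    """Count top-level package names imported in a Python source string."""
    counts: Counter = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import os.path" is attributed to the top-level package "os".
            for alias in node.names:
                counts[alias.name.split(".")[0]] += 1
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Relative imports (level > 0) are skipped: they are project-internal.
            counts[node.module.split(".")[0]] += 1
    return counts


def library_profile(commits: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each library to the (hypothetical) commit ids whose source imports it."""
    profile: Dict[str, List[str]] = {}
    for commit_id, source in commits.items():
        for library in imported_libraries(source):
            profile.setdefault(library, []).append(commit_id)
    return profile


if __name__ == "__main__":
    # Hypothetical commit contents, for illustration only.
    example_commits = {
        "a1": "import numpy as np\nfrom sklearn.linear_model import LogisticRegression\n",
        "b2": "import numpy\nimport os.path\n",
    }
    for library, commit_ids in library_profile(example_commits).items():
        print(library, commit_ids)
```

Keeping the commit ids next to each detected library is what would let a dashboard point from a reported skill to the evidence behind it; a real system would work on commit diffs and on more than one programming language.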