{"title":"GEMiner:挖掘社交和编程行为以识别Github中的专家","authors":"Wenkai Mo, Beijun Shen, Yuming He, Hao Zhong","doi":"10.1145/2875913.2875924","DOIUrl":null,"url":null,"abstract":"Hosting over 10 million repositories, GitHub becomes the largest open source community in the world. Besides sharing code, Github is also a social network, in which developers can follow others or keep track of their interested projects. Considering the multi-roles of Github, integrating heterogenous data of each developer to identify experts is a challenging task. In this paper, we propose GEMiner, a novel approach to identify experts for some specific programming languages in Github. Different from previous approaches, GEMiner analyzes the social behaviors and programming behaviors of a developer to determine the expertise of the developer. When modeling social behaviors of developers, to integrate heterogenous social networks in Github, GEMiner implements a Multi-Sources PageRank algorithm. Also, GEMiner analyzes the behaviors of developers when they are programming (e.g., their commit activities and their preferred programming languages) to model programming behaviors of them. Based on our expertise models and our extracted programming languages data, GEMiner can then identify experts for some specific programming languages in Github. We conducted experiments on a real data set, and our results show that GEMiner identifies experts with 60% accuracy higher than the state-of-the-art algorithms.","PeriodicalId":361135,"journal":{"name":"Proceedings of the 7th Asia-Pacific Symposium on Internetware","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"GEMiner: Mining Social and Programming Behaviors to Identify Experts in Github\",\"authors\":\"Wenkai Mo, Beijun Shen, Yuming He, Hao Zhong\",\"doi\":\"10.1145/2875913.2875924\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Hosting over 10 million repositories, GitHub becomes the largest open source community in the world. Besides sharing code, Github is also a social network, in which developers can follow others or keep track of their interested projects. Considering the multi-roles of Github, integrating heterogenous data of each developer to identify experts is a challenging task. In this paper, we propose GEMiner, a novel approach to identify experts for some specific programming languages in Github. Different from previous approaches, GEMiner analyzes the social behaviors and programming behaviors of a developer to determine the expertise of the developer. When modeling social behaviors of developers, to integrate heterogenous social networks in Github, GEMiner implements a Multi-Sources PageRank algorithm. Also, GEMiner analyzes the behaviors of developers when they are programming (e.g., their commit activities and their preferred programming languages) to model programming behaviors of them. Based on our expertise models and our extracted programming languages data, GEMiner can then identify experts for some specific programming languages in Github. We conducted experiments on a real data set, and our results show that GEMiner identifies experts with 60% accuracy higher than the state-of-the-art algorithms.\",\"PeriodicalId\":361135,\"journal\":{\"name\":\"Proceedings of the 7th Asia-Pacific Symposium on Internetware\",\"volume\":\"11 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-11-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 7th Asia-Pacific Symposium on Internetware\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2875913.2875924\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 7th Asia-Pacific Symposium on Internetware","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2875913.2875924","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
GEMiner: Mining Social and Programming Behaviors to Identify Experts in Github
Hosting over 10 million repositories, GitHub becomes the largest open source community in the world. Besides sharing code, Github is also a social network, in which developers can follow others or keep track of their interested projects. Considering the multi-roles of Github, integrating heterogenous data of each developer to identify experts is a challenging task. In this paper, we propose GEMiner, a novel approach to identify experts for some specific programming languages in Github. Different from previous approaches, GEMiner analyzes the social behaviors and programming behaviors of a developer to determine the expertise of the developer. When modeling social behaviors of developers, to integrate heterogenous social networks in Github, GEMiner implements a Multi-Sources PageRank algorithm. Also, GEMiner analyzes the behaviors of developers when they are programming (e.g., their commit activities and their preferred programming languages) to model programming behaviors of them. Based on our expertise models and our extracted programming languages data, GEMiner can then identify experts for some specific programming languages in Github. We conducted experiments on a real data set, and our results show that GEMiner identifies experts with 60% accuracy higher than the state-of-the-art algorithms.