RepoQuester: A Tool Towards Evaluating GitHub Projects
Kowndinya Boyalakuntla, M. Nagappan, S. Chimalakonda, Nuthan Munaiah
2022 IEEE International Conference on Software Maintenance and Evolution (ICSME), October 2022
DOI: 10.1109/ICSME55016.2022.00069
Citations: 0
Abstract
Given the rapid rise in the number of repositories on GitHub, it is often hard for developers to find relevant projects that meet their requirements, as analyzing source code and other artifacts is effort-intensive. In our prior work, we proposed Repo Reaper (or simply Reaper), which assesses GitHub projects based on seven metrics spanning project collaboration, quality, and maintenance. By classifying projects as ‘engineered’ or ‘non-engineered’ software projects, Reaper identified 1.4 million out of nearly 1.8 million projects as having no purpose for collaboration or software development. While Reaper can assess millions of repositories, it depends on GHTorrent and is not designed for developers to evaluate standalone repositories on their local machines. Hence, in this paper, we propose RepoQuester, a re-engineered and extended command-line tool that aims to assist developers in evaluating GitHub projects on their local machines. RepoQuester computes metric scores for projects but does not classify them into ‘engineered’ and ‘non-engineered’ ones. However, to demonstrate the correctness of the metric scores produced by RepoQuester, we performed the project classification on Reaper’s training and validation datasets, whose ground truth was established manually, after updating them with the latest metric scores reported by RepoQuester. We observed that machine learning classifiers built on the updated datasets produced an F1 score of 72%. During the evaluation, we found that RepoQuester computes the metric scores for a project in less than 10 seconds. A demo video explaining the tool’s highlights and usage is available at https://youtu.be/Q8OdmNzUfN0, and the source code at https://github.com/Kowndinya2000/Repoquester.
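
To make the classification experiment concrete, below is a minimal sketch of how a binary ‘engineered’ vs. ‘non-engineered’ classifier could be trained and scored on per-project metric values. The abstract does not specify the classifier family, the dataset file layout, or the feature names, so the file name repoquester_metrics.csv, the column names, and the choice of a random forest are assumptions made purely for illustration and are not the authors' actual pipeline.

# Illustrative sketch (assumptions noted above): train a binary classifier
# on seven per-project metric scores and report its F1 score.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Hypothetical dataset: one row per repository, seven metric columns plus a
# manually established ground-truth label ('engineered' / 'non-engineered').
df = pd.read_csv("repoquester_metrics.csv")  # hypothetical file name
feature_cols = [c for c in df.columns if c not in ("repository", "label")]

X_train, X_test, y_train, y_test = train_test_split(
    df[feature_cols],
    df["label"] == "engineered",   # treat 'engineered' as the positive class
    test_size=0.2,
    random_state=42,
    stratify=df["label"],
)

clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_train, y_train)

print(f"F1 score: {f1_score(y_test, clf.predict(X_test)):.2f}")

Under this setup, the 72% F1 score reported in the paper would correspond to the value printed at the end; any resemblance in numbers would be coincidental, since both the features and the model here are stand-ins.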