AI-based clustering of similar issues in GitHub’s repositories
Hamzeh Eyal Salman
Journal of Computer Languages, vol. 78, Article 101257 (published 2024-01-04)
DOI: 10.1016/j.cola.2023.101257
https://www.sciencedirect.com/science/article/pii/S2590118423000679
Issues are highly prevalent on GitHub because of the growing scale of its software repositories. They are submitted to the issue tracking system for several reasons: reporting a bug, asking a question, or requesting other maintenance activities. Popular repositories on GitHub receive a large number of issues every day. Assigning similar issues one by one to different developers for validation and fixing slows the fixing process and introduces inconsistencies, because the developers work independently and asynchronously. Grouping similar issues into clusters and assigning each cluster to a single appropriate developer or team, by contrast, speeds up the fixing process. This paper proposes a machine-learning-based approach that supports issue management on GitHub by grouping similar issues together. To validate it, the approach was applied to 13 software components drawn from different large repositories. The findings show that the approach identifies clusters of similar issues with promising results under the evaluation measures widely used in this area: Precision, Recall, and F-measure.
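The abstract does not name the clustering algorithm, the text features, or the exact way Precision, Recall, and F-measure are computed over clusters. The sketch below is therefore only an illustration under stated assumptions: it uses Jaccard overlap of title words and a greedy single-pass grouping as stand-ins for the paper's method, and it scores the result with pairwise precision and recall, one common convention for evaluating clusterings. The issue titles, the `threshold` value, and all function names are hypothetical.

```python
from itertools import combinations


def tokens(text):
    """Lowercased word set for one issue title."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_issues(titles, threshold=0.3):
    """Greedy single-pass grouping (an assumed stand-in, not the paper's
    algorithm): an issue joins the first cluster whose seed issue is
    similar enough, otherwise it starts a new cluster."""
    clusters = []  # each cluster is a list of issue indices
    for i, title in enumerate(titles):
        for cluster in clusters:
            seed = titles[cluster[0]]
            if jaccard(tokens(title), tokens(seed)) >= threshold:
                cluster.append(i)
                break
        else:  # no existing cluster was similar enough
            clusters.append([i])
    return clusters


def same_cluster_pairs(clusters):
    """All unordered pairs of issues that share a cluster."""
    return {pair for c in clusters for pair in combinations(sorted(c), 2)}


def pairwise_scores(predicted, reference):
    """Pairwise precision, recall, and F-measure of a clustering."""
    pred = same_cluster_pairs(predicted)
    ref = same_cluster_pairs(reference)
    tp = len(pred & ref)  # pairs grouped together in both clusterings
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical issue titles and a hand-made reference grouping.
titles = [
    "crash when opening settings page",
    "app crash opening settings",
    "how to configure proxy",
    "configure proxy for downloads",
    "settings page crash on startup",
]
reference = [[0, 1, 4], [2, 3]]

predicted = cluster_issues(titles)
print(predicted, pairwise_scores(predicted, reference))
```

Each resulting cluster could then be routed to one developer or team, which is the workflow the abstract argues for. A realistic pipeline would likely replace the word-overlap measure with richer text features and tune the threshold per repository.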