Which Change Sets in Git Repositories Are Related?
Jasmin Ramadani, S. Wagner
2016 IEEE International Conference on Software Quality, Reliability and Security (QRS)
DOI: 10.1109/QRS.2016.52
https://doi.org/10.1109/QRS.2016.52
Citation count: 1
Abstract
Software repositories contain valuable information about the history of software changes. Using data mining, researchers have identified file changes that frequently happened together and presented them to developers as hints for necessary changes. However, not all file change sets are related, and unrelated change sets can degrade recommendations about coupled file changes by delivering irrelevant couplings to developers. Previous heuristics for grouping related change sets have not investigated the commit time and branching characteristics of Git together. We exploit the mappings between commit messages and issue IDs to judge the relatedness of change sets. We propose a heuristic for Git and use logistic regression on the repositories of five open-source systems to investigate how two factors, the time between commits and their branching, influence the relatedness of change sets. According to our findings, the combination of these two factors influences the relatedness of change sets. Measured individually, only the time significantly influences relatedness; branching by itself does not. Our results support previous heuristics: in Git repositories, too, the commit time is important for grouping related change sets.
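The idea of judging relatedness through issue IDs in commit messages, alongside the two factors studied, can be illustrated with a minimal sketch. This is not the authors' implementation: the `Commit` record, the issue-ID pattern (`#123` or `PROJ-123`), the single `branch` label per commit, and the function names are all assumptions made for illustration. The resulting rows are the kind of input a logistic regression could be fitted on, with `related` as the outcome.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from itertools import combinations

# Assumed issue-ID formats: "#123" or "PROJ-123". Real projects may differ.
ISSUE_ID = re.compile(r"#\d+|\b[A-Z][A-Z0-9]+-\d+\b")


@dataclass(frozen=True)
class Commit:
    """Hypothetical commit record; 'branch' is a simplifying assumption."""
    sha: str
    message: str
    time: datetime
    branch: str


def issue_ids(message: str) -> set[str]:
    """Return the set of issue IDs mentioned in a commit message."""
    return set(ISSUE_ID.findall(message))


def pair_features(a: Commit, b: Commit) -> dict:
    """Describe one commit pair by the two factors and a relatedness label."""
    hours = abs((a.time - b.time).total_seconds()) / 3600
    return {
        "hours_apart": hours,                 # factor 1: time between commits
        "same_branch": a.branch == b.branch,  # factor 2: branching
        # Label: the pair counts as related if both mention a common issue ID.
        "related": bool(issue_ids(a.message) & issue_ids(b.message)),
    }


def pairwise_rows(commits: list[Commit]) -> list[dict]:
    """Build one feature row for every unordered pair of commits."""
    return [pair_features(a, b) for a, b in combinations(commits, 2)]
```

In practice the commit records would be extracted from the repository history, and the pattern would need to match the issue tracker conventions of each project.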