{"title":"在一个发布窗口内,源代码单元变更的频率是多少?","authors":"Joseph F. Shobe, Md Yasser Karim, Huzefa H. Kagdi","doi":"10.1145/2723742.2723759","DOIUrl":null,"url":null,"abstract":"To form a training set for a source-code change prediction model, e.g., using the association rule mining or machine learning techniques, commits from the source code history are needed. The traceability between releases and commits would facilitate a systematic choice of history in units of the project evolution scale (i.e., commits that constitute a software release). For example, the major release 25.0 in Chrome is mapped to the earliest revision 157687 and latest revision 165096 in the trunk. Using this traceability, an empirical study is reported on the frequency distribution of file changes for different release windows. In Chrome, the majority (50%) of the committed files change only once between a pair of consecutive releases. This trend is reversed after expanding the window size to at least 10. That is, the majority (50%) of the files change multiple times when commits constituting 10 or greater releases are considered. These results suggest that a training set of at least 10 releases is needed to provide a prediction coverage for majority of the files.","PeriodicalId":288030,"journal":{"name":"Proceedings of the 8th India Software Engineering Conference","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-02-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"How Often does a Source Code Unit Change within a Release Window?\",\"authors\":\"Joseph F. Shobe, Md Yasser Karim, Huzefa H. Kagdi\",\"doi\":\"10.1145/2723742.2723759\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"To form a training set for a source-code change prediction model, e.g., using the association rule mining or machine learning techniques, commits from the source code history are needed. The traceability between releases and commits would facilitate a systematic choice of history in units of the project evolution scale (i.e., commits that constitute a software release). For example, the major release 25.0 in Chrome is mapped to the earliest revision 157687 and latest revision 165096 in the trunk. Using this traceability, an empirical study is reported on the frequency distribution of file changes for different release windows. In Chrome, the majority (50%) of the committed files change only once between a pair of consecutive releases. This trend is reversed after expanding the window size to at least 10. That is, the majority (50%) of the files change multiple times when commits constituting 10 or greater releases are considered. These results suggest that a training set of at least 10 releases is needed to provide a prediction coverage for majority of the files.\",\"PeriodicalId\":288030,\"journal\":{\"name\":\"Proceedings of the 8th India Software Engineering Conference\",\"volume\":\"6 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-02-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 8th India Software Engineering Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2723742.2723759\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 8th India Software Engineering Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2723742.2723759","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
How Often does a Source Code Unit Change within a Release Window?
To form a training set for a source-code change prediction model, e.g., using the association rule mining or machine learning techniques, commits from the source code history are needed. The traceability between releases and commits would facilitate a systematic choice of history in units of the project evolution scale (i.e., commits that constitute a software release). For example, the major release 25.0 in Chrome is mapped to the earliest revision 157687 and latest revision 165096 in the trunk. Using this traceability, an empirical study is reported on the frequency distribution of file changes for different release windows. In Chrome, the majority (50%) of the committed files change only once between a pair of consecutive releases. This trend is reversed after expanding the window size to at least 10. That is, the majority (50%) of the files change multiple times when commits constituting 10 or greater releases are considered. These results suggest that a training set of at least 10 releases is needed to provide a prediction coverage for majority of the files.