{"title":"Explore the evolution of development topics via on-line LDA","authors":"Jiajun Hu, Xiaobing Sun, Bin Li","doi":"10.1109/SANER.2015.7081876","DOIUrl":null,"url":null,"abstract":"Software repositories such as revision control systems and bug tracking systems are usually used to manage the changes of software projects. During software maintenance and evolution, software developers and stakeholders need to investigate these repositories to identify what tasks were worked on in a particular time interval and how much effort was devoted to them. A typical way of mining software repositories is to use topic analysis models, e.g., Latent Dirichlet Allocation (LDA), to identify and organize the underlying structure in software documents to understand the evolution of development topics. These previously LDA-based topic analysis models can capture either changes on the strength (popularity) of various development topics over time (i.e., strength evolution) or changes in the content (the words that form the topic) of existing topics over time (i.e., content evolution). Unfortunately, few techniques can capture both strength and content evolution simultaneously. However, both pieces of information are necessary for developers to fully understand how software evolves. In this paper, we propose a novel approach to analyze commit messages within a project's lifetime to capture both strength and content evolution simultaneously via Online Latent Dirichlet Allocation (On-Line LDA). Moreover, the proposed approach also provides an efficient way to detect emerging topics in real development iteration when a new feature request arrives at a particular time, thus helping project stakeholds progress their projects smoothly.","PeriodicalId":355949,"journal":{"name":"2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SANER.2015.7081876","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 9
Abstract
Software repositories such as revision control systems and bug tracking systems are usually used to manage the changes of software projects. During software maintenance and evolution, software developers and stakeholders need to investigate these repositories to identify what tasks were worked on in a particular time interval and how much effort was devoted to them. A typical way of mining software repositories is to use topic analysis models, e.g., Latent Dirichlet Allocation (LDA), to identify and organize the underlying structure in software documents to understand the evolution of development topics. These previously LDA-based topic analysis models can capture either changes on the strength (popularity) of various development topics over time (i.e., strength evolution) or changes in the content (the words that form the topic) of existing topics over time (i.e., content evolution). Unfortunately, few techniques can capture both strength and content evolution simultaneously. However, both pieces of information are necessary for developers to fully understand how software evolves. In this paper, we propose a novel approach to analyze commit messages within a project's lifetime to capture both strength and content evolution simultaneously via Online Latent Dirichlet Allocation (On-Line LDA). Moreover, the proposed approach also provides an efficient way to detect emerging topics in real development iteration when a new feature request arrives at a particular time, thus helping project stakeholds progress their projects smoothly.