{"title":"Improving the detection accuracy of evolutionary coupling","authors":"Manishankar Mondal, C. Roy, Kevin A. Schneider","doi":"10.1109/ICPC.2013.6613853","DOIUrl":"https://doi.org/10.1109/ICPC.2013.6613853","url":null,"abstract":"If two or more program entities (e.g., files, classes, methods) co-change frequently during software evolution, these entities are said to have evolutionary coupling. The entities that frequently co-change (i.e., exhibit evolutionary coupling) are likely to have logical coupling (or dependencies) among them. Association rules and two related measurements, Support and Confidence, have been used to predict whether two or more co-changing entities are logically coupled. In this paper, we propose and investigate a new measurement, Significance, that has the potential to improve the detection accuracy of association rule mining techniques. Our preliminary investigation on four open-source subject systems implies that our proposed measurement is capable of extracting coupling relationships even from infrequently co-changed entity sets that might seem insignificant while considering only Support and Confidence. Our proposed measurement, Significance (in association with Support and Confidence), has the potential to predict logical coupling with higher precision and recall.","PeriodicalId":237170,"journal":{"name":"2013 21st International Conference on Program Comprehension (ICPC)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126284716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Part-of-speech tagging of program identifiers for improved text-based software engineering tools","authors":"Samir Gupta, Sana Malik, L. Pollock, K. Vijay-Shanker, Brian P. Eddy, Jeffrey A. Robinson, Nicholas A. Kraft, Jeffrey C. Carver, Laura Moreno, Jairo Aponte, G. Sridhara, Andrian Marcus, Zohreh Sharafi, A. Marchetto, A. Susi, G. Antoniol, Koki Kato, Stephan Diehl, Nigar Gurbanova","doi":"10.1109/ICPC.2013.6613828","DOIUrl":"https://doi.org/10.1109/ICPC.2013.6613828","url":null,"abstract":"To aid program comprehension, programmers choose identifiers for methods, classes, fields and other program elements primarily by following naming conventions in software. These software “naming conventions” follow systematic patterns which can convey deep natural language clues that can be leveraged by software engineering tools. For example, they can be used to increase the accuracy of software search tools, improve the ability of program navigation tools to recommend related methods, and raise the accuracy of other program analyses. After splitting multi-word names into their component words, the next step to extracting accurate natural language information is tagging each word with its part of speech (POS) and then chunking the name into natural language phrases. State-of-theart approaches, most of which rely on “traditional POS taggers” trained on natural language documents, do not capture the syntactic structure of program elements. In this paper, we present a POS tagger and syntactic chunker for source code names that takes into account programmers' naming conventions to understand the regular, systematic ways a program element is named. We studied the naming conventions used in Object Oriented Programming and identified different grammatical constructions that characterize a large number of program identifiers. This study then informed the design of our POS tagger and chunker. Our evaluation results show a significant improvement in accuracy(11%-20%) of POS tagging of identifiers, over the current approaches. With this improved accuracy, both automated software engineering tools and developers will be able to better capture and understand the information available in code.","PeriodicalId":237170,"journal":{"name":"2013 21st International Conference on Program Comprehension (ICPC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130153509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"JSummarizer: An automatic generator of natural language summaries for Java classes","authors":"Laura Moreno, Andrian Marcus, L. Pollock, K. Vijay-Shanker","doi":"10.1109/ICPC.2013.6613855","DOIUrl":"https://doi.org/10.1109/ICPC.2013.6613855","url":null,"abstract":"JSummarizer is an Eclipse plug-in for automatically generating natural language summaries of Java classes. The summary is based on the stereotype of the class, which implicitly encodes the design intent of the class and is automatically inferred by JSummarizer. The tool uses a set of predefined heuristics to determine what information will be reflected in the summary, and it uses natural language processing and generation techniques to form the summary. The generated summaries can be used to re-document the code and to help developers to easier understand large and complex classes.","PeriodicalId":237170,"journal":{"name":"2013 21st International Conference on Program Comprehension (ICPC)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128599131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating software clustering algorithms in the context of program comprehension","authors":"Anas Mahmoud, Nan Niu","doi":"10.1109/ICPC.2013.6613844","DOIUrl":"https://doi.org/10.1109/ICPC.2013.6613844","url":null,"abstract":"We propose a novel approach for evaluating software clustering algorithms in the context of program comprehension. Based on the assumption that program comprehension is a task-driven activity, our approach utilizes interaction logs from previous maintenance sessions to automatically devise multiple comprehension-aware and task-sensitive decompositions of software systems. These decompositions are then used as authoritative figures to evaluate the effectiveness of various clustering algorithms. Our approach addresses several challenges associated with evaluating clustering algorithms externally using expert-driven authoritative decompositions. Such limitations include the subjectivity of human experts, the availability of such authoritative figures, and the decaying structure of software systems. We conduct an experimental analysis using two datasets, including an open-source system and a proprietary system, to test the applicability of our approach and validate our research claims.","PeriodicalId":237170,"journal":{"name":"2013 21st International Conference on Program Comprehension (ICPC)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123182959","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards a unified software attack model to assess software protections","authors":"C. Basile, M. Ceccato","doi":"10.1109/ICPC.2013.6613852","DOIUrl":"https://doi.org/10.1109/ICPC.2013.6613852","url":null,"abstract":"Attackers can tamper with programs to break usage conditions. Different software protection techniques have been proposed to limit the possibility of tampering. Some of them just limit the possibility to understand the (binary) code, others react more actively when a change attempt is detected. However, the validation of the software protection techniques has been always conducted without taking into consideration a unified process adopted by attackers to tamper with programs. In this paper we present an extension of the mini-cycle of change, initially proposed to model the process of changing program for maintenance, to describe the process faced by an attacker to defeat software protections. This paper also shows how this new model should support a developer when considering what are the most appropriate protections to deploy.","PeriodicalId":237170,"journal":{"name":"2013 21st International Conference on Program Comprehension (ICPC)","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117259592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards generating human-oriented summaries of unit test cases","authors":"Manabu Kamimura, G. Murphy","doi":"10.1109/ICPC.2013.6613851","DOIUrl":"https://doi.org/10.1109/ICPC.2013.6613851","url":null,"abstract":"The emergence of usable unit testing frameworks (e.g., JUnit for Java code) and unit test generators (e.g., CodePro for Java code) make it easier to create more comprehensive unit testing suites for applications. Unfortunately, test code, especially generated test code, can be difficult to comprehend. In this paper, we propose generating human-oriented summaries of test cases. We suggest an initial approach based on a static analysis of the source code of the test cases. Our goal is to help improve a human's ability to quickly comprehend unit test cases so that appropriate decisions can be made about where to place effort when dealing with large unit test suites.","PeriodicalId":237170,"journal":{"name":"2013 21st International Conference on Program Comprehension (ICPC)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123653109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Patterns of cross-language linking in java frameworks","authors":"Philip Mayer, Andreas Schroeder","doi":"10.1109/ICPC.2013.6613839","DOIUrl":"https://doi.org/10.1109/ICPC.2013.6613839","url":null,"abstract":"The term Cross-Language Linking refers to the ability to specify, locate, navigate, and keep intact the connections between artifacts defined in different programming languages used for building one software application. Although understanding cross-language links and keeping them intact during development and maintenance activities is an important productivity issue, there has been little research on understanding the characteristics of such connections. We have thus built a theory from case studies, specifically, three theory-selected Java cross-language frameworks, each of which links artifacts written in the Java programming language to artifacts written in a declarative, framework-specific domain specific language. Our main contribution is to identify, from these experiences, common patterns of cross-language linking in the domain of Java frameworks with DSLs, which besides their informative nature can also be seen as requirements for designing and building a linking language and tooling infrastructure.","PeriodicalId":237170,"journal":{"name":"2013 21st International Conference on Program Comprehension (ICPC)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133320069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using code ownership to improve IR-based Traceability Link Recovery","authors":"Diana Diaz, G. Bavota, Andrian Marcus, R. Oliveto, Silvia Takahashi, A. D. Lucia","doi":"10.1109/ICPC.2013.6613840","DOIUrl":"https://doi.org/10.1109/ICPC.2013.6613840","url":null,"abstract":"Information Retrieval (IR) techniques have gained wide-spread acceptance as a method for automating traceability recovery. These techniques recover links between software artifacts based on their textual similarity, i.e., the higher the similarity, the higher the likelihood that there is a link between the two artifacts. A common problem with all IR-based techniques is filtering out noise from the list of candidate links, in order to improve the recovery accuracy. Indeed, software artifacts may be related in many ways and the textual information captures only one aspect of their relationships. In this paper we propose to leverage code ownership information to capture relationships between source code artifacts for improving the recovery of traceability links between documentation and source code. Specifically, we extract the author of each source code component and for each author we identify the “context” she worked on. Thus, for a given query from the external documentation we compute the similarity between it and the context of the authors. When retrieving classes that relate to a specific query using a standard IR-based approach we reward all the classes developed by the authors having their context most similar to the query, by boosting their similarity to the query. The proposed approach, named TYRION (TraceabilitY link Recovery using Information retrieval and code OwNership), has been instantiated for the recovery of traceability links between use cases and Java classes of two software systems. The results indicate that code ownership information can be used to improve the accuracy of an IR-based traceability link recovery technique.","PeriodicalId":237170,"journal":{"name":"2013 21st International Conference on Program Comprehension (ICPC)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131606478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SimCad: An extensible and faster clone detection tool for large scale software systems","authors":"M. Uddin, C. Roy, Kevin A. Schneider","doi":"10.1109/ICPC.2013.6613857","DOIUrl":"https://doi.org/10.1109/ICPC.2013.6613857","url":null,"abstract":"Code cloning is an inevitable phenomenon in evolution of software systems. To reduce the harmful effects of clones in software evolution, they need to be identified correctly as well in a time efficient way. There might be various types of clones in a software system. Earlier research shows detection of near-miss clones in large datasets appears to be costly in terms of time and memory. Among the clone detection tools available in practice, not very many of them are found effective in that regard. In this paper we present a standalone clone detection tool SimCad. It is based on a highly scalable and faster clone detection algorithm designed to detect both exact and near-miss clones in large-scale software systems. One of the potential aspects of SimCad is that its clone detection function is made more portable by packaging it into a library called SimLib. Thus, SimLib now can be used as an off-the-shelf clone detection library that can be easily integrated into other applications that are designed to work based on detected clones. For example, a standalone tool or an Integrated Development Environment (IDE) plugin can use SimLib for realtime clone detection while providing its own services like clone visualization and/or clone management functionalities. We hope that both researchers and developers would enjoy and utilize the benefit of using these tools in different aspects of detection and management of clones in software.","PeriodicalId":237170,"journal":{"name":"2013 21st International Conference on Program Comprehension (ICPC)","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134468983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Structural information based term weighting in text retrieval for feature location","authors":"Blake Bassett, Nicholas A. Kraft","doi":"10.1109/ICPC.2013.6613841","DOIUrl":"https://doi.org/10.1109/ICPC.2013.6613841","url":null,"abstract":"Many recent feature location techniques (FLTs) apply text retrieval (TR) techniques to corpora built from text embedded in source code. Term weighting is a standard preprocessing step in TR and is used to adjust the importance of a term within a document or corpus. Common term weighting schemes such as tf-idf may not be optimal for use with source code, because they originate from a natural language context and were designed for use with unstructured documents. In this paper we propose a new approach to term weighting in which term weights are assigned using the structural information from the source code. We then evaluate the proposed approach by conducting an empirical study of a TR-based FLT. In all, we study over 400 bugs and features from five open source Java systems and find that structural term weighting can cause a statistically significant improvement in the accuracy of the FLT.","PeriodicalId":237170,"journal":{"name":"2013 21st International Conference on Program Comprehension (ICPC)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125842277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}