{"title":"Towards Better Symbol Resolution for C/C++ Programs: A Cluster-Based Solution","authors":"Richárd Szalay, Z. Porkoláb, Dániel Krupp","doi":"10.1109/SCAM.2017.15","DOIUrl":"https://doi.org/10.1109/SCAM.2017.15","url":null,"abstract":"Resolving symbol references is an important part of many application areas from development environments to various static analyser tools, especially when it is used for code comprehension purposes. Different occurrences of the same program elements, like function definitions and their call sites, variable declarations and their usage, or type definitions and their applications should be connected. In the case of the C++ programming language, most current tools use mangled names to correlate symbols, e.g. when implementing actions like \"go to definition\" or \"list all references\". However, for large projects, where multiple binaries are created, symbol resolution based on mangled names can be, and usually is, ambiguous. This leads to inaccurate behaviour even in major development tools. In this paper we explore the reason for this ambiguity, and propose our clustering algorithm based on essential build information to improve the accuracy of symbol resolution. We implemented our method as part of the CodeCompass open source code comprehension tool and measured its efficiency.","PeriodicalId":306744,"journal":{"name":"2017 IEEE 17th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132084474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Relationships Between Stability and Bug-Proneness of Code Clones: An Empirical Study","authors":"M. S. Rahman, C. Roy","doi":"10.1109/SCAM.2017.26","DOIUrl":"https://doi.org/10.1109/SCAM.2017.26","url":null,"abstract":"Exact or similar copies of code fragments in a code base are known as code clones. Code clones are considered one of the serious code smells. Stability is a widely investigated perspective of assessing the impacts of clones on software systems. A number of existing studies show that clones are often less stable than non-cloned code. This suggests that clones change more frequently than non-cloned code and thus may require comparatively more maintenance efforts. Again, frequent changes to clones may increase the likelihood of missing change propagation to the co-change candidates leading to inconsistencies or bugs. However, none of the existing studies investigate whether stability of clones is related to the bug-proneness. In this paper, we present an empirical study that analyzes the relationships between stability and bug-proneness of clones. We identify bug-fix commits by analyzing the commit messages from software repositories. We then identify the clones that are changed in the bug-fix commits as bug-prone clones. We then compare the stability of buggy and non-buggy clones considering the fine-grained syntactic change types and their significance. Our experimental results based on five open-source Java systems of different size and application domains show that (1) stability and bug-proneness of code clones are related and this relationship is statistically significant, (2) for both exact (Type 1) and near-miss (Type 2 and Type 3) clones, buggy clones tend to have higher frequency of changes than non-buggy clones, (3) the bug-proneness of Type 2 and Type 3 clones tend to be strongly related with their stability compared to Type 1 clones, and (4) the relation between the stability and the bug-proneness of clones with respect to fine-grained change types is likely to be influenced by the changes of low to medium significance. We believe that our findings are important and potentially useful in identifying and prioritizing candidate clones for management.","PeriodicalId":306744,"journal":{"name":"2017 IEEE 17th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124852139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Working Around Loops for Infeasible Path Detection in Binary Programs","authors":"Jordy Ruiz, H. Cassé, M. D. Michiel","doi":"10.1109/SCAM.2017.13","DOIUrl":"https://doi.org/10.1109/SCAM.2017.13","url":null,"abstract":"The research of a safe Worst-Case Execution Time (WCET) estimation is necessary to build reliable hard, critical real-time systems. Infeasible paths are a major cause of overestimation of the Worst-Case Execution Time (WCET): without data flow constraints, static analysis by implicit path enumeration will take into account semantically impossible, potentially expensive execution paths, making the Worst-Case Execution Path unreachable in practice. We present in this paper an approach that allows to significantly tighten the WCET by identifying infeasible paths, namely in loops, and injecting them as additional Integer Linear Programming (ILP) constraints during the WCET computation. Our entire analysis, albeit platform independent, works directly on binary programs in order to get the tightest, most reliable WCET. Impactful infeasible paths are largely found within (often nested) loops; therefore having an efficient, exploitable and reasonably scalable representation of the state of a program within loops is a key challenge of infeasible path analysis. We show ours to yield decidedly significant results on a selection of benchmarks from actual hard real-time applications as well as the classic Mälardalen suite.","PeriodicalId":306744,"journal":{"name":"2017 IEEE 17th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"105 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128831396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Exploratory Study of Functional Redundancy in Code Repositories","authors":"Marcelo Suzuki, A. C. D. Paula, E. Guerra, C. Lopes, Otávio Augusto Lazzarini Lemos","doi":"10.1109/SCAM.2017.21","DOIUrl":"https://doi.org/10.1109/SCAM.2017.21","url":null,"abstract":"In large code repositories, the probability of functions to repeat across projects is high. This type of functional redundancy (FR) is desirable for recent code reuse and repair approaches. Yet, FR is hard to measure because it is closely related to program equivalence, which is an undecidable problem. This is one of the reasons most studies that investigate redundancy focus on syntactic rather than semantic replication (e.g., cloning). In this paper we evaluate the extent of FR in a code repository with 68 Java projects taken randomly from SourceForge. Our technique approximates function similarity by first searching for methods that possess similar interfaces (return type, name, and parameter types). We then execute these methods to verify which candidate pairs have matching outputs for a given sample of inputs. Some recent studies have also focused on this type of semantic replication, but our detection approach is generally cheaper and more precise, because it focuses on methods and uses interfaces to reduce the search space. Although our scope is restricted to static methods, which makes our results conservative, our findings are promising. In particular, we found 984 pairs of redundant methods, and 28 out of the 68 (41.17%) projects in the repository presented redundancy. Moreover, the majority of redundant methods for which we had access to the source code did not refer to textual clones (only one redundant method pair referred to replicated code). Our study also indicates that the proposed redundancy detection approach has high precision and is generally inexpensive (only four executions were required per method to attain 100% precision).","PeriodicalId":306744,"journal":{"name":"2017 IEEE 17th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115264187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Harvesting the Wisdom of the Crowd to Infer Method Nullness in Java","authors":"Manuel Leuenberger, Haidar Osman, Mohammad Ghafari, Oscar Nierstrasz","doi":"10.1109/SCAM.2017.22","DOIUrl":"https://doi.org/10.1109/SCAM.2017.22","url":null,"abstract":"Null pointer exceptions are common bugs in Java projects. Previous research has shown that dereferencing the results of method calls is the main source of these bugs, as developers do not anticipate that some methods return null. To make matters worse, we find that whether a method returns null or not (nullness), is rarely documented. We argue that method nullness is a vital piece of information that can help developers avoid this category of bugs. This is especially important for external APIs where developers may not even have access to the code. In this paper, we study the method nullness of Apache Lucene, the de facto standard library for text processing in Java. Particularly, we investigate how often the result of each Lucene method is checked against null in Lucene clients. We call this measure method nullability, which can serve as a proxy for method nullness. Analyzing Lucene internal and external usage, we find that most methods are never checked for null. External clients check more methods than Lucene checks internally. Manually inspecting our dataset reveals that some null checks are unnecessary. We present an IDE plugin that complements existing documentation and makes up for missing documentation regarding method nullness and generates nullness annotations, so that static analysis can pinpoint potentially missing or unnecessary null checks.","PeriodicalId":306744,"journal":{"name":"2017 IEEE 17th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126845869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Investigating the Use of Code Analysis and NLP to Promote a Consistent Usage of Identifiers","authors":"B. Lin, Simone Scalabrino, Andrea Mocci, R. Oliveto, G. Bavota, Michele Lanza","doi":"10.1109/SCAM.2017.17","DOIUrl":"https://doi.org/10.1109/SCAM.2017.17","url":null,"abstract":"Meaningless identifiers as well as inconsistent use of identifiers in the source code might hinder code readability and result in increased software maintenance efforts. Over the past years, effort has been devoted to promoting a consistent usage of identifiers across different parts of a system through approaches exploiting static code analysis and Natural Language Processing (NLP). These techniques have been evaluated in small-scale studies, but it is unclear how they compare to each other and how they complement each other. Furthermore, a full-fledged larger empirical evaluation is still missing. We aim at bridging this gap. We asked developers of five projects to assess the meaningfulness of the recommendations generated by three techniques, two already existing in the literature (one exploiting static analysis, one using NLP) and a novel one we propose. With a total of 922 rename refactorings evaluated, this is, to the best of our knowledge, the largest empirical study conducted to assess and compare rename refactoring tools promoting a consistent use of identifiers. Our study sheds light on the current state-of-the-art in rename refactoring recommenders, and indicates directions for future work.","PeriodicalId":306744,"journal":{"name":"2017 IEEE 17th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116580789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Static Code Smell Detector for SQL Queries Embedded in Java Code","authors":"Csaba Nagy, Anthony Cleve","doi":"10.1109/SCAM.2017.19","DOIUrl":"https://doi.org/10.1109/SCAM.2017.19","url":null,"abstract":"A database plays a central role in the architecture of an information system, and the way it stores the data delimits its main features. However, it is not just the data that matters. The way it is handled, i.e., how the application communicates with the database is of critical importance too. Therefore the implementation of such a communication layer has to be reliable and efficient. SQL is a popular language to query a database, and modern technologies rely on it (or its dialects) as query strings embedded in the application code. In many languages (e.g. in Java), an embedded query is typically constructed through several string operations that obstruct developers in understanding the statement finally sent to the database. It is a potential source of fault-prone and inefficient database usage, i.e., code smells. In our paper, we present a tool for the identification of code smells in SQL queries embedded in Java code. Our tool implements a combined static analysis of the SQL statements embedded in the source code, the database schema, and the data in the database. We use a lightweight query extraction algorithm to extract SQL code from the Java code and implement smell detectors on the ASG of our fault-tolerant SQL parser. Depending on the context of the smell, its severity is also determined. Developers can examine the identified issues with the help of an Eclipse plug-in or through command line interfaces.","PeriodicalId":306744,"journal":{"name":"2017 IEEE 17th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125136807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Revisiting Exception Handling Practices with Exception Flow Analysis","authors":"G. B. D. Pádua, Weiyi Shang","doi":"10.1109/SCAM.2017.16","DOIUrl":"https://doi.org/10.1109/SCAM.2017.16","url":null,"abstract":"Modern programming languages, such as Java and C#, typically provide features that handle exceptions. These features separate error-handling code from regular source code and aim to assist in the practice of software comprehension and maintenance. Having acknowledged the advantages of exception handling features, their misuse can still cause reliability degradation or even catastrophic software failures. Prior studies on exception handling aim to understand the practices of exception handling in its different components, such as the origin of the exceptions and the handling code of the exceptions. Yet, the observed findings were scattered and diverse. In this paper, to complement prior research findings on exception handling, we study its features by enriching the knowledge of handling code with a flow analysis of exceptions. Our case study is conducted with over 10K exception handling blocks, and over 77K related exception flows from 16 open-source Java and C# (.NET) libraries and applications. Our case study results show that each try block has up to 12 possible potentially recoverable yet propagated exceptions. More importantly, 22% of the distinct possible exceptions can be traced back to multiple methods (average of 1.39 and max of 34). Such results highlight the additional challenge of composing quality exception handling code. To make it worse, we confirm that there is a lack of documentation of the possible exceptions and their sources. However, such critical information can be identified by exception flow analysis on well-documented API calls (e.g., JRE and .NET documentation). Finally, we observe different strategies in exception handling code between Java and C#. Our findings highlight the opportunities of leveraging automated software analysis to assist in exception handling practices and signify the need for further in-depth studies on exception handling practice.","PeriodicalId":306744,"journal":{"name":"2017 IEEE 17th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131069799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}