{"title":"Title Page I","authors":"","doi":"10.1109/scam52516.2021.00001","DOIUrl":"https://doi.org/10.1109/scam52516.2021.00001","url":null,"abstract":"","PeriodicalId":380117,"journal":{"name":"2021 IEEE 21st International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125554297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"D-REX: Static Detection of Relevant Runtime Exceptions with Location Aware Transformer","authors":"Farima Farmahinifarahani, Yadong Lu, V. Saini, P. Baldi, Cristina V. Lopes","doi":"10.26226/morressier.613b54401459512fce6a7d03","DOIUrl":"https://doi.org/10.26226/morressier.613b54401459512fce6a7d03","url":null,"abstract":"Runtime exceptions are inevitable parts of software systems. While developers often write exception handling code to avoid the severe outcomes of these exceptions, such code is most effective if accompanied by accurate runtime exception types. Predicting the runtime exceptions that may occur in a program, however, is difficult, as the situations that lead to these exceptions are complex. We propose D-REX (Deep Runtime EXception detector), an approach for predicting runtime exceptions of Java methods based on the static properties of code. The core of D-REX is a machine learning model that leverages the representation learning ability of neural networks to infer a set of signals from code to predict the related runtime exception types. This model, which we call the Location Aware Transformer, adapts a state-of-the-art language model, the Transformer, to provide accurate predictions of exception types, as well as interpretable recommendations for the exception-prone elements of code. We curate a benchmark dataset of 200,000 Java projects from GitHub to train and evaluate D-REX. Experiments demonstrate that D-REX predicts runtime exception types with 81% Top-1 accuracy, outperforming multiple non-Transformer baselines by a margin of at least 12%. Furthermore, it predicts the exception-prone elements of code with 75% Top-1 precision.","PeriodicalId":380117,"journal":{"name":"2021 IEEE 21st International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121498697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Naming Amplified Tests Based on Improved Coverage","authors":"Nienke Nijkamp, C. Brandt, A. Zaidman","doi":"10.1109/SCAM52516.2021.00036","DOIUrl":"https://doi.org/10.1109/SCAM52516.2021.00036","url":null,"abstract":"Test amplification generates new test cases that improve the coverage of an existing test suite. To convince developers to integrate these new test cases into their test suite, it is crucial to convey the behavior and the improvement in coverage that the amplified test case provides. In this paper, we present NATIC, an approach to generate names for amplified test cases based on the methods they additionally cover, compared to the existing test suite. In a survey among 16 participants with a background in Computer Science, we show that the test names generated by NATIC are valued similarly to names written by experts. According to the participants, the names generated by NATIC outperform expert-written names with respect to informing about coverage improvement, but lack in conveying a test’s behavior. Finally, we discuss how a restriction to two mentioned methods per name would improve the understandability of the test names generated by NATIC.","PeriodicalId":380117,"journal":{"name":"2021 IEEE 21st International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115819601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling the Effects of Global Variables in Data-Flow Analysis for C/C++","authors":"Philipp Dominik Schubert, Florian Sattler, Fabian Schiebel, Ben Hermann, E. Bodden","doi":"10.26226/morressier.613b54401459512fce6a7cf9","DOIUrl":"https://doi.org/10.26226/morressier.613b54401459512fce6a7cf9","url":null,"abstract":"Global variables make software systems hard to maintain and debug, and they break local reasoning. They also pose a non-trivial challenge to static analysis, which needs to model their effects to obtain sound analysis results. However, global variable initialization, the code of the corresponding constructors and destructors, as well as dynamic library code executed during load and unload, affect not only control flow but also data flow. The PhASAR static data-flow analysis framework does not handle these special cases and does not provide any functionality to model the effects of globals. Analysis writers are forced to model the desired effects in an ad-hoc manner, increasing an analysis’ complexity and imposing an additional repetitive task. In this paper, we present the challenges of modeling globals, elaborate on the impact they have on analysis information, and present a suitable model to capture their effects, allowing for easier development of global-aware static data-flow analyses. We present an implementation of our model within the PhASAR framework and show its usefulness for an IDE-based linear-constant propagation that crucially requires a correct model of globals.","PeriodicalId":380117,"journal":{"name":"2021 IEEE 21st International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114847679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Measuring source code conciseness across programming languages using compression","authors":"Lodewijk Bergmans, Xander Schrijen, Edwin Ouwehand, M. Bruntink","doi":"10.1109/SCAM52516.2021.00015","DOIUrl":"https://doi.org/10.1109/SCAM52516.2021.00015","url":null,"abstract":"It is well-known, and often a topic of heated debates, that programs in some programming languages are more concise than in others. This is a relevant factor when comparing or aggregating volume-impacted metrics on source code written in a combination of programming languages. In this paper, we present a model for measuring the conciseness of programming languages in a consistent, objective and evidence-based way. We present the approach, explain how it is founded on information theoretical principles, present detailed analysis steps and show the quantitative results of applying this model to a large benchmark of diverse commercial software applications. We demonstrate that our metric for language conciseness is strongly correlated with both an alternative analytical approach, and with a large scale developer survey, and show how its results can be applied to improve software metrics for multi-language applications.","PeriodicalId":380117,"journal":{"name":"2021 IEEE 21st International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"222 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123308149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SecuCheck: Engineering configurable taint analysis for software developers","authors":"Goran Piskachev, Ranjith Krishnamurthy, E. Bodden","doi":"10.26226/morressier.613b54401459512fce6a7cfa","DOIUrl":"https://doi.org/10.26226/morressier.613b54401459512fce6a7cfa","url":null,"abstract":"Due to its ability to detect many frequently occurring security vulnerabilities, taint analysis is one of the core static analyses used by many static application security testing (SAST) tools. Previous studies have identified issues that software developers face with SAST tools. This paper reports on our experience in building a configurable taint analysis tool, named SecuCheck, that runs in multiple integrated development environments. SecuCheck is built on top of multiple existing components and comes with a Java-internal domain-specific language fluentTQL for specifying taint-flows, designed for software developers. We evaluate the applicability of SecuCheck in detecting eleven taint-style vulnerabilities in microbench programs and three real-world Java applications with known vulnerabilities. Empirically, we identify factors that impact the runtime of SecuCheck.","PeriodicalId":380117,"journal":{"name":"2021 IEEE 21st International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126170099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"[Engineering] eNYPD—Entry Points Detector Jakarta Server Faces Use Case","authors":"Rodrigue Wete Nguempnang, Bernhard J. Berger, K. Sohr","doi":"10.1109/SCAM52516.2021.00013","DOIUrl":"https://doi.org/10.1109/SCAM52516.2021.00013","url":null,"abstract":"Which parts of a software system can be accessed by an attacker is a common question in software security. The answer to this question defines where to look for input validation vulnerabilities, which parts of a system to respect during Microsoft’s Threat Modeling, or how to calculate security metrics. Identifying the entry points of an application is, therefore, a frequently occurring problem. Identifying entry points is also relevant when analysing framework-based applications, since they no longer have a simple main method. While different analyses implement entry point detection, the presented tool eNYPD explicitly focuses on answering this question for Java-based systems in an analysis-independent manner. It extracts information on entry points statically and persists this information to a separate file. This allows the information to be reused in different analyses, so researchers do not need to implement custom entry point detection for each analysis. The presented tool is explained using Jakarta Server Faces, a user-interface technology for Web-based business applications implemented in Java. The paper presents the implemented extraction approach, the internal data model, and the stored results. Finally, in an evaluation, the statically assessed results of eNYPD are compared to a dynamically determined set of entry points. This comparison allows us to demonstrate the correctness of the extracted information.","PeriodicalId":380117,"journal":{"name":"2021 IEEE 21st International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"123 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116902105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BoostNSift: A Query Boosting and Code Sifting Technique for Method Level Bug Localization","authors":"A. Razzaq, J. Buckley, James Patten, Muslim Chochlov, Ashish Rajendra Sai","doi":"10.26226/morressier.613b54401459512fce6a7cda","DOIUrl":"https://doi.org/10.26226/morressier.613b54401459512fce6a7cda","url":null,"abstract":"Locating bugs is an important, but effort-intensive and time-consuming task when dealing with large-scale systems. To address this, Information Retrieval (IR) techniques are increasingly being used to suggest potential buggy source code locations for given bug reports. While IR techniques are very scalable, in practice their effectiveness in accurately localizing bugs in a software system remains low. Results of empirical studies suggest that the effectiveness of bug localization techniques can be augmented by the configuration of the queries used to locate buggy code. However, most IR-based bug localization techniques presented by researchers do not fully consider the impact of query configuration. In a similar vein, techniques consider all code elements as equally suspicious of being buggy while localizing bugs, but this is not always the case either. In this paper, we present a new method-level, information-retrieval-based bug localization technique called \"BoostNSift\". BoostNSift exploits the important information in queries by ‘boost’ing that information, and then ‘sift’s the identified code elements based on a novel technique that emphasizes a code element’s specific relatedness to a bug report over its generic relatedness to all bug reports. To evaluate the performance of BoostNSift, we employed a state-of-the-art empirical design that has been commonly used for evaluating file-level IR-based bug localization techniques: 6851 bugs are selected from the commonly used Eclipse, AspectJ, SWT, and ZXing benchmarks and made openly available for method-level analyses. The performance of BoostNSift is compared with the openly available state-of-the-art IR-based BugLocator, BLUiR, and BLIA techniques. Experiments show that BoostNSift improves on BLUiR by up to 324%, on BugLocator by up to 297%, and on BLIA by up to 120%, in terms of Mean Reciprocal Rank (MRR). Similar improvements are observed in terms of Mean Average Precision (MAP) and Top-N evaluation measures.","PeriodicalId":380117,"journal":{"name":"2021 IEEE 21st International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127579070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CharmFL: A Fault Localization Tool for Python","authors":"Q. Sarhan, Attila Szatmári, Rajmond Tóth, Árpád Beszédes","doi":"10.26226/morressier.613b54401459512fce6a7ce4","DOIUrl":"https://doi.org/10.26226/morressier.613b54401459512fce6a7ce4","url":null,"abstract":"Fault localization is one of the most time-consuming and error-prone parts of software debugging. There are several tools for helping developers in the fault localization process; however, they mostly target programs written in the Java and C/C++ programming languages. While these tools are splendid on their own, we must not overlook the fact that Python is a popular programming language, and yet there is a lack of easy-to-use and handy fault localization tools for Python developers. In this paper, we present a tool called “CharmFL” for software fault localization as a plug-in for the PyCharm IDE. The tool employs Spectrum-based fault localization (SBFL) to help Python developers automatically analyze their programs, collect useful data at run-time, and then produce a ranked list of potentially faulty program elements (i.e., statements, functions, and classes). Our proposed tool supports different code coverage types, with the possibility to investigate these types in a hierarchical approach. The applicability of our tool has been demonstrated using a set of experimental use cases. The results show that our tool could help developers to efficiently find the locations of different types of faults in their programs.","PeriodicalId":380117,"journal":{"name":"2021 IEEE 21st International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123792925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Experimental Analysis of Graph-Distance Algorithms for Comparing API Usages","authors":"Sebastian Nielebock, Paul Blockhaus, J. Krüger, F. Ortmeier","doi":"10.26226/morressier.613b54401459512fce6a7d05","DOIUrl":"https://doi.org/10.26226/morressier.613b54401459512fce6a7d05","url":null,"abstract":"Modern software development heavily relies on the reuse of functionalities through Application Programming Interfaces (APIs). However, client developers can have issues identifying the correct usage of a certain API, causing misuses accompanied by software crashes or usability bugs. Therefore, researchers have aimed at identifying API misuses automatically by comparing client code usages to correct API usages. Some techniques rely on certain API-specific graph-based data structures to improve the abstract representation of API usages. Such techniques need to compare graphs, for instance, by computing distance metrics based on the minimal graph edit distance or the largest common subgraphs, whose computations are known to be NP-hard problems. Fortunately, there exist many abstractions for simplifying graph distance computation. However, their applicability for comparing graph representations of API usages has not been analyzed. In this paper, we provide a comparison of different distance algorithms for API-usage graphs regarding correctness and runtime. Particularly, correctness relates to the algorithms’ ability to identify similar correct API usages, but also to discriminate similar correct and false usages as well as non-similar usages. For this purpose, we systematically identified a set of eight graph-based distance algorithms and applied them to two datasets of real-world API usages and misuses. Interestingly, our results suggest that existing distance algorithms are not reliable for comparing API usage graphs. To improve on this situation, we identify and discuss the algorithms’ issues, based on which we formulate hypotheses to initiate research on overcoming them.","PeriodicalId":380117,"journal":{"name":"2021 IEEE 21st International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121961997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}