2018 IEEE 18th International Working Conference on Source Code Analysis and Manipulation (SCAM): Recent Papers

[Engineering Paper] Graal: The Quest for Source Code Knowledge

2018 IEEE 18th International Working Conference on Source Code Analysis and Manipulation (SCAM) Pub Date : 2018-09-01 DOI: 10.1109/SCAM.2018.00021

Valerio Cosentino, Santiago Dueñas, Ahmed Zerouali, G. Robles, Jesus M. Gonzalez-Barahona

Abstract: Source code analysis tools are designed to analyze code artifacts with different intents, ranging from improving the quality and security of software to easing refactoring and reverse engineering activities. However, most tools do not come with features to periodically schedule their analysis or to be executed on a battery of repositories, and they lack support for combining their results with those of other analysis tools. Thus, researchers and practitioners are often forced to develop ad-hoc scripts to meet their needs. This comes at the risk of obtaining wrong results (because of the lack of testing) and of hindering replication by other research teams. In addition, the resulting scripts are often neither meant to be customized nor designed for incrementality, scalability and extensibility. In this paper we present Graal, which empowers users with a customizable, scalable and incremental approach to conducting source code analysis and enables relating the obtained results to other software project data. Graal leverages and extends the functionalities of GrimoireLab, a strong free software tool developed by Bitergia, a company devoted to offering commercial software development analytics, and part of the CHAOSS project of the Linux Foundation.

Citations: 3
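The Graal entry above stresses incremental, schedulable analysis. As a minimal sketch of that idea only (this is not Graal's or GrimoireLab's API; `Commit`, `IncrementalAnalyzer` and `count_lines` are hypothetical names), an analyzer can remember which commits it has already processed so that a periodic re-run only analyzes new history:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Commit:
    sha: str
    files: Dict[str, str]  # path -> file content at this commit


@dataclass
class IncrementalAnalyzer:
    """Hypothetical sketch: run a per-file analysis only on unseen commits."""
    analyze: Callable[[str], int]  # analysis applied to one file's content
    seen: set = field(default_factory=set)
    results: Dict[str, Dict[str, int]] = field(default_factory=dict)

    def run(self, history: List[Commit]) -> List[str]:
        """Analyze commits not processed before; return the SHAs handled now."""
        processed = []
        for commit in history:
            if commit.sha in self.seen:
                continue  # already analyzed by an earlier (scheduled) run
            self.results[commit.sha] = {
                path: self.analyze(src) for path, src in commit.files.items()
            }
            self.seen.add(commit.sha)
            processed.append(commit.sha)
        return processed


def count_lines(src: str) -> int:
    """A trivial stand-in for a real source code analyzer."""
    return len(src.splitlines())
```

Calling `run` twice on a growing history analyzes each commit exactly once; a real tool would also persist `seen` and `results` between runs.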

[Research Paper] The Case for Adaptive Change Recommendation

2018 IEEE 18th International Working Conference on Source Code Analysis and Manipulation (SCAM) Pub Date : 2018-09-01 DOI: 10.1109/SCAM.2018.00022

Sydney Pugh, D. Binkley, L. Moonen

Abstract: As the complexity of a software system grows, it becomes increasingly difficult for developers to be aware of all the dependencies that exist between artifacts (e.g., files or methods) of the system. Change impact analysis helps to overcome this problem by recommending to a developer the source-code artifacts relevant to her current changes. Association rule mining has shown promise in determining change impact by uncovering relevant patterns in the system's change history. State-of-the-art change impact mining algorithms typically make use of a change history of tens of thousands of transactions. For efficiency, targeted association rule mining focuses on only those transactions potentially relevant to answering a particular query. However, even targeted algorithms must consider the complete set of relevant transactions in the history. This paper presents ATARI, a new adaptive approach to association rule mining that considers a dynamic selection of the relevant transactions. It can be viewed as a further constrained version of targeted association rule mining, in which as few as a single transaction might be considered when determining change impact. Our investigation of adaptive change impact mining empirically studies seven algorithm variants. We show that adaptive algorithms are viable, can be just as applicable as the state-of-the-art complete-history algorithms, and even outperform them for certain queries. However, more important than the direct comparison, our investigation lays the necessary groundwork for the future study of adaptive techniques and their application to challenges such as the on-the-fly style of impact analysis that is needed at GitHub scale.

Citations: 0
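To make the "targeted" and "adaptive" distinction in the ATARI entry above concrete, here is a simplified sketch, not ATARI itself: only transactions that touch the query are kept (the targeted filter), and an optional cap keeps just the most recent relevant transactions as a crude stand-in for adaptive selection. The scoring rule (fraction of relevant transactions in which an artifact co-changed) and the name `recommend` are assumptions for illustration.

```python
from collections import Counter
from typing import FrozenSet, List, Optional, Sequence, Tuple

Transaction = FrozenSet[str]  # artifacts changed together in one commit


def recommend(history: Sequence[Transaction], query: FrozenSet[str],
              max_transactions: Optional[int] = None) -> List[Tuple[str, float]]:
    """Rank artifacts that co-changed with the query artifacts.

    history is ordered oldest -> newest. If max_transactions is given, only
    that many of the most recent relevant transactions are considered.
    """
    relevant = [t for t in history if t & query]  # targeted filter
    if max_transactions is not None:
        relevant = relevant[-max_transactions:] if max_transactions > 0 else []
    if not relevant:
        return []
    counts = Counter(item for t in relevant for item in t - query)
    # score = share of relevant transactions in which the item also changed
    ranked = [(item, n / len(relevant)) for item, n in counts.items()]
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked
```

With `max_transactions=1` the recommendation rests on a single transaction, the extreme case the abstract mentions.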

[Engineering Paper] SCC: Automatic Classification of Code Snippets

2018 IEEE 18th International Working Conference on Source Code Analysis and Manipulation (SCAM) Pub Date : 2018-09-01 DOI: 10.1109/SCAM.2018.00031

Kamel Alreshedy, Dhanush Dharmaretnam, D. Germán, Venkatesh Srinivasan, T. Gulliver

Abstract: Determining the programming language of a source code file has been studied by the research community; it has been shown that Machine Learning (ML) and Natural Language Processing (NLP) algorithms can be effective in identifying the programming language of source code files. However, determining the programming language of a code snippet or a few lines of source code is still a challenging task. Online forums such as Stack Overflow and code repositories such as GitHub contain a large number of code snippets. In this paper, we describe Source Code Classification (SCC), a classifier that can identify the programming language of code snippets written in 21 different programming languages. It employs a Multinomial Naive Bayes (MNB) classifier trained on Stack Overflow posts. SCC is shown to achieve an accuracy of 75%, which is higher than that of Programming Languages Identification (PLI, a proprietary online classifier of snippets), whose accuracy is only 55.5%. The average precision, recall and F1 score of the proposed tool are 0.76, 0.75 and 0.75, respectively. In addition, it can distinguish between code snippets from a family of programming languages such as C, C++ and C#, and can also identify the programming language version, such as C# 3.0, C# 4.0 and C# 5.0.

Citations: 15
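The SCC entry above relies on a Multinomial Naive Bayes classifier. The sketch below is a from-scratch MNB over simple regex tokens with Laplace smoothing, shown only to illustrate the technique; SCC's actual features, preprocessing and Stack Overflow training data are not reproduced, and the tokenizer and toy training snippets are assumptions.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

TOKEN = re.compile(r"\w+|[^\w\s]")  # identifiers/keywords and single symbols


def tokenize(snippet: str) -> List[str]:
    return TOKEN.findall(snippet)


class MultinomialNB:
    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # Laplace (add-alpha) smoothing constant

    def fit(self, docs: List[Tuple[str, str]]) -> "MultinomialNB":
        self.class_docs: Counter = Counter()
        self.token_counts: Dict[str, Counter] = defaultdict(Counter)
        for text, label in docs:
            self.class_docs[label] += 1
            self.token_counts[label].update(tokenize(text))
        self.vocab = {tok for c in self.token_counts.values() for tok in c}
        self.total_docs = sum(self.class_docs.values())
        return self

    def log_posterior(self, text: str, label: str) -> float:
        """log P(label) + sum of log P(token | label), up to a constant."""
        counts = self.token_counts[label]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        score = math.log(self.class_docs[label] / self.total_docs)
        for tok in tokenize(text):
            if tok in self.vocab:  # tokens never seen in training are ignored
                score += math.log((counts[tok] + self.alpha) / denom)
        return score

    def predict(self, text: str) -> str:
        return max(self.class_docs, key=lambda lbl: self.log_posterior(text, lbl))
```

Working in log space avoids floating-point underflow when a snippet has many tokens.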

[Research Paper] Automatic Checking of Regular Expressions

2018 IEEE 18th International Working Conference on Source Code Analysis and Manipulation (SCAM) Pub Date : 2018-09-01 DOI: 10.1109/SCAM.2018.00034

E. Larson

Abstract: Regular expressions are extensively used to process strings. The regular expression language is concise, which makes it easy for developers to use but also easy for them to make mistakes. Since regular expressions are compiled at run time, the regular expression compiler does not give any feedback on potential errors. This paper describes ACRE (Automatic Checking of Regular Expressions). ACRE takes a regular expression as input and performs 11 different checks on it. The checks are based on common mistakes. Among them are checks for incorrect use of character sets (enclosed by []), wildcards (represented by .), and line anchors (^ and $). ACRE found errors in 283 out of 826 regular expressions. Each of the 11 checks found at least seven errors. The number of false reports is moderate: 46 of the regular expressions contained a false report. ACRE is simple to use: the user enters a regular expression and presses the check button. Any violations are reported back to the user with the incorrect portion of the regular expression highlighted. For 9 of the 11 checks, an example accepted string is generated that further illustrates the error.

Citations: 3
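The ACRE entry above names three families of checks: character sets, wildcards and line anchors. The sketch below implements one hypothetical check per family (plus a second character-set check) using plain `re` scanning; these are not ACRE's 11 checks, and the function name `check_regex` and its warning strings are assumptions. It does not track character-class context, so a literal `^` in the middle of `[...]` would also be flagged.

```python
import re
from typing import List, Tuple


def check_regex(pattern: str) -> List[Tuple[str, int]]:
    """Return (warning, position) pairs for a few common regex mistakes."""
    warnings: List[Tuple[str, int]] = []
    # Character sets: the range A-z also covers [ \ ] ^ _ and the backtick.
    for m in re.finditer(r"\[[^\]]*?A-z", pattern):
        warnings.append(("range A-z includes punctuation", m.end() - 3))
    # Character sets: '|' inside [...] is a literal pipe, not alternation.
    for m in re.finditer(r"\[[^\]]*?\|", pattern):
        warnings.append(("'|' inside character set is literal", m.end() - 1))
    # Wildcards: an unescaped '.' between word characters, as in 'example.com'.
    for m in re.finditer(r"\w\.\w", pattern):
        warnings.append(("unescaped '.' matches any character", m.start() + 1))
    # Line anchors: outside MULTILINE mode, '^' after other pattern text, or
    # '$' followed by more pattern text, usually cannot match.
    for i, ch in enumerate(pattern):
        escaped = i > 0 and pattern[i - 1] == "\\"
        if ch == "^" and i > 0 and not escaped and pattern[i - 1] not in "(|[":
            warnings.append(("'^' anchor not at start", i))
        if ch == "$" and i < len(pattern) - 1 and not escaped and pattern[i + 1] not in ")|":
            warnings.append(("'$' anchor not at end", i))
    return warnings
```

Reporting a position for each warning mirrors the abstract's idea of highlighting the offending portion of the expression.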

[Research Paper] Which Method-Stereotype Changes are Indicators of Code Smells?

2018 IEEE 18th International Working Conference on Source Code Analysis and Manipulation (SCAM) Pub Date : 2018-09-01 DOI: 10.1109/SCAM.2018.00017

M. J. Decker, Christian D. Newman, Natalia Dragan, M. Collard, Jonathan I. Maletic, Nicholas A. Kraft

Citations: 4

[Engineering Paper] A Tool for Optimizing Java 8 Stream Software via Automated Refactoring

2018 IEEE 18th International Working Conference on Source Code Analysis and Manipulation (SCAM) Pub Date : 2018-09-01 DOI: 10.1109/SCAM.2018.00011

Raffi Khatchadourian, Yiming Tang, M. Bagherzadeh, Syed Ahmed

Abstract: Streaming APIs are pervasive in mainstream object-oriented languages and platforms. For example, the Java 8 Stream API allows for functional-like, MapReduce-style operations in processing both finite (e.g., collections) and infinite data structures. However, using this API efficiently involves subtle considerations, such as determining when it is best for stream operations to run in parallel, when running operations in parallel can be less efficient, and when it is safe to run in parallel given possible lambda expression side effects. In this paper, we describe the engineering aspects of an open source automated refactoring tool called Optimize Streams that assists developers in writing optimal stream software in a semantics-preserving fashion. Based on a novel ordering and typestate analysis, the tool is implemented as a plug-in to the popular Eclipse IDE, using both the WALA and SAFE frameworks. The tool was evaluated on 11 Java projects consisting of ~642 thousand lines of code, where we found that 36.31% of candidate streams were refactorable, and an average speedup of 1.55 on a performance suite was observed. We also describe experiences gained from integrating three very different static analysis frameworks to provide developers with an easy-to-use interface for optimizing their stream code to its full potential.

Citations: 11
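The Optimize Streams entry above lists two reasons not to parallelize a stream: lambda side effects and operations that are costly under encounter order. As a toy rule of thumb only (it is not the paper's ordering/typestate analysis and ignores data size and merge cost), a pipeline can be modeled as a list of Java operation names. The set below contains the intermediate operations whose Javadoc warns about expense on ordered parallel pipelines; the function `advise` is a hypothetical name.

```python
from typing import Sequence

# Java Stream intermediate operations whose Javadoc warns they can be
# expensive on *ordered* parallel pipelines (stability/encounter order).
ORDER_SENSITIVE_OPS = {"distinct", "limit", "skip"}


def advise(ops: Sequence[str], source_ordered: bool,
           lambdas_have_side_effects: bool) -> str:
    """Toy advice on whether a stream pipeline should run in parallel."""
    if lambdas_have_side_effects:
        return "sequential"  # parallel execution could race on shared state
    if source_ordered and ORDER_SENSITIVE_OPS.intersection(ops):
        return "sequential"  # or call unordered() first if order is not needed
    return "parallel"
```

A real refactoring tool must prove the side-effect and ordering facts statically; here they are simply passed in as flags.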

[Engineering Paper] Identifying Feature Clones in a Suite of Systems

2018 IEEE 18th International Working Conference on Source Code Analysis and Manipulation (SCAM) Pub Date : 2018-09-01 DOI: 10.1109/SCAM.2018.00024

Muslim Chochlov, M. English, J. Buckley, D. Ilie, Maria Scanlon

Abstract: As part of a module re-unification project on an industrial partner's code, spanning one system and two derivative systems, the feature-clone variants across these systems have to be extracted, to be later re-unified as singular code elements for reuse. To assist developers with this task, the CoRA (Code Re-unification Application) tool was designed and implemented. The approach, and the subsequent design of the tool, were derived from reflection on manual feature-location/clone-detection efforts on the company's systems, in the first phase of an action research cycle where the approach/implementation will be iteratively trialled, and subsequently refined, in situ. A pilot study that led to the proposed tool is discussed. The tool combines a hybrid (textual-static) feature location technique and a textual clone detection technique for feature-clone identification. In this paper, the rationale behind the CoRA tool is presented, followed by a tool overview and its implementation details. Finally, an example use case shows how the tool is used to locate clones of a particular feature.

Citations: 2
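CoRA, per the entry above, pairs feature location with a textual clone detection technique. The sketch below shows only a generic textual similarity measure (Jaccard similarity over token 3-grams) for flagging candidate clones across systems; it is not CoRA's technique, and the threshold, function names and sample fragments are assumptions.

```python
import re
from itertools import combinations
from typing import Dict, List, Tuple


def shingles(code: str, n: int = 3) -> set:
    """Token n-grams; whitespace and layout differences are ignored."""
    toks = re.findall(r"\w+|[^\w\s]", code)
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}


def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 1.0


def find_clones(fragments: Dict[str, str],
                threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    """Return fragment pairs whose shingle similarity reaches the threshold."""
    sh = {name: shingles(src) for name, src in fragments.items()}
    pairs = []
    for x, y in combinations(sorted(sh), 2):
        score = jaccard(sh[x], sh[y])
        if score >= threshold:
            pairs.append((x, y, score))
    return pairs
```

Using n-grams rather than single tokens keeps some ordering information, so two fragments with the same vocabulary but different structure score lower.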

[Research Paper] Combining Obfuscation and Optimizations in the Real World

2018 IEEE 18th International Working Conference on Source Code Analysis and Manipulation (SCAM) Pub Date : 2018-09-01 DOI: 10.1109/SCAM.2018.00010

S. Guelton, A. Guinet, Pierrick Brunet, J. Caamaño, F. Dagnat, Nicolas Szlifierski

Abstract: Code obfuscation is the de facto standard for protecting intellectual property when delivering code in an unmanaged environment. It relies on additive layers of code tangling techniques, white-box encryption calls and platform-specific or tool-specific countermeasures to make it harder for a reverse engineer to access critical pieces of data or to understand core algorithms. The literature provides plenty of different obfuscation techniques that can be used at compile time to transform data or control flow in order to provide some kind of protection against different reverse engineering scenarios. Scheduling code transformations to optimize a given metric is known as the pass scheduling problem, a problem known to be NP-hard but solved in practice using hard-coded sequences that are generally satisfactory. Adding code obfuscation to the problem introduces two new dimensions. First, as a code obfuscator needs to find a balance between obfuscation and performance, pass scheduling becomes a multi-criteria optimization problem. Second, obfuscation passes transform their inputs in unconventional ways, which means some pass combinations may not be desirable or even valid. This paper highlights several issues met when blindly chaining different kinds of obfuscation and optimization passes, emphasizing the need for a formal model to combine them. It proposes a non-intrusive formalism that leverages sequential pass management techniques. The model is validated on real-world scenarios gathered during the development of an industrial-strength obfuscator on top of the LLVM compiler infrastructure.

Citations: 3
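The obfuscation entry above notes that some pass combinations are invalid because obfuscation passes transform code in unconventional ways. A minimal sketch of one way to model this, assuming each pass declares the program properties it requires, destroys and establishes, is shown below. The property model, the pass names `loop-unroll` and `cfg-flatten`, and the helper functions are illustrative assumptions, not the paper's formalism or LLVM's pass-manager API.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import FrozenSet, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Pass:
    name: str
    requires: FrozenSet[str] = frozenset()     # properties the input must have
    destroys: FrozenSet[str] = frozenset()     # properties lost afterwards
    establishes: FrozenSet[str] = frozenset()  # properties gained afterwards


def first_violation(pipeline: Sequence[Pass],
                    initial: FrozenSet[str]) -> Optional[Tuple[int, List[str]]]:
    """Return (index, missing properties) of the first invalid pass, or None."""
    props = set(initial)
    for i, p in enumerate(pipeline):
        missing = p.requires - props
        if missing:
            return i, sorted(missing)
        props -= p.destroys
        props |= p.establishes
    return None


def valid_orders(passes: Sequence[Pass],
                 initial: FrozenSet[str]) -> List[Tuple[str, ...]]:
    """Brute-force all orderings and keep the valid ones (small inputs only)."""
    return [tuple(p.name for p in order) for order in permutations(passes)
            if first_violation(order, initial) is None]
```

For example, a control-flow flattening pass that destroys natural loops must not run before a loop optimization that requires them; a multi-criteria scheduler would then rank only the valid orders.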

[Research Paper] Static JavaScript Call Graphs: A Comparative Study

2018 IEEE 18th International Working Conference on Source Code Analysis and Manipulation (SCAM) Pub Date : 2018-09-01 DOI: 10.1109/SCAM.2018.00028

Gábor Antal, Péter Hegedüs, Z. Tóth, R. Ferenc, T. Gyimóthy

Abstract: The popularity and wide adoption of JavaScript on both the client and server side make its code analysis more important than ever before. Most algorithms for vulnerability analysis, coding issue detection, or type inference rely on the call graph representation of the underlying program. Despite some obvious advantages of dynamic analysis, static algorithms should also be considered for call graph construction, as they do not require extensive test beds for programs or their costly execution and tracing. In this paper, we systematically compare five widely adopted static algorithms, implemented by the npm call graph, IBM WALA, Google Closure Compiler, Approximate Call Graph (ACG), and Type Analyzer for JavaScript (TAJS) tools, for building JavaScript call graphs on 26 WebKit SunSpider benchmark programs and 6 real-world Node.js modules. We provide a performance analysis as well as a quantitative and qualitative evaluation of the results. We found that there was a relatively large intersection of the found call edges among the algorithms, which proved to be 100% precise. However, most of the tools found edges that were missed by all others. ACG had the highest precision, followed immediately by TAJS, but ACG found significantly more call edges. As for the combination of tools, ACG and TAJS together covered 99% of the true edges found by all algorithms, while maintaining a precision as high as 98%. Only two of the tools were able to analyze up-to-date multi-file Node.js modules, due to incomplete support for language features. They agreed on almost 60% of the call edges, but each of them found valid edges that the other missed.

Citations: 3
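The comparison in the call graph entry above boils down to set arithmetic over call edges: precision against manually validated edges, coverage of all true edges found by any tool, and edges unique to one tool. A minimal sketch, assuming edges are `(caller, callee)` string pairs; the tool names and edge data in the usage check are invented for illustration and are not the paper's results.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # (caller, callee)


def precision(found: Set[Edge], true_edges: Set[Edge]) -> float:
    """Share of reported edges that were validated as true."""
    return len(found & true_edges) / len(found) if found else 0.0


def coverage(found: Set[Edge], true_edges: Set[Edge]) -> float:
    """Share of all known true edges (found by any tool) that were reported."""
    return len(found & true_edges) / len(true_edges) if true_edges else 0.0


def unique_edges(results: Dict[str, Set[Edge]]) -> Dict[str, Set[Edge]]:
    """Edges each tool reported that no other tool reported."""
    out: Dict[str, Set[Edge]] = {}
    for tool, edges in results.items():
        others = set().union(*(e for t, e in results.items() if t != tool))
        out[tool] = edges - others
    return out
```

Combining tools is then just the union of their edge sets, evaluated with the same two metrics.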

[Research Paper] POI: Skew-Aware Parallel Race Detection

2018 IEEE 18th International Working Conference on Source Code Analysis and Manipulation (SCAM) Pub Date : 2018-09-01 DOI: 10.1109/SCAM.2018.00033

Yoshitaka Sakurai, Yoshitaka Arahori, K. Gondow

Abstract: Multithreaded programs are prone to data races. Data races are known to be hard to detect and reproduce manually, although they often have detrimental effects on program reliability. Automated techniques are thus needed for detecting data races efficiently and precisely. Many data race detectors have been proposed so far, among which dynamic ones are promising because of their precision. However, existing dynamic race detectors incur high race-checking overheads. Even a state-of-the-art dynamic race detector, called Parallel FastTrack, fails to efficiently detect races under certain conditions, despite its attempt to parallelize race detection for efficiency. In this paper, we propose an efficient and precise parallel race detector. We first experimentally reveal that the load-distribution policy of Parallel FastTrack tends to skew race-checking loads toward a few detection threads. We then present a simple but effective technique, called POI, for balancing race-checking loads among detection threads. POI takes the race-checking load of each detection thread into account and reduces the load skew by making each detection thread manage almost the same number of memory addresses to be checked. Experiments on several real multithreaded data-processing applications show that POI reduced the race detection overhead imposed by Parallel FastTrack's load-distribution policy by about 37% on average.

Citations: 1
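The POI entry above balances load by having each detection thread manage almost the same number of memory addresses. The sketch contrasts a naive modulo partition of addresses (a hypothetical baseline, not a claim about Parallel FastTrack's actual policy) with a least-loaded assignment in which each newly seen address goes to the thread currently owning the fewest addresses. All function names are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List


def modulo_assignment(addresses: Iterable[int], n_threads: int) -> Dict[int, int]:
    """Hypothetical baseline: owner thread = address mod thread count."""
    return {a: a % n_threads for a in addresses}


def balanced_assignment(addresses: Iterable[int], n_threads: int) -> Dict[int, int]:
    """Give each newly seen address to the thread owning the fewest addresses."""
    load = [0] * n_threads
    owner: Dict[int, int] = {}
    for a in addresses:
        if a not in owner:  # later accesses reuse the same owner thread
            t = min(range(n_threads), key=load.__getitem__)
            owner[a] = t
            load[t] += 1
    return owner


def per_thread_counts(owner: Dict[int, int], n_threads: int) -> List[int]:
    counts = Counter(owner.values())
    return [counts[t] for t in range(n_threads)]
```

With word-aligned addresses such as `0x1000 + 8 * i` and four threads, the modulo policy sends every address to thread 0, while the least-loaded policy spreads them evenly; keeping a stable owner per address preserves the requirement that all accesses to one address are checked by the same thread.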