2022 IEEE 22nd International Working Conference on Source Code Analysis and Manipulation (SCAM): Latest Literature

JFeature: Know Your Corpus
Idriss Riouak, G. Hedin, Christoph Reichenbach, Niklas Fors
{"title":"JFeature: Know Your Corpus","authors":"Idriss Riouak, G. Hedin, Christoph Reichenbach, Niklas Fors","doi":"10.1109/SCAM55253.2022.00033","DOIUrl":"https://doi.org/10.1109/SCAM55253.2022.00033","url":null,"abstract":"Software corpora are crucial for evaluating research artifacts and ensuring repeatability of outcomes. Corpora such as DaCapo and Defects4J provide a collection of real-world open-source projects for evaluating the robustness and performance of software tools like static analysers. However, what do we know about these corpora? What do we know about their composition? Are they really suited for our particular problem? We developed JFEATURE, an extensible static analysis tool that extracts syntactic and semantic features from Java programs, to assist developers in answering these questions. We demonstrate the potential of JFEATURE by applying it to four widely-used corpora in the program analysis area, and we suggest other applications, including longitudinal studies of individual Java projects and the creation of new corpora.","PeriodicalId":138287,"journal":{"name":"2022 IEEE 22nd International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124984108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Multi-Modal Code Summarization with Retrieved Summary
Lile Lin, Zhiqiu Huang, Yaoshen Yu, Ya-Ping Liu
{"title":"Multi-Modal Code Summarization with Retrieved Summary","authors":"Lile Lin, Zhiqiu Huang, Yaoshen Yu, Ya-Ping Liu","doi":"10.1109/SCAM55253.2022.00020","DOIUrl":"https://doi.org/10.1109/SCAM55253.2022.00020","url":null,"abstract":"A high-quality code summary describes the functionality and purpose of a code snippet concisely, which is key to program comprehension. Automatic code summarization aims to generate natural language summaries from code snippets automatically, which can save developers time and improve efficiency in development and maintenance. Recently, researchers mainly use neural machine translation (NMT) based approaches to fill this task. They apply a neural model to translate code snippets into natural language summaries. However, the performance of existing NMT-based approaches is limited. Although a summary and a code snippet are semantically related, they may not share common lexical tokens or language structures. Such a semantic gap between codes and summaries hinders the effect of NMT-based models. Only using code tokens to represent a code snippet cannot help NMT-based models overcome this gap. To solve this problem, in this paper, we propose a code summarization approach that incorporates lexical, syntactic and semantic modalities of codes. We treat code tokens as the lexical modality and the abstract syntax tree (AST) as the syntactic modality. To obtain the semantic modality, inspired by translation memory (TM) in NMT, we use the information retrieval (IR) technique to retrieve a relevant summary for a code snippet to describe its functionality. We propose a novel approach based on contrastive learning to build a retrieval model to retrieve semantically similar summaries. Our approach learns and fuses those different modalities using Transformer. 
We evaluate our approach on a large Java dataset; experimental results show that our approach outperforms the state-of-the-art approaches on the automatic evaluation metrics BLEU, ROUGE and METEOR by 10%, 8% and 9%, respectively.","PeriodicalId":138287,"journal":{"name":"2022 IEEE 22nd International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121615446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Mining for Framework Instantiation Pattern Interplays
Yunior Pacheco, Ahmed Zerouali, Coen De Roover
{"title":"Mining for Framework Instantiation Pattern Interplays","authors":"Yunior Pacheco, Ahmed Zerouali, Coen De Roover","doi":"10.1109/SCAM55253.2022.00019","DOIUrl":"https://doi.org/10.1109/SCAM55253.2022.00019","url":null,"abstract":"Software frameworks define generic application blueprints which can be instantiated into an application through application-specific instantiation actions such as overriding a method or providing an object that implements an interface. In case the framework's documentation falls short, developers may use other instantiations of the same framework as a guide to the required instantiation actions. In this paper, we propose an automated approach to mining framework instantiation patterns from existing open-source instantiations. The approach leverages a graph-based representation to capture the common ways of implementing instantiation actions as well as their interplays, so called instantiation interplays. As a case study, we mined for patterns in a set of 2,028 Java projects that instantiate four of the most popular Java frameworks. We also classify the extracted interplays according to the different contexts in which they occur. We found that our approach discovers relevant practices and interplays that are not covered by previous approaches. 
Our results will allow developers to have a better understanding of the frameworks they instantiate.","PeriodicalId":138287,"journal":{"name":"2022 IEEE 22nd International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125223603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Don't DIY: Automatically transform legacy Python code to support structural pattern matching
B. Rózsa, Gábor Antal, R. Ferenc
{"title":"Don't DIY: Automatically transform legacy Python code to support structural pattern matching","authors":"B. Rózsa, Gábor Antal, R. Ferenc","doi":"10.1109/SCAM55253.2022.00024","DOIUrl":"https://doi.org/10.1109/SCAM55253.2022.00024","url":null,"abstract":"As data becomes more and more complex as technology evolves, the need to support more complex data types in programming languages has grown. However, without proper storage and manipulation capabilities, handling such data can result in hard-to-read, difficult-to-maintain code. Therefore, programming languages continuously evolve to provide more and more ways to handle complex data. Python 3.10 introduced structural pattern matching, which serves this exact purpose: we can split complex data into relevant parts by examining its structure, and store them for later processing. Previously, we could only use the traditional conditional branching, which could have led to long chains of nested conditionals. Maintaining such code fragments can be cumbersome. In this paper, we present a complete framework to solve the aforementioned problem. Our software is capable of examining Python source code and transforming relevant conditionals into structural pattern matching. 
Moreover, it is able to handle nested conditionals and it is also easily extensible, thus the set of possible transformations can be easily increased.","PeriodicalId":138287,"journal":{"name":"2022 IEEE 22nd International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116240956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Semi-Automatic Refactoring to C++20 Modules: A Semi-Success Story
Richárd Szalay, Z. Porkoláb
{"title":"Semi-Automatic Refactoring to C++20 Modules: A Semi-Success Story","authors":"Richárd Szalay, Z. Porkoláb","doi":"10.1109/SCAM55253.2022.00011","DOIUrl":"https://doi.org/10.1109/SCAM55253.2022.00011","url":null,"abstract":"The component-based design of software projects is a desired property both for development and ease of code comprehension. Programming languages have long allowed component-based development (e.g., Java packages, Python modules); however, other languages, especially C and C++, had stuck to the “translation unit” model where every source file is individually compiled. The Modules system of C++20 was expected to allow cleaner encapsulation of concern. In this paper, we investigate the effort of a (semi-)automatic modularisation of existing C++ projects. Based on our investigation, upgrading existing software systems to the new Modules feature is extremely hard due to coupling issues arising from necessarily legacy design. Implementing real transition requires a significant redesign of both project-internal and user-facing programming interfaces.","PeriodicalId":138287,"journal":{"name":"2022 IEEE 22nd International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114886147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Plug and Analyze: Usable Dynamic Taint Tracker for Android Apps
Hiroki Inayoshi, S. Kakei, S. Saito
{"title":"Plug and Analyze: Usable Dynamic Taint Tracker for Android Apps","authors":"Hiroki Inayoshi, S. Kakei, S. Saito","doi":"10.1109/SCAM55253.2022.00008","DOIUrl":"https://doi.org/10.1109/SCAM55253.2022.00008","url":null,"abstract":"Taint analyses, especially static taint analyses, are utilized to uncover hidden and suspicious behaviors in Android apps. However, current static taint analyzers use imprecise Android models, producing unreliable results and increasing the result verification cost. On the other hand, current dynamic taint trackers accurately detect execution paths. However, they depend on specific Android versions and modified devices, reducing their usability. Also, the users may not be able to analyze prepared datasets comprehensively. The results of the current analyses would be biased and less trustworthy. This paper presents a new dynamic taint analyzer called T-Recs that tracks information flows by recording the app execution at the app's bytecode level on an Android device and reconstructing the execution on a server independently of specific Android versions and devices. The users can instantly start analyzing apps with T-Recs after plugging an unmodified device into their computer. We implemented and evaluated T-Recs with 158 apps of DroidBench 3.0 in comparison with current taint analyzers: FlowDroid (w/ and w/o IC3), Amandroid, DroidSafe, and TaintDroid (w/ and w/o IntelliDroid), and only T-Recs achieved 100% accuracy. The result of privacy leak detection in 96 popular Google Play apps shows that T-Recs detected 43 true positives, the highest among compared tools. Also, T-Recs analyzed 39,480 apps from Google Play and Anzhi, showing that T-Recs can be applied to apps that vary in supported SDK versions. Further, the result of ID leak detection in 158 popular apps from Google Play in 2021 shows that T-Recs can detect leaks in recently-developed apps. 
T-Recs is one of the promising tools for future app analysis.","PeriodicalId":138287,"journal":{"name":"2022 IEEE 22nd International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124151079","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Deriving Modernity Signatures for PHP Systems with Static Analysis
Wouter Van den Brink, M. Gerhold, V. Zaytsev
{"title":"Deriving Modernity Signatures for PHP Systems with Static Analysis","authors":"Wouter Van den Brink, M. Gerhold, V. Zaytsev","doi":"10.1109/SCAM55253.2022.00027","DOIUrl":"https://doi.org/10.1109/SCAM55253.2022.00027","url":null,"abstract":"The PHP language has undergone many changes in its syntax and grammar, with respect to both features the language has to offer as well as the distribution of language features used by programmers in their projects. We present a novel method of using grammar usage statistics to calculate a modernity signature for a PHP system, so that we can determine its age. The system will aid developers in choosing whether or not to execute or use a PHP system, without having to perform an extensive inspection.","PeriodicalId":138287,"journal":{"name":"2022 IEEE 22nd International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128815831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Lint-Based Warnings in Python Code: Frequency, Awareness and Refactoring
Naelson D. C. Oliveira, Márcio Ribeiro, R. Bonifácio, Rohit Gheyi, I. Wiese, B. Neto
{"title":"Lint-Based Warnings in Python Code: Frequency, Awareness and Refactoring","authors":"Naelson D. C. Oliveira, Márcio Ribeiro, R. Bonifácio, Rohit Gheyi, I. Wiese, B. Neto","doi":"10.1109/SCAM55253.2022.00030","DOIUrl":"https://doi.org/10.1109/SCAM55253.2022.00030","url":null,"abstract":"Python is a popular programming language characterized by its simple syntax and easy learning curve. Like many languages, Python has a set of best practices that should be followed to avoid bugs and improve other quality attributes (such as maintenance and readability). In this context, non-compliance to these practices can be detected by using linting tools. Previous work conducted studies to better understand the frequency of a class of problems that can be found using Python linters: warnings, here named as lint-based warnings. However, they either rely on small datasets or focus on few domains, such as machine learning or web-systems projects. In this paper, we provide a mixed-method study where we analyze the frequency of six lint-based warnings in 1,119 different open-source general-purpose Python projects. To go further, we also conduct a survey to check whether developers are aware of the lint-based warnings we study here. In particular, we intend to check whether they are able to identify the six lint-based warnings. To remove the lint-based warnings, we suggest the application of simple refactorings. Last but not least, we evaluate the suggestions by submitting pull requests to remove lint-based warnings from open-source projects. Our results show that 39% of the 1,119 projects have at least one lint-based warning. After analyzing the survey data, we also show that developers prefer Python code without lint-based warnings. 
Regarding the pull requests, we achieve an acceptance rate of 71.8%.","PeriodicalId":138287,"journal":{"name":"2022 IEEE 22nd International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122732664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Revisiting the Impact of Anti-patterns on Fault-Proneness: A Differentiated Replication
Aurel Ikama, Vincent Du, Philippe Belias, B. Muse, Foutse Khomh, Mohammad Hamdaqa
{"title":"Revisiting the Impact of Anti-patterns on Fault-Proneness: A Differentiated Replication","authors":"Aurel Ikama, Vincent Du, Philippe Belias, B. Muse, Foutse Khomh, Mohammad Hamdaqa","doi":"10.1109/SCAM55253.2022.00012","DOIUrl":"https://doi.org/10.1109/SCAM55253.2022.00012","url":null,"abstract":"Anti-patterns manifesting on software code through code smells have been investigated in terms of their prevalence, detection, refactoring, and impact on software quality attributes. In particular, leveraging heuristics to identify fault-fixing commits, Khomh et al. have found that anti-patterns and code smells have an impact on the fault-proneness of a software system. Similarly, Saboury et al. found a relationship between anti-pattern occurrences and fault-proneness, using heuristic to identify fault-fixing commits and fault-inducing changes. However, recent studies question the accuracy of heuristics, and thus the validity of empirical studies that leverage it. Hence, in this work, we would like to investigate to what extent the results of empirical studies using heuristics to identify bug fix commits are affected by the limitations of the heuristics based approach using manually validated bug fix commits as a ground truth. In particular, we conduct a differentiated replication of the work by Khomh et al. We particularly focused on the impact of anti-patterns on fault-proneness as it is the only dependent variable that may be affected by noise in the collected faults data. In our differentiated replication study, (1) we expanded the number of subject systems from 5 to 38, (2) utilized a manually validated dataset of bug-fixing commits from the work of Herbold et al., and (3) answered research questions from Khomh et al., that are related to the relationship between anti-pattern occurrences and fault-proneness. 
(4) We added an additional research question to investigate if combining results from several heuristic-based approaches could help reduce the impact of noise. Our findings show that the impact of the noise generated by the automatic heuristic-based algorithm is negligible for the studied subject systems, meaning that the relation reported on noisy data still holds on the clean data. However, we also observed that combining results from several heuristic-based approaches does not reduce this noise; quite the contrary.","PeriodicalId":138287,"journal":{"name":"2022 IEEE 22nd International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127271458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Checking Refactoring Detection Results Using Code Changes Encoding for Improved Accuracy
Liang Tan, Christoph Bockisch
{"title":"Checking Refactoring Detection Results Using Code Changes Encoding for Improved Accuracy","authors":"Liang Tan, Christoph Bockisch","doi":"10.1109/SCAM55253.2022.00016","DOIUrl":"https://doi.org/10.1109/SCAM55253.2022.00016","url":null,"abstract":"For example during software maintenance, it is often important to know the reason for a code change and therefore tools are researched to automatically detect changes due to refactorings. The tool RefDiff can achieve this supporting multiple programming languages. It provides a good precision, but at the cost of a large number of false negative results due to the necessary use of a high threshold in refactoring candidate selection. We have created a result checker that improves the overall performance of RefDiff by including more candidates and reducing false positives from RefDiff detection results afterwards. The checker encodes the textual differences (so-called diffs) corresponding to the results and uses machine learning to predict the contained refactoring type. The main contribution of this paper is the approach for extracting the diffs from the detection results and encoding them as image data for machine learning processing, as well as the training of the machine learning algorithm. We have shown that lowering the candidate threshold in conjunction with the checker improves not only the recall of RefDiff, also the precision is increased. 
Our approach improves the RefDiff detection results to 99.5% precision and 95.2% recall.","PeriodicalId":138287,"journal":{"name":"2022 IEEE 22nd International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127834795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
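The precision and recall figures quoted in the abstract above follow the standard definitions, which the sketch below spells out. The function name and the counts in the usage note are hypothetical and are not data from the paper.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) computed from raw detection counts.

    precision = TP / (TP + FP): share of reported refactorings that are real.
    recall    = TP / (TP + FN): share of real refactorings that were reported.
    A ratio with an empty denominator is reported as 0.0.
    """
    reported = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / reported if reported else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall
```

With made-up counts of 90 true positives, 10 false positives and 30 false negatives, `precision_recall(90, 10, 30)` gives a precision of 0.9 and a recall of 0.75. Admitting more candidates tends to lower the number of false negatives, and a later checking step that discards wrong candidates lowers the number of false positives, which is how both ratios can rise together.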