{"title":"CodeCV: Mining Expertise of GitHub Users from Coding Activities","authors":"Daniel Atzberger, Nico Scordialo, Tim Cech, W. Scheibel, Matthias Trapp, J. Döllner","doi":"10.1109/SCAM55253.2022.00021","DOIUrl":"https://doi.org/10.1109/SCAM55253.2022.00021","url":null,"abstract":"The number of software projects developed collaboratively on social coding platforms is steadily increasing. One of the motivations for developers to participate in open-source software development is to make their development activities easier accessible to potential employers, e.g., in the form of a resume for their interests and skills. However, manual review of source code activities is time-consuming and requires detailed knowledge of the technologies used. Existing approaches are limited to a small subset of actual source code activity and metadata and do not provide explanations for their results. In this work, we present CodeCV, an approach to analyzing the commit activities of a GitHub user concerning the use of programming languages, software libraries, and higher-level concepts, e.g., Machine Learning or Cryptocurrency. Skills in using software libraries and programming languages are analyzed based on syntactic structures in the source code. Based on Labeled Latent Dirichlet Allocation, an automatically generated corpus of GitHub projects is used to learn the concept-specific vocabulary in identifier names and comments. This enables the capture of expertise on abstract concepts from a user's commit history. CodeCV further explains the results through links to the relevant commits in an interactive web dashboard. We tested our system on selected GitHub users who mainly contribute to popular projects to demonstrate that our approach is able to capture developers' expertise effectively.","PeriodicalId":138287,"journal":{"name":"2022 IEEE 22nd International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128628647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Experimental Evaluation of A New Ranking Formula for Spectrum based Fault Localization","authors":"Q. Sarhan, Árpád Beszédes","doi":"10.1109/SCAM55253.2022.00038","DOIUrl":"https://doi.org/10.1109/SCAM55253.2022.00038","url":null,"abstract":"Spectrum-Based Fault Localization (SBFL) uses a mathematical formula to determine a suspicion score for each program element (such as a statement, method, or class) based on fundamental statistics (e.g., how many times each element is executed and not executed in passed and failed tests) taken from test coverage and results. Based on the calculated scores, program elements are then ordered from most suspicious to least suspicious. The elements with the highest scores are thought to be the most prone to error. The final ranking list of program elements aids developers in debugging when looking for the source of a fault in the program under test. In this paper, we present a new SBFL ranking formula that enhances a base formula by ranking code elements slightly higher than others that are executed by more failed tests and less passing ones. Its novelty is that it breaks ties between the elements that share the same suspicion score of the base formula. Experiments were conducted on six single-fault programs of the Defects4J dataset to evaluate the effectiveness of the proposed formula. The results show that our new formula when compared to three widely-studied SBFL formulas, achieved a better performance in terms of average ranking. It also achieved positive results in all of the Top-N categories and increased the number of cases where the faulty element became the top-ranked element by 13–23%.","PeriodicalId":138287,"journal":{"name":"2022 IEEE 22nd International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"335 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127575716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Empirical Study of Code Smells in Transformer-based Code Generation Techniques","authors":"Mohammed Latif Siddiq, Shafayat H. Majumder, Maisha R. Mim, Sourov Jajodia, Joanna C. S. Santos","doi":"10.1109/SCAM55253.2022.00014","DOIUrl":"https://doi.org/10.1109/SCAM55253.2022.00014","url":null,"abstract":"Prior works have developed transformer-based language learning models to automatically generate source code for a task without compilation errors. The datasets used to train these techniques include samples from open source projects which may not be free of security flaws, code smells, and violations of standard coding practices. Therefore, we investigate to what extent code smells are present in the datasets of coding generation techniques and verify whether they leak into the output of these techniques. To conduct this study, we used Pylint and Bandit to detect code smells and security smells in three widely used training sets (CodeXGlue, APPS, and Code Clippy). We observed that Pylint caught 264 code smell types, whereas Bandit located 44 security smell types in these three datasets used for training code generation techniques. By analyzing the output from ten different configurations of the open-source fine-tuned transformer-based GPT-Neo 125M parameters model, we observed that this model leaked the smells and non-standard practices to the generated source code. When analyzing GitHub Copilot's suggestions, a closed source code generation tool, we observed that it contained 18 types of code smells, including substandard coding patterns and 2 security smell types.","PeriodicalId":138287,"journal":{"name":"2022 IEEE 22nd International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126001591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Weighted-SBFL by Blocking Spectrum","authors":"Haruka Yoshioka, Yoshiki Higo, S. Kusumoto","doi":"10.1109/SCAM55253.2022.00036","DOIUrl":"https://doi.org/10.1109/SCAM55253.2022.00036","url":null,"abstract":"Debugging is a costly process in software development, and computer-aided debugging is expected to reduce the cost. In debugging, fault localization is used to identify the location of potentially faulty code. Spectrum-based fault localization (SBFL) identifies program statements that contain faults based on program spectra collected during the execution of the test cases. Conventional SBFL treats all test cases as having equal importance. A weighting technique that assigns importance to test cases based on the similarity of program spectra (where higher similarity indicates higher importance) has been proposed. However, this technique does not significantly improve fault localization accuracy. We attribute this lack of improvement to the presence of sequential program statements, which negatively affect the weighting. In this study, we apply blocking and the weighting of spectra to improve accuracy. We conduct experiments to compare the proposed technique with conventional SBFL and a recent SBFL technique. We show that the proposed technique identifies faulty program statements with higher accuracy than previous SBFL techniques. Weighting based on the similarity of spectra after blocking is thus effective.","PeriodicalId":138287,"journal":{"name":"2022 IEEE 22nd International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128335771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Test Transplantation through Dynamic Test Slicing","authors":"Mehrdad Abdi, S. Demeyer","doi":"10.1109/SCAM55253.2022.00009","DOIUrl":"https://doi.org/10.1109/SCAM55253.2022.00009","url":null,"abstract":"Previous research has demonstrated that the test coverage of libraries can be expanded by using existing test inputs from their dependent projects. In this paper, we propose an algorithm for test transplantation based on test slicing. The algorithm extracts test inputs, isolates them by creating mocks, and then transplants the test code onto the test suite of the libraries. To achieve test slicing, we dynamically execute the tests in the dependent project and create its graph of histories. Then, we traverse back from the interesting object state and collect the corresponding edges. Finally, we reverse the collected edges and create a sequence of method calls to reconstruct the same object state. We have implemented a proof-of-concept in Pharo-Smalltalk, in this paper we discuss the lessons learned so far.","PeriodicalId":138287,"journal":{"name":"2022 IEEE 22nd International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127997780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Multimodal Architecture for Detection of Long Parameter List and Switch Statements using DistilBERT","authors":"Anushka Bhave, Roopak Sinha","doi":"10.1109/SCAM55253.2022.00018","DOIUrl":"https://doi.org/10.1109/SCAM55253.2022.00018","url":null,"abstract":"Code smell detection and refactoring are crucial to sustain quality, reduce complexity and increase the efficiency of a software application. Code smells are observable patterns in the source code of a program that indicate deeper structural issues. Most traditional methods for code smell classification rely exclusively on structural object-oriented metrics and manually-designed heuristics. We propose a novel multimodal deep learning approach that combines structural and semantic information to detect two commonly-encountered code smells: Long Parameter Lists and Switch Statements. The presented architecture applies transfer learning on DistilBERT to generate vector embeddings representing classes and methods concatenated with numerical metrics for joint feature extraction using CNN, to build a complex mapping between the features and predict the output as smelly or non-smelly. Subsequently, to perform a holistic comparative analysis we also implement two multimodal machine learning pipelines, the first employs a sci-kit learn TF-IDF Vectorizer with Random Forest Classifier, and the second merges CNN with Bi-LSTM. Our approach achieves an accuracy of 91.2% as corroborated by experimental evaluation, outperforming the state-of-the-art techniques.","PeriodicalId":138287,"journal":{"name":"2022 IEEE 22nd International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125946700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards the Detection of Hidden Familial Type Correlations in Java Code","authors":"Alin-Petru Roşu, Petru Florin Mihancea","doi":"10.1109/SCAM55253.2022.00022","DOIUrl":"https://doi.org/10.1109/SCAM55253.2022.00022","url":null,"abstract":"Family polymorphism is an object-oriented programming feature which facilitates the definition of groups of classes (families) that are allowed to be used together while statically forbidding them to be mixed with classes outside their families. Unfortunately, this feature has not been yet adopted by mainstream industrial-strength programming languages. Consequently, in Java, the idea of non-mixable families is prone to be implemented in a statically unsafe fashion, affecting the programs' intelligibility. In order to support program comprehension, we present an approach to detect code fragments where types of references are correlated within a family; nonetheless, these correlations are hidden behind the references' declarations. We obtained promising results during the initial design and evaluation iterations, based on the analysis of a software system where the presence of families was previously reported.","PeriodicalId":138287,"journal":{"name":"2022 IEEE 22nd International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126517772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Removing dependencies from large software projects: are you really sure?","authors":"Ching-Chi Chuang, Luís Cruz, R. V. Dalen, Vladimir Mikovski, A. Deursen","doi":"10.1109/SCAM55253.2022.00017","DOIUrl":"https://doi.org/10.1109/SCAM55253.2022.00017","url":null,"abstract":"When developing and maintaining large software systems, a great deal of effort goes into dependency management. During the whole lifecycle of a software project, the set of dependencies keeps changing to accommodate the addition of new features or changes in the running environment. Package management tools are quite popular to automate this process, making it fairly easy to automate the addition of new dependencies and respective versions. However, over the years, a software project might evolve in a way that no longer needs a particular technology or dependency. But the choice of removing that dependency is far from trivial: one cannot be entirely sure that the dependency is not used in any part of the project. Hence, developers have a hard time confidently removing dependencies and trusting that it will not break the system in production. In this paper, we propose a decision framework to improve the detection of unused dependencies. Our approach builds on top of the existing dependency analysis tool DepClean. We start by improving the support of Java dynamic features in DepClean. We do so by augmenting the analysis with the state-of-the-art call graph generation tool OPAL. Then, we analyze the potentially unused dependencies detected by classifying their logical relationship with the other components to decide on follow-up steps, which we provide in the form of a decision diagram. Results show that developers can focus their efforts on maintaining bloated dependencies by following the recommendations of our decision framework. When applying our approach to a large industrial software project, we can reduce one-third of false positives when compared to the state-of-the-art. We also validate our approach by analyzing dependencies that were removed in the history of open-source projects. Results show consistency between our approach and the decisions taken by open-source developers.","PeriodicalId":138287,"journal":{"name":"2022 IEEE 22nd International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129049740","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A preliminary evaluation on the relationship among architectural and test smells","authors":"M. D. Stefano, Fabiano Pecorelli, D. D. Nucci, A. D. Lucia","doi":"10.1109/SCAM55253.2022.00013","DOIUrl":"https://doi.org/10.1109/SCAM55253.2022.00013","url":null,"abstract":"Software maintenance is the software life cycle's longest and most challenging phase. Bad architectural decisions or sub-optimal solutions might lead to architectural erosion, i.e., the process that causes the system's architecture to deviate from its original design. The so-called architectural smells are the most common signs of architectural erosion. Architectural smells might affect several quality aspects of a software system, including testability. When a system is not prone to testing, sub-optimal solutions may be introduced in the test code, a.k.a. test smells. This paper explores the possible relations between architectural and test smells. By mining 798 releases of 40 open-source Java systems, we studied the correlation between class-level architectural and test smells. In particular, Eager Test and Assertion Roulette smells often occur in conjunction with Cyclically-dependent Modularization, Deficient Encapsulation, and Insufficient Encapsulation architectural smells.","PeriodicalId":138287,"journal":{"name":"2022 IEEE 22nd International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114894935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Empirical Assessment on Merging and Repositioning of Static Analysis Alarms","authors":"Niloofar Mansoor, Tukaram Muske, Alexander Serebrenik, Bonita Sharif","doi":"10.1109/SCAM55253.2022.00031","DOIUrl":"https://doi.org/10.1109/SCAM55253.2022.00031","url":null,"abstract":"Static analysis tools generate a large number of alarms that require manual inspection. In prior work, repositioning of alarms is proposed to (1) merge multiple similar alarms together and replace them by a fewer alarms, and (2) report alarms as close as possible to the causes for their generation. The premise is that the proposed merging and repositioning of alarms will reduce the manual inspection effort. To evaluate the premise, this paper presents an empirical study with 249 developers on the proposed merging and repositioning of static alarms. The study is conducted using static analysis alarms generated on $C$ programs, where the alarms are representative of the merging vs. non-merging and repositioning vs. non-repositioning situations in real-life code. Developers were asked to manually inspect and determine whether assertions added corresponding to alarms in $C$ code hold. Additionally, two spatial cognitive tests are also done to determine relationship in performance. The empirical evaluation results indicate that, in contrast to expectations, there was no evidence that merging and repositioning of alarms reduces manual inspection effort or improves the inspection accuracy (at times a negative impact was found). Results on cognitive abilities correlated with comprehension and alarm inspection accuracy.","PeriodicalId":138287,"journal":{"name":"2022 IEEE 22nd International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128101072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}