Guest Editorial for the Special Issue on Source Code Analysis and Manipulation, SCAM 2022
Banani Roy, Mohammad Ghafari, Mariano Ceccato
Journal of Software: Evolution and Process, volume 37, issue 3, published 2025-03-05
DOI: 10.1002/smr.70006
https://onlinelibrary.wiley.com/doi/10.1002/smr.70006
Abstract
This issue of the Journal of Software: Evolution and Process focuses on the foundation of software engineering: the source code itself. While much of the software engineering community properly emphasizes aspects like specification, design, and requirements engineering, the source code provides the only precise description of a system's behavior. Therefore, the analysis and manipulation of source code remain critical concerns.
Among other contributions, this issue contains extended versions of the best papers presented at the 22nd IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM 2022), held in Limassol, Cyprus, in October 2022.
The SCAM Conference aims to bring together researchers and practitioners working on theory, techniques, and applications that concern analysis and/or manipulation of the source code of software systems. The term “source code” refers to any fully executable description of a software system, such as machine code, (very) high-level languages, and executable graphical representations of systems. The term “analysis” refers to any automated or semi-automated procedure that yields insight into source code, while “manipulation” refers to any automated or semi-automated procedure that takes and returns source code. While much attention in the wider software engineering community is directed towards other aspects of systems development and evolution, such as specification, design, and requirements engineering, it is the source code that contains the only precise description of the behavior of a system. Hence, the analysis and manipulation of source code remain a pressing concern, for which SCAM 2022 solicited high-quality paper submissions.
The SCAM 2022 conference received a total of 73 submissions. There were 45 submissions to the main research track, of which one was desk-rejected for violating the double-blind policy. The remaining 44 submissions went through a thorough review process. Every paper was fully reviewed by three or more program committee members for relevance, soundness, and originality, and was discussed before final decisions were made. The program committee decided to accept 17 papers (acceptance rate 39%). The Engineering track received 11 submissions, of which one was desk-rejected and 5 were accepted; the NIER track received 16 submissions and accepted 10; and the RENE track received 4 submissions and accepted 1.
A public open call was published to invite outstanding papers by other authors on source code analysis and manipulation. In total, 10 papers were submitted to this special issue. Each of the submissions was reviewed by a minimum of three expert referees. Following the first round of review, the authors were asked to revise their papers in response to the referees' comments, and the revised drafts were then reviewed for conformance to the referees' comments. Among those, only five papers were selected for publication in this special issue. The selected papers represent some of the very best work that has appeared at SCAM and cover all of its main areas of interest, namely, refactoring by Yang Zhang and Shuai Hong and by Richárd Szalay and Zoltán Porkoláb; design pattern detection by Hugo Andrade, João Bispo and Filipe F. Correia; string analysis by Luca Negrini, Vincenzo Arceri, Agostino Cortesi and Pietro Ferrara; and regression testing by Francesco Altiero, Anna Corazza, Sergio Di Martino, Adriano Peron, and Luigi Libero Lucio Starace.
In the first paper “ReInstancer: An Automatic Refactoring Approach for Instanceof Pattern Matching”, Zhang et al. present ReInstancer, a tool for automating the refactoring of instanceof pattern matching by optimizing multibranch statements into switch expressions, improving code quality and readability. It demonstrated effectiveness by refactoring over 7700 instances across 20 real-world projects.
The paper by Szalay et al., “Refactoring to Standard C++20 Modules”, presents a semi-automatic method for modularizing existing C++ projects, using dependency analysis and clustering to organize elements into modules. The study reveals that upgrading to C++20 Modules is constrained by the project's existing architectural design.
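To make the idea of clustering by dependencies concrete, the following is a minimal sketch and not the method of Szalay et al. It assumes a hypothetical mapping `deps` from each entity (for instance a header) to the entities it uses, and it groups mutually dependent entities into one candidate module, on the grounds that module interfaces in C++20 may not import each other cyclically. The function name `cluster_cycles` and the file names are illustrative only.

```python
from collections import defaultdict

def cluster_cycles(deps):
    """Group entities into candidate modules, one cluster per dependency cycle.

    deps maps an entity to the entities it uses (hypothetical input format).
    Clusters are the strongly connected components, found with Kosaraju's
    two-pass depth-first search.
    """
    nodes = set(deps) | {d for targets in deps.values() for d in targets}
    reverse = defaultdict(set)
    for src, targets in deps.items():
        for dst in targets:
            reverse[dst].add(src)

    order, seen = [], set()

    def visit(n):  # first pass: record the order in which nodes finish
        seen.add(n)
        for m in deps.get(n, ()):
            if m not in seen:
                visit(m)
        order.append(n)

    for n in sorted(nodes):
        if n not in seen:
            visit(n)

    clusters, assigned = [], set()

    def collect(n, group):  # second pass: walk the reversed graph
        assigned.add(n)
        group.add(n)
        for m in reverse[n]:
            if m not in assigned:
                collect(m, group)

    for n in reversed(order):
        if n not in assigned:
            group = set()
            collect(n, group)
            clusters.append(frozenset(group))
    return clusters

# Illustrative input: shape.h and point.h include each other, io.h uses shape.h.
deps = {"shape.h": ["point.h"], "point.h": ["shape.h"], "io.h": ["shape.h"]}
for cluster in cluster_cycles(deps):
    print(sorted(cluster))
```

In this sketch the two mutually dependent headers end up in the same cluster, which illustrates how an existing architecture with many cycles can limit how finely a project can be split into modules.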
In the third paper, “Multi-Language Detection of Design Pattern Instances”, Andrade et al. present DP-LARA, a multi-language pattern detection tool that leverages the LARA framework's virtual Abstract Syntax Tree (AST) to identify design patterns across object-oriented programming languages. It enables language-agnostic code analysis for improved software comprehension.
The paper by Negrini et al., “Tarsis: an effective automata-based abstract domain for string analysis”, presents a novel abstract domain for string values based on finite state automata that outperforms the baseline for string analysis, a typical task in source code analysis.
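As a rough illustration of what an automata-based abstraction of strings can look like, the sketch below represents the set of values a string variable may take as a small nondeterministic automaton. It is not the Tarsis domain or its implementation: the character-level alphabet, the `Nfa` class, and the functions `literal`, `join`, `concat`, and `may_equal` are all assumptions made for this example.

```python
from dataclasses import dataclass
from itertools import count

_fresh = count()  # supply of unique state identifiers

@dataclass(frozen=True)
class Nfa:
    start: int
    accept: frozenset
    edges: tuple  # (source, symbol, target); symbol None is an epsilon move

def literal(text):
    """Automaton describing exactly one concrete string."""
    states = [next(_fresh) for _ in range(len(text) + 1)]
    edges = tuple((states[i], ch, states[i + 1]) for i, ch in enumerate(text))
    return Nfa(states[0], frozenset({states[-1]}), edges)

def _copy(a):
    """Rename states so that combining an automaton with itself stays correct."""
    states = {a.start} | set(a.accept) | {s for e in a.edges for s in (e[0], e[2])}
    ren = {s: next(_fresh) for s in states}
    return Nfa(ren[a.start], frozenset(ren[s] for s in a.accept),
               tuple((ren[s], c, ren[t]) for s, c, t in a.edges))

def join(a, b):
    """Upper bound of two abstract values: union of the described strings."""
    a, b = _copy(a), _copy(b)
    start = next(_fresh)
    extra = ((start, None, a.start), (start, None, b.start))
    return Nfa(start, a.accept | b.accept, a.edges + b.edges + extra)

def concat(a, b):
    """Abstract counterpart of string concatenation."""
    a, b = _copy(a), _copy(b)
    bridge = tuple((s, None, b.start) for s in a.accept)
    return Nfa(a.start, b.accept, a.edges + b.edges + bridge)

def _closure(a, states):
    """All states reachable from `states` through epsilon moves."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for src, sym, dst in a.edges:
            if src == s and sym is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def may_equal(a, text):
    """True if `text` is among the strings the abstract value describes."""
    current = _closure(a, {a.start})
    for ch in text:
        step = {dst for src, sym, dst in a.edges if src in current and sym == ch}
        current = _closure(a, step)
    return bool(current & a.accept)

# Abstract value of: x = "id=" + ("42" if flag else "7")
x = concat(literal("id="), join(literal("42"), literal("7")))
print(may_equal(x, "id=42"), may_equal(x, "id=7"), may_equal(x, "id=9"))
```

The point of the sketch is that `join` and `concat` operate on whole sets of strings at once, so an analysis can answer questions such as "can `x` ever equal this value?" without running the program.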
In the last paper “Regression Test Prioritization Leveraging Source Code Similarity with Tree Kernels”, Altiero et al. introduce two novel Regression Test Prioritization (RTP) techniques that apply Tree Kernels to Abstract Syntax Trees of source code to measure structural changes and prioritize tests accordingly. Evaluated across five Java projects, the proposed methods achieve superior fault detection rates compared to traditional RTP approaches.
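The general idea of scoring structural change with a tree kernel can be sketched as follows. This is an illustrative simplification and not the techniques of Altiero et al.: it uses Python's `ast` module as a stand-in for Java ASTs, a basic subtree kernel that counts identical complete subtrees (node types only, ignoring identifiers and constants), and a hypothetical mapping from each test to the single function it exercises.

```python
import ast
from collections import Counter
from math import sqrt

def _shape(node, bag):
    """Return a canonical string for the subtree at `node` and count it in `bag`."""
    children = [_shape(child, bag) for child in ast.iter_child_nodes(node)]
    signature = type(node).__name__ + "(" + ",".join(children) + ")"
    bag[signature] += 1
    return signature

def subtrees(source):
    """Multiset of all complete subtrees of the AST of `source`."""
    bag = Counter()
    _shape(ast.parse(source), bag)
    return bag

def kernel(a, b):
    """Subtree kernel: number of pairs of identical complete subtrees."""
    return sum(n * b[signature] for signature, n in a.items())

def similarity(old, new):
    """Normalized kernel in [0, 1]; 1.0 means structurally identical code."""
    a, b = subtrees(old), subtrees(new)
    return kernel(a, b) / sqrt(kernel(a, a) * kernel(b, b))

def prioritize(tests, versions):
    """Order tests so that those covering the most-changed code run first.

    tests:    test name -> name of the function it exercises (hypothetical)
    versions: function name -> (old source, new source)
    """
    change = {name: 1.0 - similarity(old, new)
              for name, (old, new) in versions.items()}
    return sorted(tests, key=lambda t: change[tests[t]], reverse=True)

versions = {
    "area": ("def area(w, h):\n    return w * h\n",
             "def area(w, h):\n    return w * h\n"),
    "clamp": ("def clamp(x):\n    return x\n",
              "def clamp(x):\n    if x < 0:\n        return 0\n    return x\n"),
}
tests = {"test_area": "area", "test_clamp": "clamp"}
print(prioritize(tests, versions))  # prints ['test_clamp', 'test_area']
```

Because `clamp` gained a branch while `area` is unchanged, the test covering `clamp` is scheduled first. A realistic setup would need coverage data linking tests to many code elements, which this sketch leaves out.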
We hope you find these papers engaging and encourage those interested to join us at future SCAM conferences.