{"title":"On the Influential Interactive Factors on Degrees of Design Decay: A Multi-Project Study","authors":"Daniel Coutinho, Anderson G. Uchôa, Caio Barbosa, Vinícius Soares, Alessandro F. Garcia, Marcelo Schots, J. Pereira, W. K. Assunção","doi":"10.1109/saner53432.2022.00093","DOIUrl":"https://doi.org/10.1109/saner53432.2022.00093","url":null,"abstract":"Developers constantly perform code changes throughout the lifetime of a project. These changes may induce the introduction of design problems (design decay) over time, which may be reduced or accelerated by interacting with different factors (e.g., refactorings) that underlie each change. However, existing studies lack evidence about how these factors interact and influence design decay. Thus, this paper reports a study aimed at investigating whether and how (associations of) process and developer factors influence design decay. We studied seven software systems, containing an average of 45K commits in more than six years of project history. Design decay was characterized in terms of five internal quality attributes: cohesion, coupling, complexity, inheritance, and size. We observed and characterized 12 (sub-)factors and how they associate with design decay. To this end, we employed association rule mining. Moreover, we also differentiate between the associations found on modules with varying levels of decay. Process- and developer-related factors played a key role in discriminating these different levels of design decay. Then, we focused on analyzing the effects of potentially interacting factors regarding slightly- and largely-decayed modules. Finally, we observed diverging decay patterns in these modules. For example, individually, the developer-related sub-factor that represented first-time contributors, as well as the process-related one that represented the size of a change did not have negative effects on the changed classes. However, when analyzing specific factor interactions, we saw that changes in which both of these factors interacted tended to have a negative effect on the code, leading to decay.","PeriodicalId":437520,"journal":{"name":"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127528996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rebot: An Automatic Multi-modal Requirements Review Bot","authors":"Ming Ye, Jicheng Cao, Shengyu Cheng","doi":"10.1109/saner53432.2022.00095","DOIUrl":"https://doi.org/10.1109/saner53432.2022.00095","url":null,"abstract":"Requirements review is the process that reviewers read documents, make suggestions, and help improve the quality of requirements, which is a major factor that contributes to the success or failure of software. However, manually reviewing is a time-consuming and challenging task that requires high domain knowledge and expertise. To address the problem, we developed a requirements review tool, called Rebot, which automates the requirements parsing, quality classification, and suggestions generation. The core of Rebot is a neural network-based quality model which fuses multi-modal information (visual and textual information) of requirements documents to classify their quality levels (high, medium, low). The model is trained and evaluated on a real industrial requirements documents dataset which is collected from ZTE corporation. The experiments show the model achieves 81.3% accuracy in classifying the quality into three levels. To further validate Rebot, we deployed it in a live software development project. We evaluated the correctness, usefulness, and feasibility of Rebot by conducting a questionnaire with the users. Around 76.5% of Rebot's users believe Rebot can support requirements review by providing reliable quality classification results with revision suggestions. Furthermore, Around 88% of the users believe Rebot helps reduce the workload of reviewers and increase the development efficiency.","PeriodicalId":437520,"journal":{"name":"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127007359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Isolating Compiler Optimization Faults via Differentiating Finer-grained Options","authors":"Jing Yang, Yibiao Yang, Maolin Sun, Ming Wen, Yuming Zhou, Hai Jin","doi":"10.1109/saner53432.2022.00065","DOIUrl":"https://doi.org/10.1109/saner53432.2022.00065","url":null,"abstract":"Code optimization is an essential feature for compilers and almost all software products are released by compiler optimizations. Consequently, bugs in code optimization will inevitably cast significant impact on the correctness of software systems. Locating optimization bugs in compilers is challenging as compilers typically support a large amount of optimization configurations. Although prior studies have proposed to locate compiler bugs via generating witness test programs, they are still time-consuming and not effective enough. To address such limitations, we propose an automatic bug localization approach, ODFL, for locating compiler optimization bugs via differentiating finer-grained options in this study. Specifically, we first disable the fine-grained options that are enabled by default under the bug-triggering optimization levels independently to obtain bug-free and bug-related fine-grained options. We then configure several effective passing and failing optimization sequences based on such fine-grained options to obtain multiple failing and passing compiler coverage. Finally, such generated coverage information can be utilized via Spectrum-Based Fault Localization formulae to rank the suspicious compiler files. We run ODFL on 60 buggy GCC compilers from an existing benchmark. The experimental results show that ODFL significantly outperforms the state-of-the-art compiler bug isolation approach RecBi in terms of all the evaluated metrics, demonstrating the effectiveness of ODFL. In addition, ODFL is much more efficient than RecBi as it can save more than 88% of the time for locating bugs on average.","PeriodicalId":437520,"journal":{"name":"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127467528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How and Why Developers Migrate Python Tests","authors":"Livia Barbosa, André C. Hora","doi":"10.1109/saner53432.2022.00071","DOIUrl":"https://doi.org/10.1109/saner53432.2022.00071","url":null,"abstract":"Nowadays, Python developers can rely on two major testing frameworks: unittest and pytest. Due to the benefits of pytest (e.g., fixture reuse), several relevant projects in the Python ecosystem have migrated from unittest to pytest. Despite being performed by the Python community, we are not yet aware of how systems are migrated from unittest to pytest nor the major reasons behind the migration. In this paper, we provide the first empirical study to assess testing framework migration. We analyze how and why developers migrate from unittest to pytest. We mine 100 popular Python systems and assess their migration status. We find that 34% of the systems rely on both testing frameworks and that Python projects are moving to pytest. While some systems have fully migrated, others are still migrating after a long period, suggesting that the migration is not always straightforward. Overall, the migrated test code is smaller than the original one. Furthermore, developers migrate to pytest due to several reasons, such as the easier syntax, interoperability, easier maintenance, and fixture flexibility/reuse, however, the implicit mechanics of pytest is a concern. We conclude by discussing practical implications for practitioners and researchers.","PeriodicalId":437520,"journal":{"name":"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129994570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LoGenText: Automatically Generating Logging Texts Using Neural Machine Translation","authors":"Zishuo Ding, Heng Li, Weiyi Shang","doi":"10.1109/saner53432.2022.00051","DOIUrl":"https://doi.org/10.1109/saner53432.2022.00051","url":null,"abstract":"The textual descriptions in logging statements (i.e., logging texts) are printed during system executions and exposed to multiple stakeholders including developers, operators, users, and regulatory authorities. Writing proper logging texts is an important but often challenging task for developers. However, despite extensive research on automated logging suggestions, research on suggesting logging texts rarely exists. In this paper, we present LoGenText, an automated approach that generates logging texts by translating the related source code into short textual descriptions. LoGenText takes the preceding source code of a logging text as the input and considers other context information such as the location of the logging statement, to automatically generate the logging text using neural machine translation models. We evaluate LoGenText on 10 open-source projects, and compare the automatically generated logging texts with the developer-inserted logging texts in the source code. We find that LoGenText generates logging texts that achieve BLEU scores of 23.3 to 41.8 and ROUGE-L scores of 42.1 to 53.9, which outperforms the state-of-the-art approach by a large margin. In addition, we perform a human evaluation involving 42 participants, which further demonstrates the quality of the logging texts generated by LoGenText. Our work is an important step towards automated generation of logging statements, which can potentially save developers' efforts and improve the quality of software logging.","PeriodicalId":437520,"journal":{"name":"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122347191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Phishing Kits Source Code Similarity Distribution: A Case Study","authors":"E. Merlo, Mathieu Margier, Guy-Vincent Jourdan, Iosif-Viorel Onut","doi":"10.1109/saner53432.2022.00116","DOIUrl":"https://doi.org/10.1109/saner53432.2022.00116","url":null,"abstract":"Attackers (“phishers”) typically deploy source code in some host website to impersonate a brand or in general a situation in which a user is expected to provide some personal information of interest to phishers (e.g. credentials, credit card number). Phishing kits are ready-to-deploy sets of files that can be simply copied on a web server and used almost as they are. In this paper, we consider the static similarity analysis of the source code of 20871 phishing kits totalling over 182 million lines of PHP, Javascript and HTML code, that have been collected during phishing attacks and recovered by forensics teams. Reported experimental results show that as much as 90% of the analyzed kits share 90% or more of their source code with at least another kit. Differences are small, less than about 1000 programming words – identifiers, constants, strings and so on – in 40% of cases. A plausible lineage of phishing kits is presented by connecting together kits with the highest similarity. Obtained results show a very different reconstructed lineage for phishing kits when compared to a publicly available application such as Wordpress. Observed kits similarity distribution is consistent with the assumed hypothesis that kit propagation is often based on identical or near-identical copies at low cost changes. The proposed approach may help classifying new incoming phishing kits as “near-copy” or “intellectual leaps” from known and already encountered kits. This could facilitate the identification and classification of new kits as derived from older known kits.","PeriodicalId":437520,"journal":{"name":"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132510606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extracting Vulnerabilities from GitHub Commits","authors":"N. Chan, J. Chandy","doi":"10.1109/saner53432.2022.00038","DOIUrl":"https://doi.org/10.1109/saner53432.2022.00038","url":null,"abstract":"Open-source libraries save developers time and effort by providing them access to pre-written functions, objects, and methods. The adoption of such libraries follows the current trend of more widespread use of open-source software and components. However, like proprietary software, open-source software can also suffer from defects that can be exploited by attackers. Many of these vulnerabilities have been identified and documented and are stored in Common Vulnerabilities and Exposures (CVE) databases maintained by entities such as NIST. Developers of these open-source components have a responsibility to inform their users of the vulnerabilities that exist in their releases and of the patches that fix these vulnerabilities. Consistent documentation of CVEs is a prerequisite for mitigating these vulnerabilities, especially if an automated approach is taken. This study investigates how well-documented are the patches both in the CVE database, and within the Github commits of C language open-source libraries. The results show that a significant number of CVEs in the NIST database do not mention the existence of patches and that only a small subset of the libraries looked at document CVEs in their commits. This paper comes to the conclusion that mutually agreed upon standards when it comes to CVE documentation should be adopted by both developers of open-source software and the entities that update and maintain CVE databases.","PeriodicalId":437520,"journal":{"name":"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123375762","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring Relevant Artifacts of Release Notes: The Practitioners' Perspective","authors":"Sristy Sumana Nath, B. Roy","doi":"10.48550/arXiv.2204.05355","DOIUrl":"https://doi.org/10.48550/arXiv.2204.05355","url":null,"abstract":"A software release note is one of the essential documents in the software development life cycle. The software release contains a set of information, e.g., bug fixes and security fixes. Release notes are used in different phases, e.g., requirement engineering, software testing and release management. Different types of practitioners (e.g., project managers and clients) get benefited from the release notes to understand the overview of the latest release. As a result, several studies have been done about release notes production and usage in practice. However, two significant problems (e.g., duplication and inconsistency in release notes contents) exist in producing well-written & well-structured release notes and organizing appropriate information regarding different targeted users' needs. For that reason, practitioners face difficulties in writing and reading the release notes using existing tools. To mitigate these problems, we execute two different studies in our paper. First, we execute an exploratory study by analyzing 3,347 release notes of 21 GitHub repositories to understand the documented contents of the release notes. As a result, we find relevant key artifacts, e.g., issues (29%), pull-requests (32%), commits (19%), and common vulnerabilities and exposures (CVE) issues (6%) in the release note contents. Second, we conduct a survey study with 32 professionals to understand the key information that is included in release notes regarding users' roles. For example, project managers are more interested in learning about new features than less critical bug fixes. Our study can guide future research directions to help practitioners produce the release notes with relevant content and improve the documentation quality.","PeriodicalId":437520,"journal":{"name":"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123028599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward Understanding the Impact of Refactoring on Program Comprehension","authors":"Giulia Sellitto, Emanuele Iannone, Zadia Codabux, Valentina Lenarduzzi, A. D. Lucia, Fabio Palomba, F. Ferrucci","doi":"10.1109/saner53432.2022.00090","DOIUrl":"https://doi.org/10.1109/saner53432.2022.00090","url":null,"abstract":"Software refactoring is the activity associated with developers changing the internal structure of source code without modifying its external behavior. The literature argues that refactoring might have beneficial and harmful implications for software maintainability, primarily when performed without the support of automated tools. This paper continues the narrative on the effects of refactoring by exploring the dimension of program comprehension, namely the property that describes how easy it is for developers to understand source code. We start our investigation by assessing the basic unit of program comprehension, namely program readability. Next, we set up a large-scale empirical investigation – conducted on 156 open-source projects – to quantify the impact of refactoring on program readability. First, we mine refactoring data and, for each commit involving a refactoring, we compute (i) the amount and type(s) of refactoring actions performed and (ii) eight state-of-the-art program comprehension metrics. Afterwards, we build statistical models relating the various refactoring operations to each of the readability metrics considered to quantify the extent to which each refactoring impacts the metrics in either a positive or negative manner. The key results are that refactoring has a notable impact on most of the readability metrics considered.","PeriodicalId":437520,"journal":{"name":"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116732102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"First-class artifacts as building blocks for live in-IDE documentation","authors":"Nitish Patkar, Andrei Chis, N. Stulova, Oscar Nierstrasz","doi":"10.1109/saner53432.2022.00016","DOIUrl":"https://doi.org/10.1109/saner53432.2022.00016","url":null,"abstract":"A traditional round-trip engineering approach based on model transformations does not scale well to modern agile development environments where numerous artifacts are produced using a range of heterogeneous tools and technologies. To boost artifact connectivity and maintain their consistency, we propose to create and manage software-related artifacts as first-class entities directly in an integrated development environment (IDE). This approach has two advantages: (i) compared to employing separate tools, creating various artifacts directly within a development platform eliminates the necessity to recover trace links, and (ii) first-class artifacts can be composed into stakeholder-specific live document-artifacts. We detail and exemplify our approach in the Glamorous Toolkit IDE (henceforth, Glamorous toolkit), and discuss the results of a semi-structured pilot survey we conducted with practitioners and researchers to evaluate its usefulness in practice.","PeriodicalId":437520,"journal":{"name":"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121719879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}