Yang Hong, C. Tantithamthavorn, Patanamon Thongtanunam
{"title":"Where Should I Look at? Recommending Lines that Reviewers Should Pay Attention To","authors":"Yang Hong, C. Tantithamthavorn, Patanamon Thongtanunam","doi":"10.1109/saner53432.2022.00121","DOIUrl":"https://doi.org/10.1109/saner53432.2022.00121","url":null,"abstract":"Code review is an effective quality assurance practice, yet it can be time-consuming since reviewers have to carefully review all newly added lines in a patch. Our analysis shows that, at the median, patch authors often waited 15–64 hours to receive initial feedback from reviewers, which accounts for 16%–26% of the whole review time of a patch. Importantly, we also found that large patches tend to receive initial feedback from reviewers more slowly than smaller patches. Hence, it would be beneficial to reduce reviewers' effort with an approach that pinpoints the lines they should pay attention to. In this paper, we propose REVSPOT, a machine learning-based approach to predict problematic lines (i.e., lines that will receive a comment and lines that will be revised). Through a case study of three open-source projects (i.e., OpenStack Nova, OpenStack Ironic, and Qt Base), Revspot can accurately predict lines that will receive comments and will be revised (with a Top-10 Accuracy of 81% and 93%, which is 56% and 15% better than the baseline approach), and these correctly predicted problematic lines are related to logic defects, which could impact the functionality of the system. 
Based on these findings, our Revspot could help reviewers to reduce their reviewing effort by reviewing a smaller set of lines and increasing code review speed and reviewers' productivity.","PeriodicalId":437520,"journal":{"name":"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"204 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131571018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Type Profiling to the Rescue: Test Amplification in Python and Smalltalk","authors":"S. Demeyer, Mehrdad Abdi, Ebert Schoofs","doi":"10.1109/saner53432.2022.00136","DOIUrl":"https://doi.org/10.1109/saner53432.2022.00136","url":null,"abstract":"Software test amplification is the act of strengthening manually written test-cases to exercise the boundary conditions of the system under test. It has been demonstrated by the research community to work for the programming language Java, relying on the static type system to safely transform the code under test. In dynamically typed languages, such type declarations are not available, and as a consequence test amplification has yet to find its way to programming languages like Smalltalk, Python, Ruby and JavaScript. The AnSyMo research group has created two proof of concept tools for languages without a static type system: AmPyfier (for Python) and Small-Amp (for Pharo-Smalltalk). In this tool demonstration paper we explain how we relied on profiling libraries present in the respective eco-systems to infer the necessary type information for enabling full-blown test amplification.","PeriodicalId":437520,"journal":{"name":"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131931911","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chaiyong Ragkhitwetsagul, J. Krinke, Morakot Choetkiertikul, T. Sunetnanta, Federica Sarro
{"title":"Identifying Software Engineering Challenges in Software SMEs: A Case Study in Thailand","authors":"Chaiyong Ragkhitwetsagul, J. Krinke, Morakot Choetkiertikul, T. Sunetnanta, Federica Sarro","doi":"10.1109/saner53432.2022.00036","DOIUrl":"https://doi.org/10.1109/saner53432.2022.00036","url":null,"abstract":"Small and medium-sized software enterprises (SSMEs) are a vital part of emerging markets. Due to their size, they are not capable of adopting advanced software engineering techniques or automated software engineering tools in the same way large and ultra-large companies are. We study the software engineering challenges in SSMEs in Thailand, an emerging market in software development, using semi-structured interviews with four SSMEs. After performing a thematic analysis of the interview transcripts, we found a number of common challenges such as lack of testing, code-related issues, and inaccurate effort estimation. We observed that in order to introduce advanced automated software engineering tools and techniques, SSMEs need to adopt contemporary best practices in software engineering like automated testing, continuous integration and automated code review. 
Moreover, we suggest that software engineering research engage with SSMEs to enable them to improve their knowledge and adopt more advanced software engineering practices.","PeriodicalId":437520,"journal":{"name":"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"19 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120876919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Siow, Shangqing Liu, Xiaofei Xie, Guozhu Meng, Yang Liu
{"title":"Learning Program Semantics with Code Representations: An Empirical Study","authors":"J. Siow, Shangqing Liu, Xiaofei Xie, Guozhu Meng, Yang Liu","doi":"10.48550/arXiv.2203.11790","DOIUrl":"https://doi.org/10.48550/arXiv.2203.11790","url":null,"abstract":"Program semantics learning is core and fundamental to various code intelligence tasks, e.g., vulnerability detection and clone detection. A considerable number of existing works propose diverse approaches to learn program semantics for different tasks, and these works have achieved state-of-the-art performance. However, a comprehensive and systematic study evaluating different program representation techniques across diverse tasks is still missing. From this starting point, in this paper, we conduct an empirical study to evaluate different program representation techniques. Specifically, we categorize current mainstream code representation techniques into four categories, i.e., Feature-based, Sequence-based, Tree-based, and Graph-based program representation techniques, and evaluate their performance on three diverse and popular code intelligence tasks, i.e., Code Classification, Vulnerability Detection, and Clone Detection, on the publicly released benchmark. We further design three research questions (RQs) and conduct a comprehensive analysis to investigate the performance. Based on the extensive experimental results, we conclude that (1) The graph-based representation is superior to the other selected techniques across these tasks. (2) Compared with the node type information used in tree-based and graph-based representations, the node textual information is more critical to learning the program semantics. 
(3) Different tasks require task-specific semantics to achieve their highest performance; however, combining various program semantics from different dimensions, such as control dependency and data dependency, can still produce promising results.","PeriodicalId":437520,"journal":{"name":"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129524664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Giang Nguyen-Truong, Hong Jin Kang, D. Lo, Abhishek Sharma, A. Santosa, Asankhaya Sharma, Ming Yi Ang
{"title":"HERMES: Using Commit-Issue Linking to Detect Vulnerability-Fixing Commits","authors":"Giang Nguyen-Truong, Hong Jin Kang, D. Lo, Abhishek Sharma, A. Santosa, Asankhaya Sharma, Ming Yi Ang","doi":"10.1109/saner53432.2022.00018","DOIUrl":"https://doi.org/10.1109/saner53432.2022.00018","url":null,"abstract":"Software projects today rely on many third-party libraries, and therefore, are exposed to vulnerabilities in these libraries. When a library vulnerability is fixed, users are notified and advised to upgrade to a new version of the library. However, not all vulnerabilities are publicly disclosed, and users may not be aware of vulnerabilities that may affect their applications. Due to the above challenges, there is a need for techniques which can identify and alert users to silent fixes in libraries; commits that fix bugs with security implications that are not officially disclosed. We propose a machine learning approach to automatically identify vulnerability-fixing commits. Existing techniques consider only data within a commit, such as its commit message, which does not always have sufficiently discriminative information. To address this limitation, our approach incorporates the rich source of information from issue trackers. When a commit does not link to an issue, we use a commit-issue link recovery technique to infer the potential missing link. Our experiments are promising; incorporating information from issue trackers boosts the performance of a vulnerability-fixing commit classifier, improving over the strongest baseline by 11.1% on the entire dataset, which includes commits that do not link to an issue. 
On a subset of the data in which all commits explicitly link to an issue, our approach improves over the baseline by 12.5%.","PeriodicalId":437520,"journal":{"name":"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130622084","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Thierry Sorg, Amine Abbad Andaloussi, Barbara Weber
{"title":"Towards a Fine-grained Analysis of Cognitive Load During Program Comprehension","authors":"Thierry Sorg, Amine Abbad Andaloussi, Barbara Weber","doi":"10.1109/saner53432.2022.00092","DOIUrl":"https://doi.org/10.1109/saner53432.2022.00092","url":null,"abstract":"Program comprehension is inherent to all software development activities. This task may require a high mental effort (or so-called “cognitive load”), which in turn can hinder the performance of developers. In the literature, several authors have investigated the ability of biosignals to estimate developers' cognitive load during program comprehension. While the majority of these studies provide estimates at the task level, we aim for a more fine-grained level of analysis that allows pinpointing the critical parts of code that could be associated with cognitive load. We infer these critical parts solely from eye fixation features and investigate qualitatively their relationship with those perceived as challenging by users. Being able to pinpoint critical parts in the source code is a first step towards a practical approach providing targeted support to developers to prevent them from committing errors. Furthermore, such a lightweight approach can be adapted in online settings.","PeriodicalId":437520,"journal":{"name":"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"133 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133973777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Blockchain-Oriented Software Variant Forks: A Preliminary Study","authors":"Henrique Rocha, John Businge","doi":"10.48550/arXiv.2204.11083","DOIUrl":"https://doi.org/10.48550/arXiv.2204.11083","url":null,"abstract":"In collaborative social development platforms such as GitHub, forking a repository is a common activity. A variant fork wants to split the development from the original repository and grow towards a different direction. In this preliminary exploratory research, we analyze the possible reasons for creating a variant fork in blockchain-oriented software. By collecting repositories in GitHub, we created a dataset with repositories and their variants, from which we manually analyzed 86 variants. Based on the variants we studied, the main reason to create a variant in blockchain-oriented software is to support a different blockchain platform (65%).","PeriodicalId":437520,"journal":{"name":"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131258373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
H. Nguyen, Francesco Lomio, Fabiano Pecorelli, Valentina Lenarduzzi
{"title":"PANDORA: Continuous Mining Software Repository and Dataset Generation","authors":"H. Nguyen, Francesco Lomio, Fabiano Pecorelli, Valentina Lenarduzzi","doi":"10.1109/saner53432.2022.00041","DOIUrl":"https://doi.org/10.1109/saner53432.2022.00041","url":null,"abstract":"During mining software repository activities, a huge amount of data gathered from different sources is analyzed. Different tools have been developed for collecting and aggregating data from repositories, but they do not easily allow researchers to develop new extractors or to integrate the data collected from other platforms, in particular from platforms that delete the data periodically. Moreover, mining software repository studies are commonly performed on old versions of software projects, and their results are not commonly updated periodically. As a result of the non-continuously updated studies, practitioners often do not trust results from empirical studies. In order to overcome the aforementioned issues, in this paper, we present Pandora, a tool that automatically and continuously mines data from different existing tools and online platforms and enables researchers to run and continuously update the results of mining software repository studies. To evaluate the applicability of our tool, we analyzed 365 projects (developed in different languages), continuously collecting data from December 2020 to May 2021 and running an example study investigating the build-stability of SonarQube rules. 
Link to dashboard: http://sqa.rd.tuni.fi/superset/dashboard/1 Link to source code: https://github.com/clowee/PANDORA Link to 5-minute video: https://youtu.be/CuVO9YGJ59I","PeriodicalId":437520,"journal":{"name":"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129602693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Attribute Grammar Mining by Symbolic Execution","authors":"M. Moser, J. Pichler, A. Pointner","doi":"10.1109/saner53432.2022.00100","DOIUrl":"https://doi.org/10.1109/saner53432.2022.00100","url":null,"abstract":"The specification of program inputs is a requirement for many software engineering tasks, but often does not exist or is out of date. To tackle this problem, software engineers may apply program analysis techniques to extract parts of a specification from the source code that processes the program input. Today there are analysis techniques for the extraction of constraints (mathematical formulas) for individual program inputs (e.g. function parameters) as well as emerging techniques for inferring context-free grammars that specify the syntax of program input strings. However, such techniques focus on a single aspect (e.g., constraints or grammars) of the specification only and neglect the other one. We propose to integrate such analysis techniques by extending existing approaches for mining input grammars with the extraction of constraints. Constraints are integrated with a grammar in the form of attributes and context constraints on grammar symbols, resulting in an attribute grammar as specification format. To achieve this goal, we choose the analysis method dynamic symbolic execution (DSE), which is already an established technique for the extraction of constraints and beneficial for grammar mining (e.g., through automatic input generation) as well. Thus, DSE not only covers both aspects but also—as a single analysis method—should facilitate the integration of these two aspects. 
In this paper, we describe the basic idea of the proposed integration and report the first results on DSE-based grammar extraction.","PeriodicalId":437520,"journal":{"name":"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"46 10","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113969395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yunosuke Higashi, Katsunori Fukui, Yutaro Kashiwa, M. Ohira
{"title":"A Preliminary Analysis of GPL-Related License Violations in Docker Images","authors":"Yunosuke Higashi, Katsunori Fukui, Yutaro Kashiwa, M. Ohira","doi":"10.1109/saner53432.2022.00059","DOIUrl":"https://doi.org/10.1109/saner53432.2022.00059","url":null,"abstract":"Background: In recent years, the use of container virtualization technology has been rapidly spreading to speed up software release and operation. In general, a containerized application image (e.g., Docker image) consists of multiple reused OSS packages. To reuse OSS, it is necessary to comply with the OSS licenses. Although there have been many studies on OSS license detection and license compatibility among OSS packages, to the best of our knowledge, no study has tackled license incompatibility problems among OSS packages in a container image. Aims: In this paper, we conduct a preliminary analysis to clarify the extent to which Docker images contain OSS license incompatibility problems. Method: We analyze 776 Docker images published on GitHub to determine whether license incompatibilities among OSS packages exist. Results: The analysis showed that a total of 2,167 software packages were used in the 776 Docker images. The majority of the software packages (71.3%) are compatible with the GPL family, but a non-negligible number of software packages (28.7%) are not compatible. The analysis also showed that 457 (58.9%) of the 776 images had GPL-related incompatibility problems. Conclusions: Unlike traditional software development, in which software packages to be reused are explicitly combined, Dockerfile creators who build and distribute Docker images might be less aware of the risks related to compatibility between OSS licenses. 
Our results are useful as information to improve the awareness of Dockerfile creators and also indicate the necessity of future studies to detect and prevent the inclusion of license-incompatible OSS packages in container images.","PeriodicalId":437520,"journal":{"name":"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116076226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}