{"title":"Phrase2Set: Phrase-to-Set Machine Translation and Its Software Engineering Applications","authors":"THANH VAN NGUYEN, Aashish Yadavally, T. Nguyen","doi":"10.1109/saner53432.2022.00068","DOIUrl":"https://doi.org/10.1109/saner53432.2022.00068","url":null,"abstract":"Machine translation has been applied to software engineering (SE) problems, e.g., software tagging, language migration, bug localization, automatic program repair, etc. However, machine translation primarily supports only sequence-to-sequence transformations and falls short during the translation/transformation from a phrase or sequence in the input to a set in the output. An example of such a task is tagging the input text in a software library tutorial or a forum entry with a set of API elements that are relevant to the input. In this work, we propose Phrase2Set, a context-sensitive statistical machine translation model that learns to transform a phrase mixing code and text into a set of code or text tokens. We first design a token-to-token algorithm that computes the probabilities of mapping individual tokens from phrases to sets. We then propose a Bayesian network-based statistical machine translation model that uses these probabilities to decide a translation process that maximizes the joint translation probability. To achieve that, we consider the context of the tokens on the source side and on the target side via their relative co-occurrence frequencies. We evaluate Phrase2Set in three SE applications: 1) tagging fragments of text in a tutorial with the relevant API elements, 2) tagging StackOverflow entries with relevant API elements, and 3) text-to-API translation. Our empirical results show that Phrase2Set achieves high accuracy and outperforms the state-of-the-art models in all three applications. We also provide lessons learned and other potential applications.","PeriodicalId":437520,"journal":{"name":"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"109 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122368825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards the Isolation of Failure-Inducing Inputs in Cyber-Physical Systems: is Delta Debugging Enough?","authors":"P. Valle, Aitor Arrieta","doi":"10.1109/saner53432.2022.00072","DOIUrl":"https://doi.org/10.1109/saner53432.2022.00072","url":null,"abstract":"Cyber-Physical Systems (CPSs) combine digital cyber technologies with parallel physical processes. On the one hand, verification methods for such systems mostly rely on (system-level) simulation-based testing. This technique is expensive because complex mathematical models are used to model the physical part of CPSs. On the other hand, test cases for CPSs are usually formed by long test inputs that aim to mimic real-world scenarios. As a result, when a failure is exhibited, it is highly important to isolate the failure-inducing inputs to provide developers with the minimal test input. This reduces debugging costs by (1) reproducing the failure in minimal time and (2) reducing the test coverage of the system, making fault localization easier. In this paper, we adapt the well-known delta debugging algorithm to isolate the failure-inducing inputs of CPSs modeled in Simulink. By means of three Simulink models, we analyzed whether delta debugging is effective enough to isolate failure-inducing inputs in CPSs.","PeriodicalId":437520,"journal":{"name":"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123933498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recommending Code Reviewers for Proprietary Software Projects: A Large Scale Study","authors":"Dezhen Kong, Qiuyuan Chen, Lingfeng Bao, Chenxing Sun, Xin Xia, Shanping Li","doi":"10.1109/saner53432.2022.00080","DOIUrl":"https://doi.org/10.1109/saner53432.2022.00080","url":null,"abstract":"Code review is an important activity in software development, which offers benefits such as improving code quality, reducing defects, and distributing knowledge. Tencent, as a giant company, hosts a great number of proprietary software projects that are only open to specific internal developers. Since these proprietary projects receive up to 100,000 newly submitted code changes per month, automatically recommending code reviewers is urgently needed. To this end, we first conduct an empirical study on a large set of proprietary projects from Tencent, to understand their characteristics and how code reviewer recommendation approaches work on them. Based on the derived findings and implications, we propose a new approach named Camp that recommends reviewers by considering their collaboration and expertise across multiple projects, to fit the context of proprietary software development. The evaluation results show that Camp achieves higher scores on proprietary projects across most metrics than other state-of-the-art approaches, i.e., Revfinder, CHREV, Tie, and Comment Network, and produces acceptable performance scores for more projects. In addition, we discuss possible directions for code reviewer recommendation.","PeriodicalId":437520,"journal":{"name":"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121235050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Kraken 2.0: A platform-agnostic and cross-device interaction testing tool","authors":"William Ravelo-Méndez, Camilo Escobar-Velásquez, M. Linares-Vásquez","doi":"10.1109/saner53432.2022.00102","DOIUrl":"https://doi.org/10.1109/saner53432.2022.00102","url":null,"abstract":"Mobile devices and apps play a primordial role in daily life, supporting daily activities that involve human interaction. Nevertheless, this interaction can occur between users on different platforms (e.g., web and mobile) and devices. Because of this, developers are required to test combinations of heterogeneous interactions to ensure the correct behavior of multi-device and multi-platform apps. Unfortunately, to the best of our knowledge, there is no existing open source tool that enables testing for those cases. In this paper, we present an improved version of our tool KrakenMobile, an open source tool that enables the execution of interactive End-2-End tests between Android devices. This new version, Kraken 2.0, has new capabilities such as the execution of platform-agnostic interactive End-2-End tests (e.g., web and mobile), and has been migrated from Ruby to NodeJS to improve its usability. Kraken 2.0 is publicly available on GitHub (https://bit.ly/30KPFcv). Videos: https://bit.ly/3f1fRXa","PeriodicalId":437520,"journal":{"name":"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128819498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Empirical Investigation into the Use of Image Captioning for Automated Software Documentation","authors":"Kevin Moran, Ali Yachnes, George Purnell, Juanyed Mahmud, Michele Tufano, Carlos Bernal Cardenas, D. Poshyvanyk, Zach H’Doubler","doi":"10.1109/SANER53432.2022.00069","DOIUrl":"https://doi.org/10.1109/SANER53432.2022.00069","url":null,"abstract":"Existing automated techniques for software documentation typically attempt to reason between two main sources of information: code and natural language. However, this reasoning process is often complicated by the lexical gap between more abstract natural language and more structured programming languages. One potential bridge for this gap is the Graphical User Interface (GUI), as GUIs inherently encode salient information about underlying program functionality into rich, pixel-based data representations. This paper offers one of the first comprehensive empirical investigations into the connection between GUIs and functional, natural language descriptions of software. First, we collect, analyze, and open-source a large dataset of functional GUI descriptions consisting of 45,998 descriptions for 10,204 screenshots from popular Android applications. The descriptions were obtained from human labelers and underwent several quality control mechanisms. To gain insight into the representational potential of GUIs, we investigate the ability of four Neural Image Captioning models to predict natural language descriptions of varying granularity when provided a screenshot as input. We evaluate these models quantitatively, using common machine translation metrics, and qualitatively through a large-scale user study. Finally, we offer lessons learned and a discussion of the potential shown by multimodal models to enhance future techniques for automated software documentation.","PeriodicalId":437520,"journal":{"name":"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116773055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stone: A Privacy Policy Enforcement System for Smart Contracts","authors":"Jihyeon Kim, Dae-hyeon Jeong, Jisoo Kim, Eun-Sun Cho","doi":"10.1109/saner53432.2022.00141","DOIUrl":"https://doi.org/10.1109/saner53432.2022.00141","url":null,"abstract":"Smart contracts running on a blockchain potentially disclose all data to the participants of the chain. Because privacy is important in many areas, smart contracts may therefore not be considered a good option. To overcome this limitation, this paper introduces Stone, a privacy preservation system for smart contracts. With Stone, an arbitrary Solidity smart contract can be combined with a separate privacy policy in JSON, which prevents the storage data in the contract from being publicised. Because this approach is convenient for policy developers as well as smart contract programmers, we envision that it will be practically acceptable for real-world applications.","PeriodicalId":437520,"journal":{"name":"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115641468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring API Deprecation Evolution in JavaScript","authors":"Romulo Nascimento, André C. Hora, Eduardo Figueiredo","doi":"10.1109/saner53432.2022.00031","DOIUrl":"https://doi.org/10.1109/saner53432.2022.00031","url":null,"abstract":"Building an application using third-party libraries is a common practice in software development. As with any other system, software libraries and their APIs evolve. To support version migration and ensure backward compatibility, a recommended practice during development is to deprecate APIs. Different from other popular programming languages such as Java and C#, JavaScript has no native support for deprecating API elements. However, several strategies are commonly adopted to communicate that an API should be avoided, such as the project documentation, JSDoc annotations, code comments, console messages, and deprecation utilities. Indeed, there have been many studies on deprecation strategies and evolution, mostly for Java, C#, and Python. However, to the best of our knowledge, there are no detailed studies analyzing how API deprecation changes over time in the JavaScript ecosystem. This paper provides an empirical study on how API deprecation evolves in JavaScript by analyzing 1,918 releases of 50 popular packages. Results show that close to 60% of packages have rising trends in the number of deprecated APIs, while only 9.4% indicate a downward trend. Also, most deprecation occurrences are both added and removed in minor releases rather than removed in major releases, as recommended by best practices.","PeriodicalId":437520,"journal":{"name":"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116053591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DIComP: Lightweight Data-Driven Inference of Binary Compiler Provenance with High Accuracy","authors":"Ligeng Chen, Zhongling He, Hao Wu, Fengyuan Xu, Yi Qian, Bing Mao","doi":"10.1109/saner53432.2022.00025","DOIUrl":"https://doi.org/10.1109/saner53432.2022.00025","url":null,"abstract":"Binary analysis is pervasively utilized to assess software security and test vulnerabilities without access to source code. The validity of such analysis is heavily influenced by the ability to infer information about how the code was compiled. Among this compilation information, the compiler type and optimization level, the key factors determining what binaries look like, remain difficult to infer efficiently with existing tools. In this paper, we conduct a thorough empirical study of a binary's appearance under various compilation settings and propose DIComP, a lightweight binary analysis tool based on a simple machine learning method that infers the compiler and optimization level from the most relevant features according to our observations. Our comprehensive evaluations demonstrate that DIComP can fully recognize the compiler provenance, and it is effective in inferring the optimization levels with up to 90% accuracy. Also, it is efficient, inferring thousands of binaries at the millisecond level with our lightweight machine learning model (1 MB).","PeriodicalId":437520,"journal":{"name":"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114836241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Uncovering Library Features from API Usage on Stack Overflow","authors":"Camilo Velázquez-Rodríguez, Eleni Constantinou, Coen De Roover","doi":"10.1109/saner53432.2022.00035","DOIUrl":"https://doi.org/10.1109/saner53432.2022.00035","url":null,"abstract":"Selecting an appropriate library for reuse within a vast software ecosystem can be a daunting task. A list of features for each library, i.e., a short description of the functionality that can be reused with code examples that illustrate its usage, may alleviate this problem. In this paper, we propose a data-driven approach that uses both the code snippets and the accompanying natural language descriptions from Stack Overflow posts to produce a list of features of a given library. Each extracted feature corresponds to a cluster of API classes and methods considered related based on attributes of the Stack Overflow posts in which they appear. We evaluated the approach on seven Maven libraries and compared the resulting features against library descriptions from cookbook-like tutorials. The approach achieves an average accuracy of 67% across the seven libraries for the tutorial-like features. For at least 73% of the features extracted by the approach but missing from the documentation, we found a matching library usage in a corpus of GitHub projects. These results suggest that our clusters represent library features, which paves the way to better tool support for documenting software libraries and for selecting a library in an ecosystem.","PeriodicalId":437520,"journal":{"name":"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127442313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards a Robust Approach to Analyze Time-Dependent Data in Software Engineering","authors":"Nyyti Saarimäki, Sergio Moreschini, Francesco Lomio, R. Peñaloza, Valentina Lenarduzzi","doi":"10.1109/saner53432.2022.00015","DOIUrl":"https://doi.org/10.1109/saner53432.2022.00015","url":null,"abstract":"Background. Several recent software engineering studies use data mined from the version control systems adopted by different software projects. However, inspecting the data and statistical methods used in those studies reveals several problems with the current approach, mainly related to the dependent nature of the data. Objective. We analyzed time-dependent data in software engineering at the commit level, and propose an alternative approach based on time series analysis. Method. We identified statistical tests designed for time series analysis and propose a technique to model time-dependent data, similarly to what is done in finance and weather forecasting. We applied our approach to a small set of projects of different sizes, investigating the behaviour of the SQALE Index, in order to highlight the time dependency and interdependency of the different commits. Results. Using these techniques, we analysed and modelled the data, showing that it is possible to investigate this type of commit data using methods from time series analysis. Conclusion. Based on the promising results, we plan to validate the robustness of the approach by replicating previous works.","PeriodicalId":437520,"journal":{"name":"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124819067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}