{"title":"Columbo: High perfomance unpacking","authors":"J. Raber","doi":"10.1109/SANER.2017.7884663","DOIUrl":"https://doi.org/10.1109/SANER.2017.7884663","url":null,"abstract":"Columbo is a tool for unpacking malware. A key feature is the ability to uncompress and/or decrypt areas of memory quickly using algorithms focused on basic block compression and loop optimizations.","PeriodicalId":6541,"journal":{"name":"2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"5 1","pages":"507-510"},"PeriodicalIF":0.0,"publicationDate":"2017-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72824973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Antipatterns causing memory bloat: A case study","authors":"Kamil Jezek, Richard Lipka","doi":"10.1109/SANER.2017.7884631","DOIUrl":"https://doi.org/10.1109/SANER.2017.7884631","url":null,"abstract":"Java is one of the languages that are popular for high abstraction and automatic memory management. As in other object-oriented languages, Java's objects can easily represent a domain model of an application. While it has a positive impact on the design, implementation and maintenance of applications, there are drawbacks as well. One of them is a relatively high memory overhead to manage objects. In this work, we show our experience with searching for this problem in an application that we refactored to use less memory. Although the application was relatively well designed with no memory leaks, it required such a big amount of memory that for large data the application was not usable in reality. We did three relatively simple improvements: we reduced the usage of Java Collections, removed unnecessary object instances, and simplified the domain model, which reduced memory needs up to 88% and made the application better usable and even faster. This work is a case-study reporting results. Moreover, the employed ideas are formulated as a set of antipatterns, which may be used for other applications.","PeriodicalId":6541,"journal":{"name":"2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"7 1","pages":"306-315"},"PeriodicalIF":0.0,"publicationDate":"2017-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75183617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The dark side of event sourcing: Managing data conversion","authors":"Michiel Overeem, M. Spoor, S. Jansen","doi":"10.1109/SANER.2017.7884621","DOIUrl":"https://doi.org/10.1109/SANER.2017.7884621","url":null,"abstract":"Evolving software systems includes data schema changes, and because of those schema changes data has to be converted. Converting data between two different schemas while continuing the operation of the system is a challenge when that system is expected to be available always. Data conversion in event sourced systems introduces new challenges, because of the relative novelty of the event sourcing architectural pattern, because of the lack of standardized tools for data conversion, and because of the large amount of data that is stored in typical event stores. This paper addresses the challenge of schema evolution and the resulting data conversion for event sourced systems. First of all a set of event store upgrade operations is proposed that can be used to convert data between two versions of a data schema. Second, a set of techniques and strategies that execute the data conversion while continuing the operation of the system is discussed. The final contribution is an event store upgrade framework that identifies which techniques and strategies can be combined to execute the event store upgrade operations while continuing operation of the system. Two utilizations of the framework are given, the first being as decision support in upfront design of an upgrade system for event sourced systems. The framework can also be utilized as the description of an automated upgrade system that can be used for continuous deployment. The event store upgrade framework is evaluated in interviews with three renowned experts in the domain and has been found to be a comprehensive overview that can be utilized in the design and implementation of an upgrade system. The automated upgrade system has been implemented partially and applied in experiments.","PeriodicalId":6541,"journal":{"name":"2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"130 1","pages":"193-204"},"PeriodicalIF":0.0,"publicationDate":"2017-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86289251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detecting similar repositories on GitHub","authors":"Yun Zhang, D. Lo, Pavneet Singh Kochhar, Xin Xia, Quanlai Li, Jianling Sun","doi":"10.1109/SANER.2017.7884605","DOIUrl":"https://doi.org/10.1109/SANER.2017.7884605","url":null,"abstract":"GitHub contains millions of repositories among which many are similar with one another (i.e., having similar source codes or implementing similar functionalities). Finding similar repositories on GitHub can be helpful for software engineers as it can help them reuse source code, build prototypes, identify alternative implementations, explore related projects, find projects to contribute to, and discover code theft and plagiarism. Previous studies have proposed techniques to detect similar applications by analyzing API usage patterns and software tags. However, these prior studies either only make use of a limited source of information or use information not available for projects on GitHub. In this paper, we propose a novel approach that can effectively detect similar repositories on GitHub. Our approach is designed based on three heuristics leveraging two data sources (i.e., GitHub stars and readme files) which are not considered in previous works. The three heuristics are: repositories whose readme files contain similar contents are likely to be similar with one another, repositories starred by users of similar interests are likely to be similar, and repositories starred together within a short period of time by the same user are likely to be similar. Based on these three heuristics, we compute three relevance scores (i.e., readme-based relevance, stargazer-based relevance, and time-based relevance) to assess the similarity between two repositories. By integrating the three relevance scores, we build a recommendation system called RepoPal to detect similar repositories. We compare RepoPal to a prior state-of-the-art approach CLAN using one thousand Java repositories on GitHub. Our empirical evaluation demonstrates that RepoPal achieves a higher success rate, precision and confidence over CLAN.","PeriodicalId":6541,"journal":{"name":"2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"72 1","pages":"13-23"},"PeriodicalIF":0.0,"publicationDate":"2017-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84079176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A framework for classifying and comparing source code recommendation systems","authors":"Mohammad Ghafari, Hamidreza Moradi","doi":"10.1109/SANER.2017.7884674","DOIUrl":"https://doi.org/10.1109/SANER.2017.7884674","url":null,"abstract":"The use of Application Programming Interfaces (APIs) is pervasive in software systems; it makes the development of new software much easier, but remembering large APIs with sophisticated usage protocol is arduous for software developers. Code recommendation systems alleviate this burden by providing developers with a ranked list of API usages that are estimated to be most useful to their development tasks. The promise of these systems has motivated researchers to invest considerable effort to develop many of them over the past decade, yet the achievements are not evident. To assess the state of the art in code recommendation, we propose a framework for classifying and comparing these systems. We hope the framework will help the community to conduct a systematic study to gain insight into how much code recommendation has so far achieved, in both research and practice.","PeriodicalId":6541,"journal":{"name":"2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"20 1","pages":"555-556"},"PeriodicalIF":0.0,"publicationDate":"2017-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84272475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An empirical study of code smells in JavaScript projects","authors":"Amir Saboury, Pooya Musavi, Foutse Khomh, G. Antoniol","doi":"10.1109/SANER.2017.7884630","DOIUrl":"https://doi.org/10.1109/SANER.2017.7884630","url":null,"abstract":"JavaScript is a powerful scripting programming language that has gained a lot of attention this past decade. Initially used exclusively for client-side web development, it has evolved to become one of the most popular programming languages, with developers now using it for both client-side and server-side application development. Similar to applications written in other programming languages, JavaScript applications contain code smells, which are poor design choices that can negatively impact the quality of an application. In this paper, we investigate code smells in JavaScript server-side applications with the aim to understand how they impact the fault-proneness of applications. We detect 12 types of code smells in 537 releases of five popular JavaScript applications (i.e., express, grunt, bower, less.js, and request) and perform survival analysis, comparing the time until a fault occurrence, in files containing code smells and files without code smells. Results show that (1) on average, files without code smells have hazard rates 65% lower than files with code smells. (2) Among the studied smells, “Variable Re-assign” and “Assignment In Conditional statements” code smells have the highest hazard rates. Additionally, we conduct a survey with 1,484 JavaScript developers, to understand the perception of developers towards our studied code smells. We found that developers consider “Nested Callbacks”, “Variable Re-assign” and “Long Parameter List” code smells to be serious design problems that hinder the maintainability and reliability of applications. This assessment is in line with the findings of our quantitative analysis. Overall, code smells affect negatively the quality of JavaScript applications and developers should consider tracking and removing them early on before the release of applications to the public.","PeriodicalId":6541,"journal":{"name":"2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"67 1","pages":"294-305"},"PeriodicalIF":0.0,"publicationDate":"2017-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83581396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lost comments support program comprehension","authors":"Takayuki Omori","doi":"10.1109/SANER.2017.7884680","DOIUrl":"https://doi.org/10.1109/SANER.2017.7884680","url":null,"abstract":"Source code comments are valuable to keep developers' explanations of code fragments. Proper comments help code readers understand the source code quickly and precisely. However, developers sometimes delete valuable comments since they do not know about the readers' knowledge and think the written comments are redundant. This paper describes a study of lost comments based on edit operation histories of source code. The experimental result shows that developers sometimes delete comments although their associated code fragments are not changed. Lost comments contain valuable descriptions that can be utilized as new data sources to support program comprehension.","PeriodicalId":6541,"journal":{"name":"2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"74 1","pages":"567-568"},"PeriodicalIF":0.0,"publicationDate":"2017-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85910665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reducing redundancies in multi-revision code analysis","authors":"Carol V. Alexandru, Sebastiano Panichella, H. Gall","doi":"10.1109/SANER.2017.7884617","DOIUrl":"https://doi.org/10.1109/SANER.2017.7884617","url":null,"abstract":"Software engineering research often requires analyzing multiple revisions of several software projects, be it to make and test predictions or to observe and identify patterns in how software evolves. However, code analysis tools are almost exclusively designed for the analysis of one specific version of the code, and the time and resources requirements grow linearly with each additional revision to be analyzed. Thus, code studies often observe a relatively small number of revisions and projects. Furthermore, each programming ecosystem provides dedicated tools, hence researchers typically only analyze code of one language, even when researching topics that should generalize to other ecosystems. To alleviate these issues, frameworks and models have been developed to combine analysis tools or automate the analysis of multiple revisions, but little research has gone into actually removing redundancies in multi-revision, multi-language code analysis. We present a novel end-to-end approach that systematically avoids redundancies every step of the way: when reading sources from version control, during parsing, in the internal code representation, and during the actual analysis. We evaluate our open-source implementation, LISA, on the full history of 300 projects, written in 3 different programming languages, computing basic code metrics for over 1.1 million program revisions. When analyzing many revisions, LISA requires less than a second on average to compute basic code metrics for all files in a single revision, even for projects consisting of millions of lines of code.","PeriodicalId":6541,"journal":{"name":"2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"2 1","pages":"148-159"},"PeriodicalIF":0.0,"publicationDate":"2017-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89950594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards understanding an open-source bounty: Analysis of Bountysource","authors":"Tetsuya Kanda, M. Guo, Hideaki Hata, Ken-ichi Matsumoto","doi":"10.1109/SANER.2017.7884685","DOIUrl":"https://doi.org/10.1109/SANER.2017.7884685","url":null,"abstract":"When developing and maintaining a software project, many issues about bug fixing or feature addition are reported on the Bug Tracking System (BTS) and the Issue Tracking System (ITS). Bountysource is a web founding platform that awards developers who have solved issues on the BTS/ITS. Users can post a bounty for the issues, and a developer who solves the issue can get that bounty. This research analyzes Bountysource to clarify how bounties act in open source software projects and discusses further research topics in open-source bounties.","PeriodicalId":6541,"journal":{"name":"2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"1632 1","pages":"577-578"},"PeriodicalIF":0.0,"publicationDate":"2017-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76275161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic generation of opaque constants based on the k-clique problem for resilient data obfuscation","authors":"Roberto Tiella, M. Ceccato","doi":"10.1109/SANER.2017.7884620","DOIUrl":"https://doi.org/10.1109/SANER.2017.7884620","url":null,"abstract":"Data obfuscations are program transformations used to complicate program understanding and conceal actual values of program variables. The possibility to hide constant values is a basic building block of several obfuscation techniques. For example, in XOR Masking a constant mask is used to encode data, but this mask must be hidden too, in order to keep the obfuscation resilient to attacks. In this paper, we present a novel technique based on the k-clique problem, which is known to be NP-complete, to generate opaque constants, i.e. values that are difficult to guess by static analysis. In our experimental assessment we show that our opaque constants are computationally cheap to generate, both at obfuscation time and at runtime. Moreover, due to the NP-completeness of the k-clique problem, our opaque constants can be proven to be hard to attack with state-of-the-art static analysis tools.","PeriodicalId":6541,"journal":{"name":"2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"54 1","pages":"182-192"},"PeriodicalIF":0.0,"publicationDate":"2017-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90627480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}