{"title":"Security Vulnerabilities in Categories of Clones and Non-Cloned Code: An Empirical Study","authors":"M. R. Islam, M. Zibran, Aayush Nagpal","doi":"10.1109/ESEM.2017.9","DOIUrl":"https://doi.org/10.1109/ESEM.2017.9","url":null,"abstract":"Background: Software security has drawn immense attention in recent years. While efforts are expected to minimize security vulnerabilities in source code, the developers' practice of code cloning often causes multiplication of such vulnerabilities and program faults. Although previous studies examined the bug-proneness, stability, and changeability of clones against non-cloned code, the security aspects remained ignored. Aims: The objective of this work is to explore and understand the security vulnerabilities and their severity in different types of clones compared to non-cloned code. Method: Using a state-of-the-art clone detector and two reputed security vulnerability detection tools, we detect clones and vulnerabilities in 8.7 million lines of code over 34 software systems. We perform a comparative study of the vulnerabilities identified in different types of clones and non-cloned code. The results are derived based on quantitative analyses with statistical significance. Results: Our study reveals that the security vulnerabilities found in code clones have higher severity of security risks compared to those in non-cloned code. However, the proportion (i.e., density) of vulnerabilities in clones and non-cloned code does not differ significantly. 
Conclusion: The findings from this work add to our understanding of the characteristics and impacts of clones, which will be useful in clone-aware software development with improved software security.","PeriodicalId":213866,"journal":{"name":"2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)","volume":"216 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124258897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Member Checking in Software Engineering Research: Lessons Learned from an Industrial Case Study","authors":"Ronnie E. S. Santos, F. Silva, C. Magalhães","doi":"10.1109/ESEM.2017.29","DOIUrl":"https://doi.org/10.1109/ESEM.2017.29","url":null,"abstract":"Context. Member checking can be defined as a research phase performed during qualitative research in which the researcher compares her interpretations and understanding obtained from the data analysis with the viewpoints of participants to increase the accuracy and consistency of results. This is an important step for any qualitative research. However, considering a sample of 66 case studies developed and published in the context of software engineering, only 10 studies briefly described the use of this technique. Method. In this article, we present a set of lessons learned from planning and performing member checking to validate the results of an industrial case study performed in a large software company. Results. Member checking was effective in validating the findings obtained from the qualitative case study and was also useful in revealing important information not observed in the data analysis process. It also proved effective in surfacing divergences among different groups of participants. Conclusion. We describe how member checking can be performed, and discuss seven lessons learned in this process. 
We expect that our experience can be useful to software engineering researchers while performing this research phase in case studies.","PeriodicalId":213866,"journal":{"name":"2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129275005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Delta-Bench: Differential Benchmark for Static Analysis Security Testing Tools","authors":"Ivan Pashchenko, Stanislav Dashevskyi, F. Massacci","doi":"10.1109/ESEM.2017.24","DOIUrl":"https://doi.org/10.1109/ESEM.2017.24","url":null,"abstract":"Background: Static analysis security testing (SAST) tools may be evaluated using synthetic micro benchmarks and benchmarks based on real-world software. Aims: The aim of this study is to address the limitations of existing SAST tool benchmarks: lack of vulnerability realism, uncertain ground truth, and a large amount of findings not related to the analyzed vulnerability. Method: We propose Delta-Bench - a novel approach for the automatic construction of benchmarks for SAST tools based on differencing vulnerable and fixed versions in Free and Open Source Software (FOSS) repositories. To test our approach, we ran 7 state-of-the-art SAST tools against 70 revisions of four major versions of Apache Tomcat, spanning 62 distinct Common Vulnerabilities and Exposures (CVE) fixes and vulnerable files totalling over 100K lines of code, as the source of ground truth vulnerabilities. Results: Our experiment allows us to draw interesting conclusions (e.g., tools perform differently depending on the selected benchmark). 
Conclusions: Delta-Bench allows SAST tools to be automatically evaluated on real-world historical vulnerabilities, using only the findings that a tool produced for the analysed vulnerability.","PeriodicalId":213866,"journal":{"name":"2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114682612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Structured Synthesis Method: The Evidence Factory Tool","authors":"P. Santos, G. Travassos","doi":"10.1109/ESEM.2017.68","DOIUrl":"https://doi.org/10.1109/ESEM.2017.68","url":null,"abstract":"Background: research synthesis is still a challenge in Software Engineering due to the heterogeneity of primary studies in the area. It also generates a significant volume of information, which is complex to manage. Aims: to provide support to this kind of study in SE. Method: we present the Evidence Factory, a tool designed to support the Structured Synthesis Method (SSM). SSM is a research synthesis method that can be used to aggregate both quantitative and qualitative studies. It is an integrative synthesis method, like meta-analysis, but has several features from interpretative methods, such as meta-ethnography, particularly those concerned with conceptual development. Results: the tool is a web-based infrastructure, which supports the organization of synthesis studies. Researchers can compare findings from different studies by modeling their results according to the evidence meta-model. After deciding whether the evidence can be combined, the tool automatically computes the uncertainty associated with the aggregated results using the formalisms of the Mathematical Theory of Evidence. 
Conclusion: the tool was used in real synthesis studies and is freely available for the SE community.","PeriodicalId":213866,"journal":{"name":"2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114695965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Building of Java Projects in Software Repositories: A Study on Feasibility and Challenges","authors":"Foyzul Hassan, Shaikh Mostafa, E. Lam, Xiaoyin Wang","doi":"10.1109/ESEM.2017.11","DOIUrl":"https://doi.org/10.1109/ESEM.2017.11","url":null,"abstract":"Despite advancements in software build tools such as Maven and Gradle, human involvement is still often required in software building. To enable large-scale advanced program analysis and data mining of software artifacts, software engineering researchers need a large corpus of built software, so automatic software building becomes essential to improve research productivity. In this paper, we present a feasibility study on automatic software building. Particularly, we first put state-of-the-art build automation tools (Ant, Maven, and Gradle) to the test by automatically executing their respective default build commands on the top 200 Java projects from GitHub. Next, we focus on the 86 projects that failed this initial automated build attempt, manually examining and determining the correct build sequences to build each of these projects. We present a detailed build failure taxonomy from these build results and show that at least 57% of build failures can be automatically resolved.","PeriodicalId":213866,"journal":{"name":"2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117320149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantifying the Transition from Python 2 to 3: An Empirical Study of Python Applications","authors":"B. Malloy, James F. Power","doi":"10.1109/ESEM.2017.45","DOIUrl":"https://doi.org/10.1109/ESEM.2017.45","url":null,"abstract":"Background: Python is one of the most popular modern programming languages. In 2008 its authors introduced a new version of the language, Python 3.0, that was not backward compatible with Python 2, initiating a transitional phase for Python software developers. Aims: The study described in this paper investigates the degree to which Python software developers are making the transition from Python 2 to Python 3. Method: We have developed a Python compliance analyser, PyComply, and have assembled a large corpus of Python applications. We use PyComply to measure and quantify the degree to which Python 3 features are being used, as well as the rate and context of their adoption. Results: We find that Python software developers are not exploiting the new features and advantages of Python 3, but rather are choosing to retain backward compatibility with Python 2. 
Conclusions: Python developers are confining themselves to a language subset, governed by the diminishing intersection of Python 2, which is no longer under development, and Python 3, which is under active development with new features being introduced as the language continues to evolve.","PeriodicalId":213866,"journal":{"name":"2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121887099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining Version Control System for Automatically Generating Commit Comment","authors":"Yuan Huang, Qiaoyang Zheng, Xiangping Chen, Yingfei Xiong, Zhiyong Liu, Xiaonan Luo","doi":"10.1109/ESEM.2017.56","DOIUrl":"https://doi.org/10.1109/ESEM.2017.56","url":null,"abstract":"Commit comments increasingly receive attention as an important complementary component in code change comprehension. To address the comment scarcity issue, a variety of automatic approaches for commit comment generation have been proposed. However, most of these approaches mechanically outline a superficial summary of the changed software entities, and the change intent behind the code changes is lost (e.g., the existing approaches cannot generate a comment such as \"fixing null pointer exception\"). Considering that the comments written by developers often describe the intent behind the code change, we propose a method to automatically generate commit comments by reusing the existing comments in the version control system. Specifically, for an input commit, we apply syntax, semantic, pre-syntax, and pre-semantic similarities to discover similar commits among half a million commits, and recommend reusable comments for the input commit from those of the similar commits. We evaluate our approach on 7 projects. 
The results show that 9.1% of the generated comments are good, 27.7% need minor fixes, and 63.2% are bad; we also analyze the reasons that make a generated comment usable or not.","PeriodicalId":213866,"journal":{"name":"2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122072177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Characterizing Software Engineering Work with Personas Based on Knowledge Worker Actions","authors":"Denae Ford, Tom Zimmermann, C. Bird, Nachiappan Nagappan","doi":"10.1109/ESEM.2017.54","DOIUrl":"https://doi.org/10.1109/ESEM.2017.54","url":null,"abstract":"Mistaking versatility for universal skills, some companies tend to categorize all software engineers the same, not knowing a difference exists. For example, a company may select one of many software engineers to complete a task, later finding that the engineer's skills and style do not match those needed to successfully complete that task. This can result in delayed task completion and demonstrates that a one-size-fits-all concept should not apply to how software engineers work. In order to gain a comprehensive understanding of different software engineers and their working styles, we interviewed 21 participants and surveyed 868 software engineers at a large software company, asking them about their work in terms of knowledge worker actions. We identify how tasks, collaboration styles, and perspectives of autonomy can significantly affect different approaches to software engineering work. To characterize differences, we describe empirically informed personas on how they work. Our defined software engineering personas include those with focused debugging abilities, engineers with an active interest in learning, experienced advisors who serve as experts in their role, and more. 
Our study and results serve as a resource for building products, services, and tools around these software engineering personas.","PeriodicalId":213866,"journal":{"name":"2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126161220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"What the Job Market Wants from Requirements Engineers? An Empirical Analysis of Online Job Ads from the Netherlands","authors":"M. Daneva, Chong Wang, P. Hoener","doi":"10.1109/ESEM.2017.60","DOIUrl":"https://doi.org/10.1109/ESEM.2017.60","url":null,"abstract":"Recently, the requirements engineering (RE) community recognized the increasing need for understanding how industry perceives the jobs of requirements engineers and their most important qualifications. This study contributes to the community's research effort on this topic. Based on an analysis of RE job ads in 2015 from the Netherlands' three most popular online IT-job portals, we identified the task- and skill-related qualifications that employers demand from RE job seekers. We found that the job titles used in industry for the specialists that the RE community calls 'requirements engineers' are 'Product Owner' and 'Analyst', be it Information Analyst, Application Analyst, or Data Analyst. Those professionals supposed to perform RE tasks also take responsibility for additional tasks including quality assurance, realization and deployment, and project management. The most in-demand skills are soft skills: proficiency in Dutch and English, plus communication and analytical skills. 
RE is perceived as an occupation for experts, with 23% of all job ads asking explicitly for RE experience and 63% asking for experience similar to RE.","PeriodicalId":213866,"journal":{"name":"2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129964137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Are One-Time Contributors Different? A Comparison to Core and Periphery Developers in FLOSS Repositories","authors":"Amanda Lee, Jeffrey C. Carver","doi":"10.1109/ESEM.2017.7","DOIUrl":"https://doi.org/10.1109/ESEM.2017.7","url":null,"abstract":"Context: Free/Libre Open Source Software (FLOSS) communities consist of different types of contributors. Core contributors and peripheral contributors work together to create a successful project, each playing a different role. One-Time Contributors (OTCs), who are on the very fringe of the peripheral developers, are largely unstudied despite offering unique insights into the development process. In a prior survey, we identified OTCs and discovered their motivations and barriers. Aims: The objective of this study is to corroborate the survey results and provide a better understanding of OTCs. We compare OTCs to other peripheral and core contributors to determine whether they are distinct. Method: We mined data from the same code-review repository used to identify survey respondents in our previous study. After identifying each contributor as core, periphery, or OTC, we compared them in terms of patch size, time interval from submission to decision, the nature of their conversations, and patch acceptance rates. Results: We identified a continuum between core developers and OTCs. OTCs create smaller patches, face longer time intervals between patch submission and rejection, have longer review conversations, and face lower patch acceptance rates. Conversely, core contributors create larger patches, face shorter time intervals for feedback, have shorter review conversations, and have patches accepted at the highest rate. The peripheral developers fall in between the OTCs and the core contributors. Conclusion: OTCs do, in fact, face the barriers identified in our prior survey. 
They represent a distinct group of contributors compared to core and peripheral developers.","PeriodicalId":213866,"journal":{"name":"2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114146616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}