{"title":"Describing What Experimental Software Engineering Experts Do When They Design Their Experiments - A Qualitative Study","authors":"Liliane Fonseca, C. Seaman, S. Soares","doi":"10.1109/ESEM.2017.63","DOIUrl":"https://doi.org/10.1109/ESEM.2017.63","url":null,"abstract":"Background: Although there has been a significant amount of research focused on designing and conducting controlled experiments, few studies report how experienced experimental software engineering researchers actually design and conduct their studies. Aims: This study aimed to offer a practical perspective from their viewpoint regarding controlled experiment planning. Method: We collected data through semi-structured interviews from 11 researchers, and we used qualitative analysis methods from the grounded theory approach to analyze them. Result: Although the complete study presents four research questions, in this paper, we answer the first one. As a result, we present a preliminary result about what these experts actually do when they design experiments. Conclusions: This work contributes to a better understanding of the practical performance of experimental software engineering.","PeriodicalId":213866,"journal":{"name":"2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)","volume":"222 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126998894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Which Version Should Be Released to App Store?","authors":"Maleknaz Nayebi, Homayoon Farrahi, G. Ruhe","doi":"10.1109/ESEM.2017.46","DOIUrl":"https://doi.org/10.1109/ESEM.2017.46","url":null,"abstract":"Background: Several mobile app releases do not find their way to the end users. Our analysis of 11,514 releases across 917 open source mobile apps revealed that 44.3% of releases created in GitHub never shipped to the app store (market). Aims: We introduce \"marketability\" of open source mobile apps as a new release decision problem. Considering app stores as a complex system with unknown treatments, we evaluate performance of predictive models and analogical reasoning for marketability decisions. Method: We performed a survey with 22 release engineers to identify the importance of marketability release decision. We compared different classifiers to predict release marketability. For guiding the transition of not successfully marketable releases into successful ones, we used analogical reasoning. We evaluated our results both internally (over time) and externally (by developers). Results: Random forest classification performed best with F1 score of 78%. Analyzing 58 releases over time showed that, for 81% of them, analogical reasoning could correctly identify changes in the majority of release attributes. A survey with seven developers showed the usefulness of our method for supporting real world decisions. Conclusions: Marketability decisions of mobile apps can be supported by using predictive analytics and by considering and adopting similar experience from the past.","PeriodicalId":213866,"journal":{"name":"2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129120179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Influence of Human Factors for Identifying Code Smells: A Multi-Trial Empirical Study","authors":"R. Mello, R. Oliveira, Alessandro F. Garcia","doi":"10.1109/ESEM.2017.13","DOIUrl":"https://doi.org/10.1109/ESEM.2017.13","url":null,"abstract":"Context: Code smells are symptoms in the source code that represent poor design choices. Professional developers often perceive several types of code smells as indicators of actual design problems. However, the identification of code smells involves multiple steps that are subjective in nature, requiring the engagement of humans. Human factors are likely to play a key role in the precise identification of code smells in industrial settings. Unfortunately, there is limited knowledge about the influence of human factors on smell identification. Goal: We aim at investigating whether the precision of smell identification is influenced by three key human factors, namely reviewer's professional background, reviewer's module knowledge and collaboration of reviewers during the task. We also aim at deriving recommendations for allocating human resources to smell identification tasks. Method: We performed 19 comparisons among different subsamples from two trials of a controlled experiment conducted in the context of an empirical study on code smell identification. One trial was conducted in industrial settings while the other had involved graduate students. The diversity of the samples allowed us to analyze the influence of the three factors in isolation and in conjunction. Results: We found that (i) reviewers' collaboration significantly increases the precision of smell identification, but (ii) some professional background is required from the reviewers to reach high precision. Surprisingly, we also found that: (iii) having previous knowledge of the reviewed module does not affect the precision of reviewers with higher professional background. However, this factor was influential on successful identification of more complex smells. Conclusion: We expect that our findings are helpful to support researchers in conducting proper experimental procedures in the future. Besides, they may also be useful for supporting project managers in allocating resources for smell identification tasks.","PeriodicalId":213866,"journal":{"name":"2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127909874","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Training Data Selection for Cross-Project Defection Prediction: Which Approach Is Better?","authors":"Yi Bin, Kai-Jing Zhou, Hongmin Lu, Yuming Zhou, Baowen Xu","doi":"10.1109/ESEM.2017.49","DOIUrl":"https://doi.org/10.1109/ESEM.2017.49","url":null,"abstract":"Background: Many relevancy filters have been proposed to select training data for building cross-project defect prediction (CPDP) models. However, up to now, there is no consensus about which relevancy filter is better for CPDP. Goal: In this paper, we conduct a thorough experiment to compare nine relevancy filters proposed in the recent literature. Method: Based on 33 publicly available data sets, we compare not only the retaining ratio of the original training data and the overlapping degree among the retained data but also the prediction performance of the resulting CPDP models under the ranking and classification scenarios. Results: In terms of retaining ratio and overlapping degree, there are important differences among these filters. According to the defect prediction performance, global filter always stays in the first level. Conclusions: For practitioners, it appears that there is no need to filter source project data, as this may lead to better defect prediction results.","PeriodicalId":213866,"journal":{"name":"2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131448543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Issues and Opportunities for Human Error-Based Requirements Inspections: An Exploratory Study","authors":"Vaibhav Anu, G. Walia, Wenhua Hu, Jeffrey C. Carver, Gary L. Bradshaw","doi":"10.1109/ESEM.2017.62","DOIUrl":"https://doi.org/10.1109/ESEM.2017.62","url":null,"abstract":"[Background] Software inspections are extensively used for requirements verification. Our research uses the perspective of human cognitive failures (i.e., human errors) to improve the fault detection effectiveness of traditional fault-checklist based inspections. Our previous evaluations of a formal human error based inspection technique called Error Abstraction and Inspection (EAI) have shown encouraging results, but have also highlighted a real need for improvement. [Aims and Method] The goal of conducting the controlled study presented in this paper was to identify the specific tasks of EAI that inspectors find most difficult to perform and the strategies that successful inspectors use when performing the tasks. [Results] The results highlighted specific pain points of EAI that can be addressed by improving the training and instrumentation.","PeriodicalId":213866,"journal":{"name":"2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131357490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Empirical Examination of the Relationship between Code Smells and Merge Conflicts","authors":"Iftekhar Ahmed, Caius Brindescu, Umme Ayda Mannan, Carlos Jensen, A. Sarma","doi":"10.1109/ESEM.2017.12","DOIUrl":"https://doi.org/10.1109/ESEM.2017.12","url":null,"abstract":"Background: Merge conflicts are a common occurrence in software development. Researchers have shown the negative impact of conflicts on the resulting code quality and the development workflow. Thus far, no one has investigated the effect of bad design (code smells) on merge conflicts. Aims: We posit that entities that exhibit certain types of code smells are more likely to be involved in a merge conflict. We also postulate that code elements that are both \"smelly\" and involved in a merge conflict are associated with other undesirable effects (more likely to be buggy). Method: We mined 143 repositories from GitHub and recreated 6,979 merge conflicts to obtain metrics about code changes and conflicts. We categorized conflicts into semantic or non-semantic, based on whether changes affected the Abstract Syntax Tree. For each conflicting change, we calculate the number of code smells and the number of future bug-fixes associated with the affected lines of code. Results: We found that entities that are smelly are three times more likely to be involved in merge conflicts. Method-level code smells (Blob Operation and Internal Duplication) are highly correlated with semantic conflicts. We also found that code that is smelly and experiences merge conflicts is more likely to be buggy. Conclusion: Bad code design not only impacts maintainability, it also impacts the day to day operations of a project, such as merging contributions, and negatively impacts the quality of the resulting code. Our findings indicate that research is needed to identify better ways to support merge conflict resolution to minimize its effect on code quality.","PeriodicalId":213866,"journal":{"name":"2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128132371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How Does Machine Translated User Interface Affect User Experience? A Study on Android Apps","authors":"Xue Qin, Smitha Holla, Liang Huang, Lymari Montijo, Dylan Aguirre, Xiaoyin Wang","doi":"10.1109/ESEM.2017.58","DOIUrl":"https://doi.org/10.1109/ESEM.2017.58","url":null,"abstract":"For global-market-oriented software applications, it is required that their user interface be translated to local languages so that users from different areas in the world can use the software. A long-term practice in software industry is to hire professional translators or translation companies to perform the translation. However, due to the large number of user-interface labels and target languages, this is often too expensive for software providers, especially cost-sensitive providers such as personal developers of mobile apps. As natural language processing and machine techniques advance, more mature machine translation techniques are providing a cheap though imperfect alternative, and the Google Translation service has already been widely used for translating websites and apps. However, the effect of lower translation quality on user experience has not been well studied yet. In this paper, we present a user study on 6 popular Android apps, which involves 24 participants performing tasks on app variants with 4 different translation quality levels and 2 target languages: Spanish and Chinese. From our study, we acquire the following 3 major findings, including (1) compared with original versions, machine translated versions of apps have similar task completion rate and efficiency on most studied apps; (2) machine translated versions have more tasks completed with flaws such unnecessary steps and missed optional steps, and (3) users are not satisfied with the GUI of machine translated versions and the two major complaints are misleading labels of input boxes, and unclear translation of items in option lists.","PeriodicalId":213866,"journal":{"name":"2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128406730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting the Vector Impact of Change - An Industrial Case Study at Brightsquid","authors":"S. Kabeer, Maleknaz Nayebi, G. Ruhe, Chris Carlson, Francis Chew","doi":"10.1109/ESEM.2017.20","DOIUrl":"https://doi.org/10.1109/ESEM.2017.20","url":null,"abstract":"Background: Understanding and controlling the impact of change decides about the success or failure of evolving products. The problem magnifies for start-ups operating with limited resources. Their usual focus is on Minimum Viable Product (MVP's) providing specialized functionality, thus have little expense available for handling changes. Aims: Change Impact Analysis (CIA) refers to the identification of source code files impacted when implementing a change request. We extend this question to predict not only affected files, but also the effort needed for implementing the change, and the duration necessary for that. Method: This study evaluates the performance of three textual similarity techniques for CIA based on Bag of words in combination with either topic modeling or file coupling. Results: The approaches are applied on data from two industrial projects. The data comes as part of an industrial collaboration project with Brightsquid, a Canadian start-up company specializing in secure communication solutions. Performance analysis shows that combining textual similarity with file coupling improves impact prediction, resulting in Recall of 67%. Effort and duration can be predicted with 84% and 72% accuracy using textual similarity only. Conclusions: The relative effort invested into CIA for predicting impacted files can be reduced by extending its applicability to multiple dimensions which include impacted files, effort, and duration.","PeriodicalId":213866,"journal":{"name":"2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125396002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding the Heterogeneity of Contributors in Bug Bounty Programs","authors":"Hideaki Hata, M. Guo, M. Babar","doi":"10.1109/ESEM.2017.34","DOIUrl":"https://doi.org/10.1109/ESEM.2017.34","url":null,"abstract":"Background: While bug bounty programs are not new in software development, an increasing number of companies, as well as open source projects, rely on external parties to perform the security assessment of their software for reward. However, there is relatively little empirical knowledge about the characteristics of bug bounty program contributors. Aim: This paper aims to understand those contributors by highlighting the heterogeneity among them. Method: We analyzed the histories of 82 bug bounty programs and 2,504 distinct bug bounty contributors, and conducted a quantitative and qualitative survey. Results: We found that there are project-specific and non-specific contributors who have different motivations for contributing to the products and organizations. Conclusions: Our findings provide insights to make bug bounty programs better and for further studies of new software development roles.","PeriodicalId":213866,"journal":{"name":"2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125324177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graphical vs. Tabular Notations for Risk Models: On the Role of Textual Labels and Complexity","authors":"Katsiaryna Labunets, F. Massacci, A. Tedeschi","doi":"10.1109/ESEM.2017.40","DOIUrl":"https://doi.org/10.1109/ESEM.2017.40","url":null,"abstract":"[Background] Security risk assessment methods in industry mostly use a tabular notation to represent the assessment results whilst academic works advocate graphical methods. Experiments with MSc students showed that the tabular notation is better than an iconic graphical notation for the comprehension of security risks. [Aim] We investigate whether the availability of textual labels and terse UML-style notation could improve comprehensibility. [Method] We report the results of an online comprehensibility experiment involving 61 professionals with an average of 9 years of working experience, in which we compared the ability to comprehend security risk assessments represented in tabular, UML-style with textual labels, and iconic graphical modeling notations. [Results] Tabular notation are still the most comprehensible notion in both recall and precision. However, the presence of textual labels does improve the precision and recall of participants over iconic graphical models. [Conclusion] Tabular representation better supports extraction of correct information of both simple and complex comprehensibility questions about security risks than the graphical notation but textual labels help.","PeriodicalId":213866,"journal":{"name":"2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130539142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}