Miryung Kim, Thomas Zimmermann, R. Deline, Andrew Begel
{"title":"The Emerging Role of Data Scientists on Software Development Teams","authors":"Miryung Kim, Thomas Zimmermann, R. Deline, Andrew Begel","doi":"10.1145/2884781.2884783","DOIUrl":"https://doi.org/10.1145/2884781.2884783","url":null,"abstract":"Creating and running software produces large amounts of raw data about the development process and the customer usage, which can be turned into actionable insight with the help of skilled data scientists. Unfortunately, data scientists with the analytical and software engineering skills to analyze these large data sets have been hard to come by; only recently have software companies started to develop competencies in software-oriented data analytics. To understand this emerging role, we interviewed data scientists across several product groups at Microsoft. In this paper, we describe their education and training background, their missions in software engineering contexts, and the type of problems on which they work. We identify five distinct working styles of data scientists: (1) Insight Providers, who work with engineers to collect the data needed to inform decisions that managers make; (2) Modeling Specialists, who use their machine learning expertise to build predictive models; (3) Platform Builders, who create data platforms, balancing both engineering and data analysis concerns; (4) Polymaths, who do all data science activities themselves; and (5) Team Leaders, who run teams of data scientists and spread best practices. We further describe a set of strategies that they employ to increase the impact and actionability of their work.","PeriodicalId":6485,"journal":{"name":"2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)","volume":"36 1","pages":"96-107"},"PeriodicalIF":0.0,"publicationDate":"2016-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79446691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
P. M. Uesbeck, A. Stefik, Stefan Hanenberg, J. Pedersen, P. Daleiden
{"title":"An Empirical Study on the Impact of C++ Lambdas and Programmer Experience","authors":"P. M. Uesbeck, A. Stefik, Stefan Hanenberg, J. Pedersen, P. Daleiden","doi":"10.1145/2884781.2884849","DOIUrl":"https://doi.org/10.1145/2884781.2884849","url":null,"abstract":"Lambdas have seen increasing use in mainstream programming languages, notably in Java 8 and C++ 11. While the technical aspects of lambdas are known, we conducted the first randomized controlled trial on the human factors impact of C++ 11 lambdas compared to iterators. Because there has been recent debate on having students or professionals in experiments, we recruited undergraduates across the academic pipeline and professional programmers to evaluate these findings in a broader context. Results afford some doubt that lambdas benefit developers and show evidence that students are negatively impacted in regard to how quickly they can write correct programs to a test specification and whether they can complete a task. Analysis from log data shows that participants spent more time with compiler errors, and have more errors, when using lambdas as compared to iterators, suggesting difficulty with the syntax chosen for C++. Finally, experienced users were more likely to complete tasks, with or without lambdas, and could do so more quickly, with experience as a factor explaining 45.7% of the variance in our sample in regard to completion time.","PeriodicalId":6485,"journal":{"name":"2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)","volume":"1 1","pages":"760-771"},"PeriodicalIF":0.0,"publicationDate":"2016-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79646070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fixing Deadlocks via Lock Pre-Acquisitions","authors":"Yan Cai, Lingwei Cao","doi":"10.1145/2884781.2884819","DOIUrl":"https://doi.org/10.1145/2884781.2884819","url":null,"abstract":"Manual deadlock fixing is error-prone and time-consuming. Exist-ing generic approach (GA) simply inserts gate locks to fix dead-locks by serializing executions, which could introduce various new deadlocks and incur high runtime overhead. We propose a novel approach DFixer to fix deadlocks without introducing any new deadlocks by design. DFixer only selects one thread of a deadlock to pre-acquire a lock w together with another lock h, where before fixing, the deadlock occurs when the thread holds lock h and waits for lock w. As such, DFixer eliminates a hold-and-wait necessary condition, preventing the deadlock from occurring. The thread per-forming pre-acquisition is carefully selected such that no other syn-chronization exists in between the two original acquisitions. Other-wise, DFixer further introduces a context-aware conditional protect-ed by above lock w to guarantee the correctness of DFixer. The evaluation is on 20 deadlocks, including 17 from widely-used real-world C/C++ programs. It shows that DFixer successfully fixed all deadlocks. Whereas GA introduced 9 new deadlocks; a latest work Grail failed to fix 8 deadlocks and introduced 3 new deadlocks on others. On average, DFixer incurred only 2.1% overhead, where GA and Grail incurred 15.8% and 11.5% overhead, respectively.","PeriodicalId":6485,"journal":{"name":"2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)","volume":"40 1","pages":"1109-1120"},"PeriodicalIF":0.0,"publicationDate":"2016-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81493292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Coverage-Driven Test Code Generation for Concurrent Classes","authors":"Valerio Terragni, S. Cheung","doi":"10.1145/2884781.2884876","DOIUrl":"https://doi.org/10.1145/2884781.2884876","url":null,"abstract":"Previous techniques on concurrency testing have mainly focused on exploring the interleaving space of manually written test code to expose faulty interleavings of shared memory accesses. These techniques assume the availability of failure-inducing tests. In this paper, we present AutoConTest, a coverage-driven approach to generate effective concurrent test code that achieve high interleaving coverage. AutoConTest consists of three components. First, it computes the coverage requirements dynamically and iteratively during sequential test code generation, using a coverage metric that captures the execution context of shared memory accesses. Second, it smartly selects these sequential codes based on the computed result and assembles them for concurrent tests, achieving increased context-sensitive interleaving coverage. Third, it explores the newly covered interleavings. We have implemented AutoConTest as an automated tool and evaluated it using 6 real-world concurrent Java subjects. The results show that AutoConTest is able to generate effective concurrent tests that achieve high interleaving coverage and expose concurrency faults quickly. AutoConTest took less than 65 seconds (including program analysis, test generation and execution) to expose the faults in the program subjects.","PeriodicalId":6485,"journal":{"name":"2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)","volume":"14 1","pages":"1121-1132"},"PeriodicalIF":0.0,"publicationDate":"2016-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74908206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Magnus Madsen, F. Tip, Esben Andreasen, Koushik Sen, Anders Møller
{"title":"Feedback-Directed Instrumentation for Deployed JavaScript Applications","authors":"Magnus Madsen, F. Tip, Esben Andreasen, Koushik Sen, Anders Møller","doi":"10.1145/2884781.2884846","DOIUrl":"https://doi.org/10.1145/2884781.2884846","url":null,"abstract":"Many bugs in JavaScript applications manifest themselves as objects that have incorrect property values when a failure occurs. For this type of error, stack traces and log files are often insufficient for diagnosing problems. In such cases, it is helpful for developers to know the control flow path from the creation of an object to a crashing statement. Such crash paths are useful for understanding where the object originated and whether any properties of the object were corrupted since its creation.We present a feedback-directed instrumentation technique for computing crash paths that allows the instrumentation overhead to be distributed over a crowd of users and to reduce it for users who do not encounter the crash. We implemented our technique in a tool, Crowdie, and evaluated it on 10 real-world issues for which error messages and stack traces are insufficient to isolate the problem. Our results show that feedback-directed instrumentation requires 5% to 25% of the program to be instrumented, that the same crash must be observed 3 to 10 times to discover the crash path, and that feedback-directed instrumentation typically slows down execution by a factor 2x–9x compared to 8x–90x for an approach where applications are fully instrumented.","PeriodicalId":6485,"journal":{"name":"2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)","volume":"338 1","pages":"899-910"},"PeriodicalIF":0.0,"publicationDate":"2016-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75357329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Grounded Theory in Software Engineering Research: A Critical Review and Guidelines","authors":"Klaas-Jan Stol, P. Ralph, Brian Fitzgerald","doi":"10.1145/2884781.2884833","DOIUrl":"https://doi.org/10.1145/2884781.2884833","url":null,"abstract":"Grounded Theory (GT) has proved an extremely useful research approach in several fields including medical sociology, nursing, education and management theory. However, GT is a complex method based on an inductive paradigm that is fundamentally different from the traditional hypothetico-deductive research model. As there are at least three variants of GT, some ostensibly GT research suffers from method slurring, where researchers adopt an arbitrary subset of GT practices that are not recognizable as GT. In this paper, we describe the variants of GT and identify the core set of GT practices. We then analyze the use of grounded theory in software engineering. We carefully and systematically selected 98 articles that mention GT, of which 52 explicitly claim to use GT, with the other 46 using GT techniques only. Only 16 articles provide detailed accounts of their research procedures. We offer guidelines to improve the quality of both conducting and reporting GT studies. The latter is an important extension since current GT guidelines in software engineering do not cover the reporting process, despite good reporting being necessary for evaluating a study and informing subsequent research.","PeriodicalId":6485,"journal":{"name":"2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)","volume":"40 1","pages":"120-131"},"PeriodicalIF":0.0,"publicationDate":"2016-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78085125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Irene Manotas, C. Bird, Rui Zhang, D. Shepherd, Ciera Jaspan, Caitlin Sadowski, L. Pollock, J. Clause
{"title":"An Empirical Study of Practitioners' Perspectives on Green Software Engineering","authors":"Irene Manotas, C. Bird, Rui Zhang, D. Shepherd, Ciera Jaspan, Caitlin Sadowski, L. Pollock, J. Clause","doi":"10.1145/2884781.2884810","DOIUrl":"https://doi.org/10.1145/2884781.2884810","url":null,"abstract":"The energy consumption of software is an increasing concern as the use of mobile applications, embedded systems, and data center-based services expands. While research in green software engineering is correspondingly increasing, little is known about the current practices and perspectives of software engineers in the field. This paper describes the first empirical study of how practitioners think about energy when they write requirements, design, construct, test, and maintain their software. We report findings from a quantitative,targeted survey of 464 practitioners from ABB, Google, IBM, and Microsoft, which was motivated by and supported with qualitative data from 18 in-depth interviews with Microsoft employees. The major findings and implications from the collected data contextualize existing green software engineering research and suggest directions for researchers aiming to develop strategies and tools to help practitioners improve the energy usage of their applications.","PeriodicalId":6485,"journal":{"name":"2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)","volume":"8 1","pages":"237-248"},"PeriodicalIF":0.0,"publicationDate":"2016-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81978597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Augmenting API Documentation with Insights from Stack Overflow","authors":"Christoph Treude, M. Robillard","doi":"10.1145/2884781.2884800","DOIUrl":"https://doi.org/10.1145/2884781.2884800","url":null,"abstract":"Software developers need access to different kinds of information which is often dispersed among different documentation sources, such as API documentation or Stack Overflow. We present an approach to automatically augment API documentation with \"insight sentences\" from Stack Overflow -- sentences that are related to a particular API type and that provide insight not contained in the API documentation of that type. Based on a development set of 1,574 sentences, we compare the performance of two state-of-the-art summarization techniques as well as a pattern-based approach for insight sentence extraction. We then present SISE, a novel machine learning based approach that uses as features the sentences themselves, their formatting, their question, their answer, and their authors as well as part-of-speech tags and the similarity of a sentence to the corresponding API documentation. With SISE, we were able to achieve a precision of 0.64 and a coverage of 0.7 on the development set. In a comparative study with eight software developers, we found that SISE resulted in the highest number of sentences that were considered to add useful information not found in the API documentation. These results indicate that taking into account the meta data available on Stack Overflow as well as part-of-speech tags can significantly improve unsupervised extraction approaches when applied to Stack Overflow data.","PeriodicalId":6485,"journal":{"name":"2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)","volume":"17 1","pages":"392-403"},"PeriodicalIF":0.0,"publicationDate":"2016-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89903091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Juan Zhai, Jianjun Huang, Shiqing Ma, X. Zhang, Lin Tan, Jianhua Zhao, Feng Qin
{"title":"Automatic Model Generation from Documentation for Java API Functions","authors":"Juan Zhai, Jianjun Huang, Shiqing Ma, X. Zhang, Lin Tan, Jianhua Zhao, Feng Qin","doi":"10.1145/2884781.2884881","DOIUrl":"https://doi.org/10.1145/2884781.2884881","url":null,"abstract":"Modern software systems are becoming increasingly complex, relying on a lot of third-party library support. Library behaviors are hence an integral part of software behaviors. Analyzing them is as important as analyzing the software itself. However, analyzing libraries is highly challenging due to the lack of source code, implementation in different languages, and complex optimizations. We observe that many Java library functions provide excellent documentation, which concisely describes the functionalities of the functions. We develop a novel technique that can construct models for Java API functions by analyzing the documentation. These models are simpler implementations in Java compared to the original ones and hence easier to analyze. More importantly, they provide the same functionalities as the original functions. Our technique successfully models 326 functions from 14 widely used Java classes. We also use these models in static taint analysis on Android apps and dynamic slicing for Java programs, demonstrating the effectiveness and efficiency of our models.","PeriodicalId":6485,"journal":{"name":"2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)","volume":"1 1","pages":"380-391"},"PeriodicalIF":0.0,"publicationDate":"2016-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90066313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Missing Data Imputation Based on Low-Rank Recovery and Semi-Supervised Regression for Software Effort Estimation","authors":"Xiaoyuan Jing, Fumin Qi, Fei Wu, Baowen Xu","doi":"10.1145/2884781.2884827","DOIUrl":"https://doi.org/10.1145/2884781.2884827","url":null,"abstract":"Software effort estimation (SEE) is a crucial step in software development. Effort data missing usually occurs in real-world data collection. Focusing on the missing data problem, existing SEE methods employ the deletion, ignoring, or imputation strategy to address the problem, where the imputation strategy was found to be more helpful for improving the estimation performance. Current imputation methods in SEE use classical imputation techniques for missing data imputation, yet these imputation techniques have their respective disadvantages and might not be appropriate for effort data. In this paper, we aim to provide an effective solution for the effort data missing problem. Incompletion includes the drive factor missing case and effort label missing case. We introduce the low-rank recovery technique for addressing the drive factor missing case. And we employ the semi-supervised regression technique to perform imputation in the case of effort label missing. We then propose a novel effort data imputation approach, named low-rank recovery and semi-supervised regression imputation (LRSRI). Experiments on 7 widely used software effort datasets indicate that: (1) the proposed approach can obtain better effort data imputation effects than other methods; (2) the imputed data using our approach can apply to multiple estimators well.","PeriodicalId":6485,"journal":{"name":"2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)","volume":"62 1","pages":"607-618"},"PeriodicalIF":0.0,"publicationDate":"2016-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79645392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}