What Makes a Great Maintainer of Open Source Projects?
E. Dias, Paulo Meirelles, F. C. Filho, Igor Steinmacher, I. Wiese, G. Pinto. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), May 2021. DOI: 10.1109/ICSE43902.2021.00093
Abstract: Although Open Source Software (OSS) maintainers devote a significant proportion of their work to coding tasks, great maintainers must excel in many activities beyond coding. Maintainers should care about fostering a community and helping new members find their place, while also saying "no" to patches that, although well-coded and well-tested, do not contribute to the goals of the project. To perform all these activities masterfully, maintainers must exercise attributes that software engineers working on closed-source projects do not always need to master. This paper aims to uncover, relate, and prioritize the unique attributes that great OSS maintainers might have. To achieve this goal, we conducted 33 semi-structured interviews with experienced maintainers who are the gatekeepers of notable projects such as the Linux kernel, the Debian operating system, and the GitLab coding platform. After analyzing the interviews and curating a list of attributes, we created a conceptual framework to explain how these attributes are connected. We then conducted a rating survey with 90 OSS contributors. We noted that "technical excellence" and "communication" are the most recurring attributes. When grouped, these attributes fit into four broad categories: management, social, technical, and personality. While "sustain a long-term vision of the project" and being "extremely careful" seem to form the basis of our framework, our survey showed that communication was perceived as the most essential attribute.
Semi-Supervised Log-Based Anomaly Detection via Probabilistic Label Estimation
Lin Yang, Junjie Chen, Zan Wang, Weijing Wang, Jiajun Jiang, Xuyuan Dong, Wenbin Zhang. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), May 2021. DOI: 10.1109/ICSE43902.2021.00130
Abstract: With the growth of software systems, logs have become important data for aiding system maintenance. Log-based anomaly detection, which aims to automatically detect system anomalies via log analysis, is one of the most important methods for this purpose. However, existing log-based anomaly detection approaches still suffer from practical issues: they either depend on large amounts of manually labeled training data (supervised approaches) or perform unsatisfactorily because they do not learn from historical anomalies (unsupervised and semi-supervised approaches). In this paper, we propose PLELog, a novel, practical log-based anomaly detection approach. PLELog is semi-supervised, avoiding time-consuming manual labeling, and incorporates knowledge of historical anomalies via probabilistic label estimation, bringing the superiority of supervised approaches into play. In addition, PLELog stays immune to unstable log data via semantic embedding and detects anomalies efficiently and effectively with an attention-based GRU neural network. We evaluated PLELog on the two most widely used public datasets, and the results demonstrate its effectiveness, significantly outperforming the compared approaches with an average improvement of 181.6% in terms of F1-score. In particular, PLELog has been applied to two real-world systems, from our university and from a large corporation, further demonstrating its practicability.
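The probabilistic label estimation idea can be illustrated with a minimal sketch. The assumption here (not the paper's exact method, which uses clustering over semantic embeddings) is that only normal instances are labeled, and an unlabeled log sequence's anomaly probability grows with its distance from the nearest known-normal prototype:

```python
import numpy as np

def estimate_labels(unlabeled, normal_prototypes, sigma=1.0):
    """Assign each unlabeled log-sequence embedding a probabilistic label:
    the farther it lies from every known-normal prototype, the closer its
    estimated anomaly probability is to 1. A toy stand-in for PLELog's
    clustering-based probabilistic label estimation."""
    probs = []
    for x in unlabeled:
        d = min(np.linalg.norm(x - p) for p in normal_prototypes)
        probs.append(1.0 - np.exp(-(d ** 2) / (2 * sigma ** 2)))
    return np.array(probs)
```

Sequences embedded near a normal prototype receive a probability near 0 and can be treated as (probabilistically) normal training data; distant ones approach 1.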
An Empirical Study of Refactorings and Technical Debt in Machine Learning Systems
Yiming Tang, Raffi Khatchadourian, M. Bagherzadeh, Rhia Singh, Ajani Stewart, A. Raja. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), May 2021. DOI: 10.1109/ICSE43902.2021.00033
Abstract: Machine Learning (ML) systems, including Deep Learning (DL) systems, i.e., those with ML capabilities, are pervasive in today's data-driven society. Such systems are complex; they comprise ML models and many subsystems that support learning processes. As with other complex systems, ML systems are prone to classic technical debt issues, especially when they are long-lived, but they also exhibit debt specific to ML. Unfortunately, there is a knowledge gap in how ML systems actually evolve and are maintained. In this paper, we fill this gap by studying refactorings, i.e., source-to-source semantics-preserving program transformations, performed in real-world, open-source software, and the technical debt issues they alleviate. We analyzed 26 projects, consisting of 4.2 MLOC, along with 327 manually examined code patches. The results indicate that developers refactor these systems for a variety of reasons, both specific and tangential to ML; that some refactorings correspond to established technical debt categories while others do not; and that code duplication is a major cross-cutting theme that particularly involved ML configuration and model code, which was also the most refactored. We also introduce 14 new ML-specific refactorings and 7 new technical debt categories, and put forth several recommendations, best practices, and anti-patterns. The results can potentially assist practitioners, tool developers, and educators in facilitating long-term ML system usefulness.
AID: An Automated Detector for Gender-Inclusivity Bugs in OSS Project Pages
Amreeta Chatterjee, M. Guizani, Catherine Stevens, Jillian Emard, Mary Evelyn May, M. Burnett, Iftekhar Ahmed, A. Sarma. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), May 2021. DOI: 10.1109/ICSE43902.2021.00128
Abstract: The tools and infrastructure used in tech, including Open Source Software (OSS), can embed "inclusivity bugs": features that disproportionately disadvantage particular groups of contributors. To see whether OSS developers have existing practices to ward off such bugs, we surveyed 266 OSS developers. Our results show that a majority (77%) of developers do not use any inclusivity practices, and 92% of respondents cited a lack of concrete resources to enable them to do so. To help fill this gap, this paper introduces AID, a tool that automates the GenderMag method to systematically find gender-inclusivity bugs in software. We then present the results of the tool's evaluation on 20 GitHub projects. The tool achieved a precision of 0.69, a recall of 0.92, and an F-measure of 0.79, and even captured some inclusivity bugs that human GenderMag teams missed.
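The reported F-measure is the harmonic mean of precision and recall, which the quoted numbers can be checked against:

```python
def f_measure(precision, recall):
    """F1: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# AID's reported precision of 0.69 and recall of 0.92
# yield the reported F-measure of 0.79.
score = round(f_measure(0.69, 0.92), 2)  # 0.79
```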
Resource-Guided Configuration Space Reduction for Deep Learning Models
Yanjie Gao, Yonghao Zhu, Hongyu Zhang, Haoxiang Lin, Mao Yang. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), May 2021. DOI: 10.1109/ICSE43902.2021.00028
Abstract: Deep learning models, like traditional software systems, provide a large number of configuration options. A deep learning model can be configured with different hyperparameters and neural architectures. Recently, AutoML (Automated Machine Learning) has been widely adopted to automate model training by systematically exploring diverse configurations. However, current AutoML approaches do not take into consideration the computational constraints imposed by various resources such as available memory, the computing power of devices, or execution time. Training with non-conforming configurations can lead to many failed AutoML trial jobs or inappropriate models, which cause significant resource waste and severely slow down development productivity. In this paper, we propose DnnSAT, a resource-guided AutoML approach for deep learning models that helps existing AutoML tools efficiently reduce the configuration space ahead of time. DnnSAT can speed up the search process and achieve equal or even better model learning performance because it excludes trial jobs that do not satisfy the constraints and saves resources for more trials. We formulate resource-guided configuration space reduction as a constraint satisfaction problem. DnnSAT includes a unified analytic cost model to construct common constraints with respect to the model weight size, number of floating-point operations, model inference time, and GPU memory consumption. It then utilizes an SMT solver to obtain the satisfiable configurations of hyperparameters and neural architectures. Our evaluation results demonstrate the effectiveness of DnnSAT in accelerating state-of-the-art AutoML methods (Hyperparameter Optimization and Neural Architecture Search) with an average speedup from 1.19X to 3.95X on public benchmarks. We believe that DnnSAT can make AutoML more practical in a real-world environment with constrained resources.
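The core idea, pruning configurations whose analytic cost exceeds a resource budget before any trial runs, can be sketched without an SMT solver. The cost model below (a two-layer MLP with hypothetical input dimension, class count, and hidden sizes) is illustrative; DnnSAT's unified cost model and SMT encoding are more general:

```python
def satisfies_budget(cfg, max_params, max_flops):
    """Toy analytic cost model: weight count and FLOPs of a 2-layer MLP
    (input dim 784, 10 classes) as a function of hidden sizes h1, h2.
    A configuration is feasible only if it fits both budgets."""
    d, c = 784, 10
    params = d * cfg["h1"] + cfg["h1"] * cfg["h2"] + cfg["h2"] * c
    flops = 2 * params  # one multiply-add per weight
    return params <= max_params and flops <= max_flops

# Enumerate a small grid and keep only budget-conforming trials.
space = [{"h1": h1, "h2": h2} for h1 in (64, 256, 1024) for h2 in (64, 256)]
feasible = [cfg for cfg in space if satisfies_budget(cfg, 300_000, 600_000)]
```

Here the two h1=1024 configurations are excluded up front, so an AutoML tool would spend its trial budget only on the four models that can actually fit the constraints.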
How Developers Optimize Virtual Reality Applications: A Study of Optimization Commits in Open Source Unity Projects
Fariha Nusrat, Foyzul Hassan, Hao Zhong, Xiaoyin Wang. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), May 2021. DOI: 10.1109/ICSE43902.2021.00052
Abstract: Virtual Reality (VR) is an emerging technique that provides immersive experiences for users. Due to the high computation cost of rendering real-time animation twice (once for each eye) and the resource limitations of wearable devices, VR applications often face performance bottlenecks, and performance optimization plays an important role in VR software development. Performance optimizations of VR applications can be very different from those in traditional software, as VR involves more elements such as graphics rendering and real-time animation. In this paper, we present the first empirical study on 183 real-world performance optimizations from 45 VR software projects. In particular, we manually categorized the optimizations into 11 categories and applied static analysis to identify how they affect different life-cycle phases of VR applications. Furthermore, we studied the complexity and design/behavior effects of performance optimizations, and how optimizations differ between large organizational software projects and smaller personal software projects. Our major findings include: (1) graphics simplification (24.0%), rendering optimization (16.9%), language/API optimization (15.3%), heap avoidance (14.8%), and value caching (12.0%) are the most common categories of performance optimization in VR applications; (2) game logic updates (30.4%) and before-scene initialization (20.0%) are the most common life-cycle phases affected by performance issues; (3) 45.9% of the optimizations have behavior and design effects, and 39.3% of the optimizations are systematic changes; (4) the distributions of optimization classes are very different between organizational VR projects and personal VR projects.
Reducing DNN Properties to Enable Falsification with Adversarial Attacks
David Shriver, Sebastian G. Elbaum, Matthew B. Dwyer. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), May 2021. DOI: 10.1109/ICSE43902.2021.00036
Abstract: Deep Neural Networks (DNNs) are increasingly being deployed in safety-critical domains, from autonomous vehicles to medical devices, where the consequences of errors demand techniques that can provide stronger guarantees about behavior than just high test accuracy. This paper explores broadening the application of existing adversarial attack techniques to the falsification of DNN safety properties. We contend, and later show, that such attacks provide a powerful repertoire of scalable algorithms for property falsification. To enable the broad application of falsification, we introduce a semantics-preserving reduction of multiple safety property types, which subsume prior work, into a set of equivalid correctness problems amenable to adversarial attacks. We evaluate our reduction approach as an enabler of falsification on a range of DNN correctness problems and show its cost-effectiveness and scalability.
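The attack-as-falsifier idea can be shown on a toy model. The example below uses an FGSM-style step against a linear classifier, whose gradient is available in closed form; it is a sketch of the general pattern (search for a property counterexample by following the gradient), not the paper's reduction, which handles richer property types over real DNNs:

```python
import numpy as np

def fgsm_falsify(w, b, x, eps):
    """Try to falsify a local-robustness property for f(x) = w.x + b:
    'every point within L-inf distance eps of x has the same sign as
    f(x)'. The FGSM step perturbs each coordinate by eps against the
    margin (the gradient of w.x with respect to x is just w)."""
    y = np.sign(w @ x + b)
    x_adv = x - y * eps * np.sign(w)
    if np.sign(w @ x_adv + b) != y:
        return x_adv  # counterexample: property falsified
    return None  # attack failed; property may still be false
```

A returned point is a concrete witness that the property does not hold; a `None` result proves nothing, which is exactly the falsification (rather than verification) trade-off.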
Program Comprehension and Code Complexity Metrics: An fMRI Study
Norman Peitek, S. Apel, Chris Parnin, A. Brechmann, J. Siegmund. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), May 2021. DOI: 10.1109/ICSE43902.2021.00056
Abstract: Background: Researchers and practitioners have been using code complexity metrics for decades to predict how developers comprehend a program. While it is plausible and tempting to use code metrics for this purpose, their validity is debated, since they rely on simple code properties and rarely consider particularities of human cognition. Aims: We investigate whether and how code complexity metrics reflect the difficulty of program comprehension. Method: We conducted a functional magnetic resonance imaging (fMRI) study with 19 participants observing program comprehension of short code snippets at varying complexity levels. We dissected four classes of code complexity metrics and their relationship to neuronal, behavioral, and subjective correlates of program comprehension, analyzing more than 41 metrics overall. Results: While our data corroborate that complexity metrics can, to a limited degree, explain programmers' cognition in program comprehension, fMRI allowed us to gain insights into why some code properties are difficult to process. In particular, a code's textual size drives programmers' attention, and vocabulary size burdens programmers' working memory. Conclusion: Our results provide neuro-scientific evidence supporting warnings of prior research questioning the validity of code complexity metrics, and pin down factors relevant to program comprehension. Future Work: We outline several follow-up experiments investigating fine-grained effects of code complexity and describe possible refinements to code complexity metrics.
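The two properties the study singles out, textual size and vocabulary size, are among the simplest metrics in the classes analyzed. A minimal tokenizer-based sketch (the study's actual metric definitions are more elaborate) shows how they are computed:

```python
import re

def size_metrics(snippet):
    """Compute two simple complexity metrics for a code snippet:
    textual size (total token count, which the study links to attention)
    and vocabulary size (distinct tokens, linked to working memory)."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", snippet)
    return {"textual_size": len(tokens), "vocabulary_size": len(set(tokens))}
```

For the snippet `x = x + 1`, the token stream is `x`, `=`, `x`, `+`, `1`: a textual size of 5 but a vocabulary size of 4, since `x` repeats.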
Prioritizing Test Inputs for Deep Neural Networks via Mutation Analysis
Zan Wang, Hanmo You, Junjie Chen, Yingyi Zhang, Xuyuan Dong, Wenbin Zhang. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), May 2021. DOI: 10.1109/ICSE43902.2021.00046
Abstract: Deep Neural Network (DNN) testing is one of the most widely used ways to guarantee the quality of DNNs. However, labeling test inputs to check the correctness of DNN predictions is very costly, which can largely affect the efficiency of DNN testing and even the whole process of DNN development. To relieve the labeling-cost problem, we propose PRIMA, a novel test input prioritization approach for DNNs based on intelligent mutation analysis, which labels more bug-revealing test inputs earlier within a limited time and thus improves the efficiency of DNN testing. PRIMA is based on a key insight: a test input that kills many mutated models and produces different prediction results on many mutated inputs is more likely to reveal DNN bugs, and thus should be prioritized higher. After obtaining mutation results from a series of our designed model and input mutation rules for each test input, PRIMA incorporates learning-to-rank (a kind of supervised machine learning for ranking problems) to intelligently combine these mutation results for effective test input prioritization. We conducted an extensive study based on 36 popular subjects, carefully considering their diversity along five dimensions (different domains of test inputs, different DNN tasks, different network structures, different types of test inputs, and different training scenarios). Our experimental results demonstrate the effectiveness of PRIMA, significantly outperforming the state-of-the-art approaches (with an average improvement of 8.50%~131.01% in terms of prioritization effectiveness). In particular, we have applied PRIMA to practical autonomous-vehicle testing in a large motor company, and the results on 4 real-world scene-recognition models in autonomous vehicles further confirm the practicability of PRIMA.
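The mutation-based intuition can be reduced to a minimal sketch: given a boolean kill matrix (which inputs change the prediction of which mutated models), rank inputs by kill count. PRIMA itself also mutates inputs and combines the signals with a learned ranker; the raw count here is only the simplest version of that signal:

```python
def prioritize(kill_matrix):
    """Rank test inputs by how many mutated models they 'kill' (i.e.,
    receive a different prediction from than the original model).
    kill_matrix[i][j] is truthy if input i kills mutant j. Returns input
    indices ordered from most to least bug-revealing."""
    scores = [sum(row) for row in kill_matrix]
    return sorted(range(len(scores)), key=lambda i: -scores[i])
```

Inputs at the front of the returned order would be labeled first, so a fixed labeling budget surfaces more likely-misclassified inputs.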
Playing Planning Poker in Crowds: Human Computation of Software Effort Estimates
Mohammed Alhamed, Tim Storer. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), May 2021. DOI: 10.1109/ICSE43902.2021.00014
Abstract: Reliable, cost-effective effort estimation remains a considerable challenge for software projects. Recent work has demonstrated that the popular Planning Poker practice can produce reliable estimates when undertaken within a software team of knowledgeable domain experts. However, the process depends on the availability of experts and can be time-consuming to perform, making it impractical for large-scale or open source projects that may curate many thousands of outstanding tasks. This paper reports on a full study to investigate the feasibility of using crowd workers, supplied with limited information about a task, to provide comparably accurate estimates using Planning Poker. We describe the design of a Crowd Planning Poker (CPP) process implemented on Amazon Mechanical Turk and the results of a substantial set of trials, involving more than 5000 crowd workers and 39 diverse software tasks. Our results show that a carefully organised and selected crowd of workers can produce effort estimates that are of similar accuracy to those of a single expert.
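A single Planning Poker round, crowd or otherwise, follows a simple convergence loop: everyone votes a card, and if the votes are close enough the round ends, otherwise the group re-votes. The card deck and convergence rule below are illustrative assumptions, not the paper's exact CPP protocol:

```python
from statistics import median

# A common (illustrative) Planning Poker deck of story-point cards.
FIB_CARDS = (1, 2, 3, 5, 8, 13, 21)

def consensus(votes, spread=1):
    """One Planning Poker round. Consensus is reached when all votes lie
    within `spread` adjacent cards on the deck; the middle vote becomes
    the estimate. Otherwise, return the median to seed discussion and a
    re-vote."""
    idx = sorted(FIB_CARDS.index(v) for v in votes)
    if idx[-1] - idx[0] <= spread:
        return True, FIB_CARDS[idx[len(idx) // 2]]
    return False, median(votes)
```

With votes of 5, 5, and 8 the round converges on 5; with 2, 8, and 21 it does not, and the median of 8 anchors the next round.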