What Makes a Great Maintainer of Open Source Projects?
E. Dias, Paulo Meirelles, F. C. Filho, Igor Steinmacher, I. Wiese, G. Pinto. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), May 2021. DOI: 10.1109/ICSE43902.2021.00093
Abstract: Although Open Source Software (OSS) maintainers devote a significant proportion of their work to coding tasks, great maintainers must excel in many activities beyond coding. Maintainers should care about fostering a community and helping new members find their place, while also saying "no" to patches that, although well-coded and well-tested, do not contribute to the goals of the project. To perform all these activities masterfully, maintainers must exercise attributes that software engineers working on closed-source projects do not always need to master. This paper aims to uncover, relate, and prioritize the unique attributes that great OSS maintainers might have. To achieve this goal, we conducted 33 semi-structured interviews with experienced maintainers who are the gatekeepers of notable projects such as the Linux kernel, the Debian operating system, and the GitLab coding platform. After analyzing the interviews and curating a list of attributes, we created a conceptual framework to explain how these attributes are connected. We then conducted a rating survey with 90 OSS contributors. We noted that "technical excellence" and "communication" are the most recurring attributes. When grouped, these attributes fit into four broad categories: management, social, technical, and personality. While "sustain a long-term vision of the project" and being "extremely careful" seem to form the basis of our framework, our survey showed that communication was perceived as the most essential attribute.
Semi-Supervised Log-Based Anomaly Detection via Probabilistic Label Estimation
Lin Yang, Junjie Chen, Zan Wang, Weijing Wang, Jiajun Jiang, Xuyuan Dong, Wenbin Zhang. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), May 2021. DOI: 10.1109/ICSE43902.2021.00130
Abstract: With the growth of software systems, logs have become important data for aiding system maintenance. Log-based anomaly detection, which aims to automatically detect system anomalies via log analysis, is one of the most important methods for this purpose. However, existing log-based anomaly detection approaches still suffer from practical issues: they either depend on large amounts of manually labeled training data (supervised approaches) or perform unsatisfactorily because they do not learn from historical anomalies (unsupervised and semi-supervised approaches). In this paper, we propose PLELog, a novel, practical log-based anomaly detection approach. PLELog is semi-supervised, avoiding time-consuming manual labeling, and incorporates knowledge of historical anomalies via probabilistic label estimation, bringing the superiority of supervised approaches into play. In addition, PLELog stays immune to unstable log data via semantic embedding and detects anomalies efficiently and effectively with an attention-based GRU neural network. We evaluated PLELog on the two most widely used public datasets, and the results demonstrate its effectiveness, significantly outperforming the compared approaches with an average improvement of 181.6% in terms of F1-score. In particular, PLELog has been applied to two real-world systems, from our university and from a large corporation, further demonstrating its practicability.
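The probabilistic label estimation idea can be illustrated with a minimal sketch. The assumption here (not the paper's exact method, which uses clustering over semantic embeddings) is that only normal instances are labeled, and an unlabeled log sequence's anomaly probability grows with its distance from the nearest known-normal prototype:

```python
import numpy as np

def estimate_labels(unlabeled, normal_prototypes, sigma=1.0):
    """Assign each unlabeled log-sequence embedding a probabilistic label:
    the farther it lies from every known-normal prototype, the closer its
    estimated anomaly probability is to 1. A toy stand-in for PLELog's
    clustering-based probabilistic label estimation."""
    probs = []
    for x in unlabeled:
        d = min(np.linalg.norm(x - p) for p in normal_prototypes)
        probs.append(1.0 - np.exp(-(d ** 2) / (2 * sigma ** 2)))
    return np.array(probs)
```

Sequences embedded near a normal prototype receive a probability near 0 and can be treated as (probabilistically) normal training data; distant ones approach 1.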
An Empirical Study of Refactorings and Technical Debt in Machine Learning Systems
Yiming Tang, Raffi Khatchadourian, M. Bagherzadeh, Rhia Singh, Ajani Stewart, A. Raja. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), May 2021. DOI: 10.1109/ICSE43902.2021.00033
Abstract: Machine Learning (ML) systems, including Deep Learning (DL) systems, i.e., those with ML capabilities, are pervasive in today's data-driven society. Such systems are complex; they comprise ML models and many subsystems that support learning processes. As with other complex systems, ML systems are prone to classic technical debt issues, especially when they are long-lived, but they also exhibit debt specific to ML. Unfortunately, there is a knowledge gap in how ML systems actually evolve and are maintained. In this paper, we fill this gap by studying refactorings, i.e., source-to-source semantics-preserving program transformations, performed in real-world, open-source software, and the technical debt issues they alleviate. We analyzed 26 projects, consisting of 4.2 MLOC, along with 327 manually examined code patches. The results indicate that developers refactor these systems for a variety of reasons, both specific and tangential to ML; that some refactorings correspond to established technical debt categories while others do not; and that code duplication is a major cross-cutting theme that particularly involved ML configuration and model code, which was also the most refactored. We also introduce 14 new ML-specific refactorings and 7 new technical debt categories, and put forth several recommendations, best practices, and anti-patterns. The results can potentially assist practitioners, tool developers, and educators in facilitating long-term ML system usefulness.
AID: An Automated Detector for Gender-Inclusivity Bugs in OSS Project Pages
Amreeta Chatterjee, M. Guizani, Catherine Stevens, Jillian Emard, Mary Evelyn May, M. Burnett, Iftekhar Ahmed, A. Sarma. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), May 2021. DOI: 10.1109/ICSE43902.2021.00128
Abstract: The tools and infrastructure used in tech, including Open Source Software (OSS), can embed "inclusivity bugs": features that disproportionately disadvantage particular groups of contributors. To see whether OSS developers have existing practices to ward off such bugs, we surveyed 266 OSS developers. Our results show that a majority (77%) of developers do not use any inclusivity practices, and 92% of respondents cited a lack of concrete resources to enable them to do so. To help fill this gap, this paper introduces AID, a tool that automates the GenderMag method to systematically find gender-inclusivity bugs in software. We then present the results of the tool's evaluation on 20 GitHub projects. The tool achieved a precision of 0.69, a recall of 0.92, and an F-measure of 0.79, and even captured some inclusivity bugs that human GenderMag teams missed.
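The reported F-measure is the harmonic mean of precision and recall, which the quoted numbers can be checked against:

```python
def f_measure(precision, recall):
    """F1: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# AID's reported precision of 0.69 and recall of 0.92
# yield the reported F-measure of 0.79.
score = round(f_measure(0.69, 0.92), 2)  # 0.79
```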
Resource-Guided Configuration Space Reduction for Deep Learning Models
Yanjie Gao, Yonghao Zhu, Hongyu Zhang, Haoxiang Lin, Mao Yang. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), May 2021. DOI: 10.1109/ICSE43902.2021.00028
Abstract: Deep learning models, like traditional software systems, provide a large number of configuration options. A deep learning model can be configured with different hyperparameters and neural architectures. Recently, AutoML (Automated Machine Learning) has been widely adopted to automate model training by systematically exploring diverse configurations. However, current AutoML approaches do not take into consideration the computational constraints imposed by various resources such as available memory, the computing power of devices, or execution time. Training with non-conforming configurations can lead to many failed AutoML trial jobs or inappropriate models, which cause significant resource waste and severely slow down development productivity. In this paper, we propose DnnSAT, a resource-guided AutoML approach for deep learning models that helps existing AutoML tools efficiently reduce the configuration space ahead of time. DnnSAT can speed up the search process and achieve equal or even better model learning performance because it excludes trial jobs that do not satisfy the constraints and saves resources for more trials. We formulate resource-guided configuration space reduction as a constraint satisfaction problem. DnnSAT includes a unified analytic cost model to construct common constraints with respect to the model weight size, number of floating-point operations, model inference time, and GPU memory consumption. It then utilizes an SMT solver to obtain the satisfiable configurations of hyperparameters and neural architectures. Our evaluation results demonstrate the effectiveness of DnnSAT in accelerating state-of-the-art AutoML methods (Hyperparameter Optimization and Neural Architecture Search) with an average speedup from 1.19X to 3.95X on public benchmarks. We believe that DnnSAT can make AutoML more practical in a real-world environment with constrained resources.
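The core idea, pruning configurations whose analytic cost exceeds a resource budget before any trial runs, can be sketched without an SMT solver. The cost model below (a two-layer MLP with hypothetical input dimension, class count, and hidden sizes) is illustrative; DnnSAT's unified cost model and SMT encoding are more general:

```python
def satisfies_budget(cfg, max_params, max_flops):
    """Toy analytic cost model: weight count and FLOPs of a 2-layer MLP
    (input dim 784, 10 classes) as a function of hidden sizes h1, h2.
    A configuration is feasible only if it fits both budgets."""
    d, c = 784, 10
    params = d * cfg["h1"] + cfg["h1"] * cfg["h2"] + cfg["h2"] * c
    flops = 2 * params  # one multiply-add per weight
    return params <= max_params and flops <= max_flops

# Enumerate a small grid and keep only budget-conforming trials.
space = [{"h1": h1, "h2": h2} for h1 in (64, 256, 1024) for h2 in (64, 256)]
feasible = [cfg for cfg in space if satisfies_budget(cfg, 300_000, 600_000)]
```

Here the two h1=1024 configurations are excluded up front, so an AutoML tool would spend its trial budget only on the four models that can actually fit the constraints.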
How Developers Optimize Virtual Reality Applications: A Study of Optimization Commits in Open Source Unity Projects
Fariha Nusrat, Foyzul Hassan, Hao Zhong, Xiaoyin Wang. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), May 2021. DOI: 10.1109/ICSE43902.2021.00052
Abstract: Virtual Reality (VR) is an emerging technique that provides immersive experiences for users. Due to the high computation cost of rendering real-time animation twice (once for each eye) and the resource limitations of wearable devices, VR applications often face performance bottlenecks, and performance optimization plays an important role in VR software development. Performance optimizations of VR applications can be very different from those in traditional software, as VR involves more elements such as graphics rendering and real-time animation. In this paper, we present the first empirical study on 183 real-world performance optimizations from 45 VR software projects. In particular, we manually categorized the optimizations into 11 categories and applied static analysis to identify how they affect different life-cycle phases of VR applications. Furthermore, we studied the complexity and design/behavior effects of performance optimizations, and how optimizations differ between large organizational software projects and smaller personal software projects. Our major findings include: (1) graphics simplification (24.0%), rendering optimization (16.9%), language/API optimization (15.3%), heap avoidance (14.8%), and value caching (12.0%) are the most common categories of performance optimization in VR applications; (2) game logic updates (30.4%) and before-scene initialization (20.0%) are the most common life-cycle phases affected by performance issues; (3) 45.9% of the optimizations have behavior and design effects, and 39.3% of the optimizations are systematic changes; (4) the distributions of optimization classes are very different between organizational VR projects and personal VR projects.
Reducing DNN Properties to Enable Falsification with Adversarial Attacks
David Shriver, Sebastian G. Elbaum, Matthew B. Dwyer. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), May 2021. DOI: 10.1109/ICSE43902.2021.00036
Abstract: Deep Neural Networks (DNNs) are increasingly being deployed in safety-critical domains, from autonomous vehicles to medical devices, where the consequences of errors demand techniques that can provide stronger guarantees about behavior than just high test accuracy. This paper explores broadening the application of existing adversarial attack techniques to the falsification of DNN safety properties. We contend, and later show, that such attacks provide a powerful repertoire of scalable algorithms for property falsification. To enable the broad application of falsification, we introduce a semantics-preserving reduction of multiple safety property types, which subsume prior work, into a set of equivalid correctness problems amenable to adversarial attacks. We evaluate our reduction approach as an enabler of falsification on a range of DNN correctness problems and show its cost-effectiveness and scalability.
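The attack-as-falsifier idea can be shown on a toy model. The example below uses an FGSM-style step against a linear classifier, whose gradient is available in closed form; it is a sketch of the general pattern (search for a property counterexample by following the gradient), not the paper's reduction, which handles richer property types over real DNNs:

```python
import numpy as np

def fgsm_falsify(w, b, x, eps):
    """Try to falsify a local-robustness property for f(x) = w.x + b:
    'every point within L-inf distance eps of x has the same sign as
    f(x)'. The FGSM step perturbs each coordinate by eps against the
    margin (the gradient of w.x with respect to x is just w)."""
    y = np.sign(w @ x + b)
    x_adv = x - y * eps * np.sign(w)
    if np.sign(w @ x_adv + b) != y:
        return x_adv  # counterexample: property falsified
    return None  # attack failed; property may still be false
```

A returned point is a concrete witness that the property does not hold; a `None` result proves nothing, which is exactly the falsification (rather than verification) trade-off.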
Program Comprehension and Code Complexity Metrics: An fMRI Study
Norman Peitek, S. Apel, Chris Parnin, A. Brechmann, J. Siegmund. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), May 2021. DOI: 10.1109/ICSE43902.2021.00056
Abstract: Background: Researchers and practitioners have been using code complexity metrics for decades to predict how developers comprehend a program. While it is plausible and tempting to use code metrics for this purpose, their validity is debated, since they rely on simple code properties and rarely consider particularities of human cognition. Aims: We investigate whether and how code complexity metrics reflect the difficulty of program comprehension. Method: We conducted a functional magnetic resonance imaging (fMRI) study with 19 participants observing program comprehension of short code snippets at varying complexity levels. We dissected four classes of code complexity metrics and their relationship to neuronal, behavioral, and subjective correlates of program comprehension, analyzing more than 41 metrics overall. Results: While our data corroborate that complexity metrics can, to a limited degree, explain programmers' cognition in program comprehension, fMRI allowed us to gain insights into why some code properties are difficult to process. In particular, a code's textual size drives programmers' attention, and vocabulary size burdens programmers' working memory. Conclusion: Our results provide neuro-scientific evidence supporting warnings of prior research questioning the validity of code complexity metrics, and pin down factors relevant to program comprehension. Future Work: We outline several follow-up experiments investigating fine-grained effects of code complexity and describe possible refinements to code complexity metrics.
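The two properties the study singles out, textual size and vocabulary size, are among the simplest metrics in the classes analyzed. A minimal tokenizer-based sketch (the study's actual metric definitions are more elaborate) shows how they are computed:

```python
import re

def size_metrics(snippet):
    """Compute two simple complexity metrics for a code snippet:
    textual size (total token count, which the study links to attention)
    and vocabulary size (distinct tokens, linked to working memory)."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", snippet)
    return {"textual_size": len(tokens), "vocabulary_size": len(set(tokens))}
```

For the snippet `x = x + 1`, the token stream is `x`, `=`, `x`, `+`, `1`: a textual size of 5 but a vocabulary size of 4, since `x` repeats.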
Prioritizing Test Inputs for Deep Neural Networks via Mutation Analysis
Zan Wang, Hanmo You, Junjie Chen, Yingyi Zhang, Xuyuan Dong, Wenbin Zhang. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), May 2021. DOI: 10.1109/ICSE43902.2021.00046
Abstract: Deep Neural Network (DNN) testing is one of the most widely used ways to guarantee the quality of DNNs. However, labeling test inputs to check the correctness of DNN predictions is very costly, which can largely affect the efficiency of DNN testing and even the whole process of DNN development. To relieve the labeling-cost problem, we propose PRIMA, a novel test input prioritization approach for DNNs based on intelligent mutation analysis, which labels more bug-revealing test inputs earlier within a limited time and thus improves the efficiency of DNN testing. PRIMA is based on a key insight: a test input that kills many mutated models and produces different prediction results on many mutated inputs is more likely to reveal DNN bugs, and thus should be prioritized higher. After obtaining mutation results from a series of our designed model and input mutation rules for each test input, PRIMA incorporates learning-to-rank (a kind of supervised machine learning for ranking problems) to intelligently combine these mutation results for effective test input prioritization. We conducted an extensive study based on 36 popular subjects, carefully considering their diversity along five dimensions (different domains of test inputs, different DNN tasks, different network structures, different types of test inputs, and different training scenarios). Our experimental results demonstrate the effectiveness of PRIMA, significantly outperforming the state-of-the-art approaches (with an average improvement of 8.50%~131.01% in terms of prioritization effectiveness). In particular, we have applied PRIMA to practical autonomous-vehicle testing in a large motor company, and the results on 4 real-world scene-recognition models in autonomous vehicles further confirm the practicability of PRIMA.
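The mutation-based intuition can be reduced to a minimal sketch: given a boolean kill matrix (which inputs change the prediction of which mutated models), rank inputs by kill count. PRIMA itself also mutates inputs and combines the signals with a learned ranker; the raw count here is only the simplest version of that signal:

```python
def prioritize(kill_matrix):
    """Rank test inputs by how many mutated models they 'kill' (i.e.,
    receive a different prediction from than the original model).
    kill_matrix[i][j] is truthy if input i kills mutant j. Returns input
    indices ordered from most to least bug-revealing."""
    scores = [sum(row) for row in kill_matrix]
    return sorted(range(len(scores)), key=lambda i: -scores[i])
```

Inputs at the front of the returned order would be labeled first, so a fixed labeling budget surfaces more likely-misclassified inputs.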
Playing Planning Poker in Crowds: Human Computation of Software Effort Estimates
Mohammed Alhamed, Tim Storer. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), May 2021. DOI: 10.1109/ICSE43902.2021.00014
Abstract: Reliable, cost-effective effort estimation remains a considerable challenge for software projects. Recent work has demonstrated that the popular Planning Poker practice can produce reliable estimates when undertaken within a software team of knowledgeable domain experts. However, the process depends on the availability of experts and can be time-consuming to perform, making it impractical for large-scale or open source projects that may curate many thousands of outstanding tasks. This paper reports on a full study to investigate the feasibility of using crowd workers, supplied with limited information about a task, to provide comparably accurate estimates using Planning Poker. We describe the design of a Crowd Planning Poker (CPP) process implemented on Amazon Mechanical Turk and the results of a substantial set of trials, involving more than 5000 crowd workers and 39 diverse software tasks. Our results show that a carefully organised and selected crowd of workers can produce effort estimates that are of similar accuracy to those of a single expert.
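A single Planning Poker round, crowd or otherwise, follows a simple convergence loop: everyone votes a card, and if the votes are close enough the round ends, otherwise the group re-votes. The card deck and convergence rule below are illustrative assumptions, not the paper's exact CPP protocol:

```python
from statistics import median

# A common (illustrative) Planning Poker deck of story-point cards.
FIB_CARDS = (1, 2, 3, 5, 8, 13, 21)

def consensus(votes, spread=1):
    """One Planning Poker round. Consensus is reached when all votes lie
    within `spread` adjacent cards on the deck; the middle vote becomes
    the estimate. Otherwise, return the median to seed discussion and a
    re-vote."""
    idx = sorted(FIB_CARDS.index(v) for v in votes)
    if idx[-1] - idx[0] <= spread:
        return True, FIB_CARDS[idx[len(idx) // 2]]
    return False, median(votes)
```

With votes of 5, 5, and 8 the round converges on 5; with 2, 8, and 21 it does not, and the median of 8 anchors the next round.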