{"title":"Does Neuron Coverage Matter for Deep Reinforcement Learning?: A Preliminary Study","authors":"Miller Trujillo, M. Linares-Vásquez, Camilo Escobar-Velásquez, Ivana Dusparic, Nicolás Cardozo","doi":"10.1145/3387940.3391462","DOIUrl":"https://doi.org/10.1145/3387940.3391462","url":null,"abstract":"Deep Learning (DL) is a powerful family of algorithms used for a wide variety of problems and systems, including safety-critical systems. As a consequence, analyzing, understanding, and testing DL models is attracting more practitioners and researchers with the purpose of implementing DL systems that are robust, reliable, efficient, and accurate. The first software testing approaches for DL systems have focused on black-box testing, white-box testing, and test case generation, in particular for deep neural networks (CNNs and RNNs). However, Deep Reinforcement Learning (DRL), which is a branch of DL extending reinforcement learning, is still out of the scope of research providing testing techniques for DL systems. In this paper, we present a first step towards testing of DRL systems. In particular, we investigate whether neuron coverage (a widely used metric for white-box testing of DNNs) could also be used for DRL systems, by analyzing coverage evolutionary patterns, and the correlation with RL rewards.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":"238 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131612657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Metamorphic Robustness Testing of Google Translate","authors":"Dickson T. S. Lee, Z. Zhou, T. H. Tse","doi":"10.1145/3387940.3391484","DOIUrl":"https://doi.org/10.1145/3387940.3391484","url":null,"abstract":"Current research on the testing of machine translation software mainly focuses on functional correctness for valid, well-formed inputs. By contrast, robustness testing, which involves the ability of the software to handle erroneous or unanticipated inputs, is often overlooked. In this paper, we propose to address this important shortcoming. Using the metamorphic robustness testing approach, we compare the translations of original inputs with those of follow-up inputs having different categories of minor typos. Our empirical results reveal a lack of robustness in Google Translate, thereby opening a new research direction for the quality assurance of neural machine translators.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":"341 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134044997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sorry to Bother You Again: Developer Recommendation Choice Architectures for Designing Effective Bots","authors":"Chris Brown, Chris Parnin","doi":"10.1145/3387940.3391506","DOIUrl":"https://doi.org/10.1145/3387940.3391506","url":null,"abstract":"Software robots, or bots, are useful for automating a wide variety of programming and software development tasks. Despite the advantages of using bots throughout the software engineering process, research shows that developers often face challenges interacting with these systems. To improve automated developer recommendations from bots, this work introduces developer recommendation choice architectures. Choice architecture is a behavioral science concept that suggests the presentation of options impacts the decisions humans make. To evaluate the impact of framing recommendations for software engineers, we examine the impact of one choice architecture, actionability, for improving the design of bot recommendations. We present the results of a preliminary study evaluating this choice architecture in a bot and provide implications for integrating choice architecture into the design of future software engineering bots.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131732763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Improvement of Machine Translation Using Mutamorphic Relation: Invited Talk Paper","authors":"Jie M. Zhang","doi":"10.1145/3387940.3391541","DOIUrl":"https://doi.org/10.1145/3387940.3391541","url":null,"abstract":"This paper introduces Mutamorphic Relation for Machine Learning Testing. Mutamorphic Relation combines data mutation and metamorphic relations as test oracles for machine learning systems. These oracles can help achieve fully automatic testing as well as automatic repair of the machine learning models. The paper takes TransRepair as an example to show the effectiveness of Mutamorphic Relation in automatically testing and improving machine translators. TransRepair detects inconsistency bugs without access to human oracles. It then adopts probability-reference or cross-reference to post-process the translations, in a grey-box or black-box manner, to repair the inconsistencies. Manual inspection indicates that the translations repaired by TransRepair improve consistency in 87% of cases (degrading it in 2%), and that the repairs have better translation acceptability in 27% of the cases (worse in 8%).","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":"113 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133152349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dialogue Act Classification for Virtual Agents for Software Engineers during Debugging","authors":"Andrew Wood, Zachary Eberhart, Collin McMillan","doi":"10.1145/3387940.3391487","DOIUrl":"https://doi.org/10.1145/3387940.3391487","url":null,"abstract":"A \"dialogue act\" is a written or spoken action during a conversation. Dialogue acts are usually only a few words long, and are often categorized by researchers into a relatively small set of dialogue act types, such as eliciting information, expressing an opinion, or making a greeting. Research interest in automatic classification of dialogue acts has grown recently due to the proliferation of Virtual Agents (VA), e.g. Siri, Cortana, Alexa. Unfortunately, the gains made in VA development in one domain are generally not applicable to other domains, since the composition of dialogue acts differs in different conversations. In this paper, we target the problem of dialogue act classification for a VA for software engineers repairing bugs. A problem in the SE domain is that very little sample data exists: the only public dataset is a recently released Wizard of Oz study with 30 conversations. Therefore, we present a transfer-learning technique to learn on a much larger dataset for general business conversations, and apply the knowledge to the SE dataset. In an experiment, we observe between 8% and 20% improvement over two key baselines.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117080965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"More than Code: Contributions in Scrum Software Engineering Teams","authors":"Frederike Ramin, Christoph Matthies, Ralf Teusner","doi":"10.1145/3387940.3392241","DOIUrl":"https://doi.org/10.1145/3387940.3392241","url":null,"abstract":"Motivated and competent team members are a vital part of Agile Software development and make or break any project's success. Motivation is fostered by continuous progress and recognition of efforts. These concepts are founding pillars of the Scrum methodology, which focuses on self-organizing teams. The types of contributions Scrum development team members make to a project's progress are not only technical. However, a comprehensive model comprising the varied contributions in modern software engineering teams is not yet established. We propose a model that incorporates contributions of all Scrum roles, explicitly including those which are not directly related to project artifacts. It improves the visibility of performed tasks, acts as a starting point for team retrospection, and serves as a foundation for discussion in the research community.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":"248 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122196951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Digital Twin for Cybersecurity Incident Prediction: A Multivocal Literature Review","authors":"Abhishek Pokhrel, Vikash Katta, Ricardo Colomo Palacios","doi":"10.1145/3387940.3392199","DOIUrl":"https://doi.org/10.1145/3387940.3392199","url":null,"abstract":"Advancements in the fields of the Internet of Things, artificial intelligence, machine learning, and data analytics have paved the way for the evolution of digital twin technology. A digital twin is a high-fidelity digital model of a physical system or asset that can be used, e.g., to optimize operations and predict faults of the physical system. To understand different use cases of digital twins and their potential for cybersecurity incident prediction, we have performed a Systematic Literature Review (SLR). In this paper, we summarize the definition of digital twin and the state of the art in the development of digital twins, including reported work on the usability of a digital twin for cybersecurity. Existing tools and technologies for developing digital twins are also discussed.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127670670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Do You Just Discuss or Do You Solve?: Meeting Analysis in a Software Project at Early Stages","authors":"J. Klünder, Nils Prenner, Ann-Kathrin Windmann, Marek Stess, Michael Nolting, Fabian Kortum, Lisa Handke, K. Schneider, S. Kauffeld","doi":"10.1145/3387940.3391468","DOIUrl":"https://doi.org/10.1145/3387940.3391468","url":null,"abstract":"Software development is a very cooperative and communicative task. In most software projects, meetings are a very important medium to share information. However, these meetings are often not as effective as expected. One big issue hindering productive and satisfying meetings is inappropriate behavior such as complaining. In particular, talking about problems without at least trying to solve them decreases motivation and mood of the team. Interaction analyses in meetings allow the assessment of appropriate and inappropriate behavior influencing the quality of a meeting. Derived from an established interaction analysis coding scheme in psychology, we present act4teams-short which allows real-time coding of meetings in software projects. We apply act4teams-short in an industrial case study at Volkswagen Commercial Vehicles, a large German company in the automotive domain. We analyze ten team-internal meetings at early project stages. Our results reveal difficulties due to missing project structure and the overall project goal. Furthermore, the team has an intrinsic interest in identifying problems and solving them, without any extrinsic input being required.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124602116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Code Recommendations by Combining Neural and Classical Machine Learning Approaches","authors":"M. Schumacher, K. T. Le, A. Andrzejak","doi":"10.1145/3387940.3391489","DOIUrl":"https://doi.org/10.1145/3387940.3391489","url":null,"abstract":"Code recommendation systems for software engineering are designed to accelerate the development of large software projects. A classical example is code completion or next token prediction offered by modern integrated development environments. A particularly challenging case for such systems is dynamic languages like Python, due to limited type information at editing time. Recently, researchers proposed machine learning approaches to address this challenge. In particular, the Probabilistic Higher Order Grammar technique (Bielik et al., ICML 2016) uses a grammar-based approach with a classical machine learning schema to exploit local context. A method by Li et al. (IJCAI 2018) uses deep learning methods, specifically a Recurrent Neural Network coupled with a Pointer Network. We compare these two approaches quantitatively on a large corpus of Python files from GitHub. We also propose a combination of both approaches, where a neural network decides which schema to use for each prediction. The proposed method achieves a slightly better accuracy than either of the systems alone. This demonstrates the potential of ensemble-like methods for code completion and recommendation tasks in dynamically typed languages.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131324260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding Implicit User Feedback from Multisensorial and Physiological Data: A case study","authors":"Franci Suni Lopez, Nelly Condori-Fernández, Alejandro Catalá","doi":"10.1145/3387940.3391466","DOIUrl":"https://doi.org/10.1145/3387940.3391466","url":null,"abstract":"Ensuring the quality of user experience is very important for increasing the acceptance likelihood of software applications, which can be affected by several contextual factors that continuously change over time (e.g., emotional state of end-user). Due to these changes in the context, software continually needs to adapt to deliver software services that can satisfy user needs. However, to achieve this adaptation, it is important to gather and understand the user feedback. In this paper, we mainly investigate whether physiological data can be considered and used as a form of implicit user feedback. To this end, we conducted a case study involving a tourist traveling abroad, who used a wearable device for monitoring his physiological data, and a smartphone with a mobile app for reminding him to take his medication on time over four days. Through the case study, we were able to identify some factors and activities as emotional triggers, which were used for understanding the user context. Our results highlight the importance of having a context analyzer, which can help the system to determine whether the detected stress could be considered as actionable and consequently as implicit user feedback.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128658475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}