M. Agbese, Rahul Mohanani, A. Khan, P. Abrahamsson
{"title":"Implementing AI Ethics: Making Sense of the Ethical Requirements","authors":"M. Agbese, Rahul Mohanani, A. Khan, P. Abrahamsson","doi":"10.1145/3593434.3593453","DOIUrl":"https://doi.org/10.1145/3593434.3593453","url":null,"abstract":"Society’s increasing dependence on Artificial Intelligence (AI) and AI-enabled systems require a more practical approach from software engineering (SE) executives in middle and higher-level management to improve their involvement in implementing AI ethics by making ethical requirements part of their management practices. However, research indicates that most work on implementing ethical requirements in SE management primarily focuses on technical development, with scarce findings for middle and higher-level management. We investigate this by interviewing ten Finnish SE executives in middle and higher-level management to examine how they consider and implement ethical requirements. We use ethical requirements from the European Union (EU) Trustworthy Ethics guidelines for Trustworthy AI as our reference for ethical requirements and an Agile portfolio management framework to analyze implementation. Our findings reveal a general consideration of privacy and data governance ethical requirements as legal requirements with no other consideration for ethical requirements identified. The findings also show practicable consideration of ethical requirements as technical robustness and safety for implementation as risk requirements and societal and environmental well-being for implementation as sustainability requirements. We examine a practical approach to implementing ethical requirements using the ethical risk requirements stack employing the Agile portfolio management framework.","PeriodicalId":178596,"journal":{"name":"Proceedings of the 27th International Conference on Evaluation and Assessment in Software Engineering","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123555198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analyzing Maintenance Activities of Software Libraries","authors":"Alexandros Tsakpinis","doi":"10.1145/3593434.3593474","DOIUrl":"https://doi.org/10.1145/3593434.3593474","url":null,"abstract":"Industrial applications heavily integrate open-source software libraries nowadays. Beyond the benefits that libraries bring, they can also impose a real threat in case a library is affected by a vulnerability but its community is not active in creating a fixing release. Therefore, I want to introduce an automatic monitoring approach for industrial applications to identify open-source dependencies that show negative signs regarding their current or future maintenance activities. Since most research in this field is limited due to lack of features, labels, and transitive links, and thus is not applicable in industry, my approach aims to close this gap by capturing the impact of direct and transitive dependencies in terms of their maintenance activities. Automatically monitoring the maintenance activities of dependencies reduces the manual effort of application maintainers and supports application security by continuously having well-maintained dependencies.","PeriodicalId":178596,"journal":{"name":"Proceedings of the 27th International Conference on Evaluation and Assessment in Software Engineering","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126322461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vita Santa Barletta, D. Caivano, Domenico Gigante, A. Ragone
{"title":"A Rapid Review of Responsible AI frameworks: How to guide the development of ethical AI","authors":"Vita Santa Barletta, D. Caivano, Domenico Gigante, A. Ragone","doi":"10.1145/3593434.3593478","DOIUrl":"https://doi.org/10.1145/3593434.3593478","url":null,"abstract":"In the last years, the raise of Artificial Intelligence (AI), and its pervasiveness in our lives, has sparked a flourishing debate about the ethical principles that should lead its implementation and use in society. Driven by these concerns, we conduct a rapid review of several frameworks providing principles, guidelines, and/or tools to help practitioners in the development and deployment of Responsible AI (RAI) applications. We map each framework w.r.t. the different Software Development Life Cycle (SDLC) phases discovering that most of these frameworks fall just in the Requirements Elicitation phase, leaving the other phases uncovered. Very few of these frameworks offer supporting tools for practitioners, and they are mainly provided by private companies. Our results reveal that there is not a \"catching-all\" framework supporting both technical and non-technical stakeholders in the implementation of real-world projects. Our findings highlight the lack of a comprehensive framework encompassing all RAI principles and all (SDLC) phases that could be navigated by users with different skill sets and with different goals.","PeriodicalId":178596,"journal":{"name":"Proceedings of the 27th International Conference on Evaluation and Assessment in Software Engineering","volume":"183 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126950885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving the Reporting of Threats to Construct Validity","authors":"Dag I.K. Sjøberg, Gunnar R. Bergersen","doi":"10.1145/3593434.3593449","DOIUrl":"https://doi.org/10.1145/3593434.3593449","url":null,"abstract":"Background: Construct validity concerns the use of indicators to measure a concept that is not directly measurable. Aim: This study intends to identify, categorize, assess and quantify discussions of threats to construct validity in empirical software engineering literature and use the findings to suggest ways to improve the reporting of construct validity issues. Method: We analyzed 83 articles that report human-centric experiments published in five top-tier software engineering journals from 2015 to 2019. The articles’ text concerning threats to construct validity was divided into segments (the unit of analysis) based on predefined categories. The segments were then evaluated regarding whether they clearly discussed a threat and a construct. Results: Three-fifths of the segments were associated with topics not related to construct validity. Two-thirds of the articles discussed construct validity without using the definition of construct validity given in the article. The threats were clearly described in more than four-fifths of the segments, but the construct in question was clearly described in only two-thirds of the segments. The construct was unclear when the discussion was not related to construct validity but to other types of validity. Conclusions: The results show potential for improving the understanding of construct validity in software engineering. Recommendations addressing the identified weaknesses are given to improve the awareness and reporting of CV.","PeriodicalId":178596,"journal":{"name":"Proceedings of the 27th International Conference on Evaluation and Assessment in Software Engineering","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133562247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Systematic Literature Review on Client Selection in Federated Learning","authors":"Carl Smestad, Jingyue Li","doi":"10.1145/3593434.3593438","DOIUrl":"https://doi.org/10.1145/3593434.3593438","url":null,"abstract":"With the arising concerns of privacy within machine learning, federated learning (FL) was invented in 2017, in which the clients, such as mobile devices, train a model and send the update to the centralized server. Choosing clients randomly for FL can harm learning performance due to different reasons. Many studies have proposed approaches to address the challenges of client selection of FL. However, no systematic literature review (SLR) on this topic existed. This SLR investigates the state of the art of client selection in FL and answers the challenges, solutions, and metrics to evaluate the solutions. We systematically reviewed 47 primary studies. The main challenges found in client selection are heterogeneity, resource allocation, communication costs, and fairness. The client selection schemes aim to improve the original random selection algorithm by focusing on one or several of the aforementioned challenges. The most common metric used is testing accuracy versus communication rounds, as testing accuracy measures the successfulness of the learning and preferably in as few communication rounds as possible, as they are very expensive. Although several possible improvements can be made with the current state of client selection, the most beneficial ones are evaluating the impact of unsuccessful clients and gaining a more theoretical understanding of the impact of fairness in FL.","PeriodicalId":178596,"journal":{"name":"Proceedings of the 27th International Conference on Evaluation and Assessment in Software Engineering","volume":"169 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126584457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HornFuzz: Fuzzing CHC solvers","authors":"Anzhela Sukhanova, Valentyn Sobol","doi":"10.1145/3593434.3593455","DOIUrl":"https://doi.org/10.1145/3593434.3593455","url":null,"abstract":"Many advanced program analysis and verification methods are based on solving systems of Constrained Horn Clauses (CHC). Testing CHC solvers is very important, as correctness of their work determines whether bugs in the analyzed programs are detected or missed. One of the well-established and efficient methods of automated software testing is fuzzing: analyzing the reactions of programs to random input data. Currently, there are no fuzzers for CHC solvers, and fuzzers for SMT solvers are not efficient in CHC solver testing, since they do not consider CHC specifics. In this paper, we present HornFuzz, a mutation-based gray-box fuzzing technique for detecting bugs in CHC solvers based on the idea of metamorphic testing. We evaluated our fuzzer on one of the highest performing CHC solvers, Spacer, and found a handful of bugs in Spacer. In particular, some discovered problems are so serious that they require fixes with significant changes to the solver.","PeriodicalId":178596,"journal":{"name":"Proceedings of the 27th International Conference on Evaluation and Assessment in Software Engineering","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128144553","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Daniel Gaspar-Figueiredo, S. Abrahão, E. Insfrán, J. Vanderdonckt
{"title":"Measuring User Experience of Adaptive User Interfaces using EEG: A Replication Study","authors":"Daniel Gaspar-Figueiredo, S. Abrahão, E. Insfrán, J. Vanderdonckt","doi":"10.1145/3593434.3593452","DOIUrl":"https://doi.org/10.1145/3593434.3593452","url":null,"abstract":"Background: Adaptive user interfaces have the advantage of being able to dynamically change their aspect and/or behaviour depending on the characteristics of the context of use, i.e. to improve user experience. User experience is an important quality factor that has been primarily evaluated with classical measures (e.g. effectiveness, efficiency, satisfaction), but to a lesser extent with physiological measures, such as emotion recognition, skin response, or brain activity. Aim: In a previous exploratory experiment involving users with different profiles and a wide range of ages, we analysed user experience in terms of cognitive load, engagement, attraction and memorisation when employing twenty graphical adaptive menus through the use of an Electroencephalogram (EEG) device. The results indicated that there were statistically significant differences for these four variables. However, we considered that it was necessary to confirm or reject these findings using a more homogeneous group of users. Method: We conducted an operational internal replication study with 40 participants. We also investigated the potential correlation between EEG signals and the participants’ user experience ratings, such as their preferences. Results: The results of this experiment confirm that there are statistically significant differences between the EEG variables when the participants interact with the different adaptive menus. Moreover, there is a high correlation among the participants’ user experience ratings and the EEG signals, and a trend regarding performance has emerged from our analysis. Conclusions: These findings suggest that EEG signals could be used to evaluate user experience. With regard to the menus studied, our results suggest that graphical menus with different structures and font types produce more differences in users’ brain responses, while menus which use colours produce more similarities in users’ brain responses. Several insights with which to improve users’ experience of graphical adaptive menus are outlined.","PeriodicalId":178596,"journal":{"name":"Proceedings of the 27th International Conference on Evaluation and Assessment in Software Engineering","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121069694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Minshun Yang, Seiji Sato, H. Washizaki, Y. Fukazawa, Juichi Takahashi
{"title":"Identifying Characteristics of the Agile Development Process That Impact User Satisfaction","authors":"Minshun Yang, Seiji Sato, H. Washizaki, Y. Fukazawa, Juichi Takahashi","doi":"10.1145/3593434.3593470","DOIUrl":"https://doi.org/10.1145/3593434.3593470","url":null,"abstract":"The purpose of this study is to identify the characteristics of Agile development processes that impact user satisfaction. We used user reviews of OSS smartphone apps and various data from version control systems to examine the relationships, especially time-series correlations, between user satisfaction and development metrics that are expected to be related to user satisfaction. Although no metrics conclusively indicate an improved user satisfaction, motivation of the development team, the ability to set appropriate work units, the appropriateness of work rules, and the improvement of code maintainability should be considered as they are correlated with improved user satisfaction. In contrast, changes in the release frequency and workload are not correlated.","PeriodicalId":178596,"journal":{"name":"Proceedings of the 27th International Conference on Evaluation and Assessment in Software Engineering","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131660263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bastin Tony Roy Savarimuthu, Zoofishan Zareen, Jithin Cheriyan, Muhammad Yasir, M. Galster
{"title":"Barriers for Social Inclusion in Online Software Engineering Communities - A Study of Offensive Language Use in Gitter Projects","authors":"Bastin Tony Roy Savarimuthu, Zoofishan Zareen, Jithin Cheriyan, Muhammad Yasir, M. Galster","doi":"10.1145/3593434.3593463","DOIUrl":"https://doi.org/10.1145/3593434.3593463","url":null,"abstract":"Social inclusion is a fundamental feature of thriving societies. This paper first investigates barriers for social inclusion in online Software Engineering (SE) communities, by identifying a set of 11 attributes and organising them as a taxonomy. Second, by applying the taxonomy and analysing language used in the comments posted by members in 189 Gitter projects (with > 3 million comments), it presents the evidence for the social exclusion problem. It employs a keyword-based search approach for this purpose. Third, it presents a framework for improving social inclusion in SE communities.","PeriodicalId":178596,"journal":{"name":"Proceedings of the 27th International Conference on Evaluation and Assessment in Software Engineering","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127503930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Md Masudur Rahman, T. Ahammed, Md. Mahbubul Alam Joarder, K. Sakib
{"title":"Does Code Smell Frequency Have a Relationship with Fault-proneness?","authors":"Md Masudur Rahman, T. Ahammed, Md. Mahbubul Alam Joarder, K. Sakib","doi":"10.1145/3593434.3593457","DOIUrl":"https://doi.org/10.1145/3593434.3593457","url":null,"abstract":"Fault-proneness is an indication of programming errors that decreases software quality and maintainability. On the contrary, code smell is a symptom of potential design problems which has impact on fault-proneness. In the literature, negative impact of code smells on fault-proneness has been investigated. However, it is still unclear that how frequency of each code smell type impacts the fault-proneness. To mitigate this research gap, we present an empirical study to identify whether frequency of individual code smell types has a relationship with the fault-proneness. The results show that Anti Singleton, Blob and Class Data Should Be Private smell types have strong relationship with fault-proneness though their frequencies are not very high. On the other hand, comparatively high frequent code smell types such as Complex Class, Large Class and Long Parameter List have moderate relationship with fault-proneness. These findings will assist developers to prioritize and refactor code smells to improve software quality.","PeriodicalId":178596,"journal":{"name":"Proceedings of the 27th International Conference on Evaluation and Assessment in Software Engineering","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134532511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}