Sebastian Nielebock, R. Heumüller, J. Krüger, F. Ortmeier
{"title":"Using API-Embedding for API-Misuse Repair","authors":"Sebastian Nielebock, R. Heumüller, J. Krüger, F. Ortmeier","doi":"10.1145/3387940.3392171","DOIUrl":"https://doi.org/10.1145/3387940.3392171","url":null,"abstract":"Application Programming Interfaces (APIs) are a way to reuse existing functionalities of one application in another one. However, due to lacking knowledge on the correct usage of a particular API, developers sometimes commit misuses, causing unintended or faulty behavior. To detect and eventually repair such misuses automatically, inferring API usage patterns from real-world code is the state-of-the-art. A contradiction to an identified usage pattern denotes a misuse, while applying the pattern fixes the respective misuse. The success of this process heavily depends on the quality of the usage patterns and on the code from which these are inferred. Thus, a lack of code demonstrating the correct usage makes it impossible to detect and fix a misuse. In this paper, we discuss the potential of using machine-learning vector embeddings to improve automatic program repair and to extend it towards cross-API and cross-language repair. We illustrate our ideas using one particular technique for API-embedding (i.e., API2Vec) and describe the arising possibilities and challenges.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126967720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Christoph Matthies, Franziska Dobrigkeit, Guenter Hesse
{"title":"Mining for Process Improvements: Analyzing Software Repositories in Agile Retrospectives","authors":"Christoph Matthies, Franziska Dobrigkeit, Guenter Hesse","doi":"10.1145/3387940.3392168","DOIUrl":"https://doi.org/10.1145/3387940.3392168","url":null,"abstract":"Software Repositories contain knowledge on how software engineering teams work, communicate, and collaborate. It can be used to develop a data-informed view of a team's development process, which in turn can be employed for process improvement initiatives. In modern, Agile development methods, process improvement takes place in Retrospective meetings, in which the last development iteration is discussed. However, previously proposed activities that take place in these meetings often do not rely on project data, instead depending solely on the perceptions of team members. We propose new Retrospective activities, based on mining the software repositories of individual teams, to complement existing approaches with more objective, data-informed process views.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128772979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
D. T. Kutas, Aditya Nair, Prerna Singh, E. Kan, J. Burge, A. van der Hoek
{"title":"Linecept","authors":"D. T. Kutas, Aditya Nair, Prerna Singh, E. Kan, J. Burge, A. van der Hoek","doi":"10.1145/3387940.3392228","DOIUrl":"https://doi.org/10.1145/3387940.3392228","url":null,"abstract":"In software design, the various stakeholders generate large numbers of heterogeneous artifacts. These artifacts are often developed in, and managed by, different tools. In this paper, we present our initial prototype of Linecept, a tool that helps stakeholders organize, find, and view disparate design artifacts by organizing them on a timeline that presents a single unified view of the artifacts and who created them. We have used Linecept to retrospectively capture design artifacts for its own creation and in a software design class.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124669284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Finding Load Inducing Test Scenarios Using Genetic Algorithms and Tree Based Encoding","authors":"Ege Apak, Ayse Tosun Misirli","doi":"10.1145/3387940.3392216","DOIUrl":"https://doi.org/10.1145/3387940.3392216","url":null,"abstract":"Load test is conducted in order to gain an insight to the characteristics of a system under various amount of load. Since the combination of possible actions a user can follow from start to finish is possibly endless, the possibility of missing a load inducing scenario by using a traditional load testing software is highly probable. In this work, we implement a rule-aided scenario generation algorithm and find the possible scenarios that a high amount of load is generated by using genetic algorithms to drive the search forward.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129485060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Golzadeh, Damien Legay, Alexandre Decan, T. Mens
{"title":"Bot or not?: Detecting bots in GitHub pull request activity based on comment similarity","authors":"M. Golzadeh, Damien Legay, Alexandre Decan, T. Mens","doi":"10.1145/3387940.3391503","DOIUrl":"https://doi.org/10.1145/3387940.3391503","url":null,"abstract":"Many empirical studies focus on socio-technical activity in social coding platforms such as GitHub, for example to study the onboarding, abandonment, productivity and collaboration among team members. Such studies face the difficulty that GitHub activity can also be generated automatically by bots of a different nature. It therefore becomes imperative to distinguish such bots from human users. We propose an automated approach to detect bots in GitHub pull request (PR) activity. Relying on the assumption that bots contain repetitive message patterns in their PR comments, we analyse the similarity between multiple messages from the same GitHub identity, using a clustering method that combines the Jaccard and Levenshtein distance. We empirically evaluate our approach by analysing 20,090 PR comments of 250 users and 42 bots in 1,262 GitHub repositories. Our results show that the method is able to clearly separate bots from human users.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127633283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rohit Mehra, V. Sharma, Vikrant S. Kaulgud, Sanjay Podder, Adam P. Burden
{"title":"Immersive IDE: Towards Leveraging Virtual Reality for creating an Immersive Software Development Environment","authors":"Rohit Mehra, V. Sharma, Vikrant S. Kaulgud, Sanjay Podder, Adam P. Burden","doi":"10.1145/3387940.3392234","DOIUrl":"https://doi.org/10.1145/3387940.3392234","url":null,"abstract":"Positive affects have been shown to be positively correlated with developer productivity. Moreover, the environment in which the developer sits is important as it has a direct bearing on her emotions. However, in typical software development project setups, it is difficult to mold the surroundings of a developer according to her own needs or wishes. Moreover, large project areas may tend to have multiple distractions that may further negatively affect the developer. Multiple studies have shown that Virtual Reality can be an effective medium to induce positive emotions, with the capability to immerse oneself into virtually created environs, allowing for countless opportunities to surround a developer with what she would want. In this paper, we present our approach to allow a developer to choose her own surrounding environment to work in (say a beach, a park or in space!), while allowing for a real-time feed of her workstation/tools to be embedded in the same for a seamless experience. We believe that this approach will enable better mood/engagement and lower distractions/stress, which may lead to higher productivity and a balanced sense of developer well-being.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129978691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Andrey Y. Shedko, Ilya Palachev, A. Kvochko, Aleksandr Semenov, Kwangwon Sun
{"title":"Applying probabilistic models to C++ code on an industrial scale","authors":"Andrey Y. Shedko, Ilya Palachev, A. Kvochko, Aleksandr Semenov, Kwangwon Sun","doi":"10.1145/3387940.3391477","DOIUrl":"https://doi.org/10.1145/3387940.3391477","url":null,"abstract":"Machine learning approaches are widely applied to different research tasks of software engineering, but C/C++ code presents a challenge for these approaches because of its complex build system. However, C and C++ languages still remain two of the most popular programming languages, especially in industrial software, where a big amount of legacy code is still used. This fact prevents the application of recent advances in probabilistic modeling of source code to the C/C++ domain. We demonstrate that it is possible to at least partially overcome these difficulties by the use of a simple token-based representation of C/C++ code that can be used as a possible replacement for more precise representations. Enriched token representation is verified at a large scale to ensure that its precision is good enough to learn rules from. We consider two different tasks as an application of this representation: coding style detection and API usage anomaly detection. We apply simple probabilistic models to these tasks and demonstrate that even complex coding style rules and API usage patterns can be detected by the means of this representation. This paper provides a vision of how different research ML-based methods for software engineering could be applied to the domain of C/C++ languages and show how they can be applied to the source code of a large software company like Samsung.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128942957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Property-based Testing of Quantum Programs in Q#","authors":"Shahin Honarvar, M. Mousavi, R. Nagarajan","doi":"10.1145/3387940.3391459","DOIUrl":"https://doi.org/10.1145/3387940.3391459","url":null,"abstract":"Property-based testing is a structured method for automated testing using program specifications. We report on the design and implementation of what is to our knowledge the first property-based framework for quantum programs. We review various aspects of our design concerning property-specification, test-case generation, and test result analysis. We also provide an overview of the implementation and its way of working. Finally, we present the result of applying our framework to some examples.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132583732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. L. Zanatta, L. Machado, Igor Steinmacher, R. Prikladnicki, C. D. Souza
{"title":"Strategies for Crowdworkers to Overcome Barriers in Competition-based Software Crowdsourcing Development","authors":"A. L. Zanatta, L. Machado, Igor Steinmacher, R. Prikladnicki, C. D. Souza","doi":"10.1145/3387940.3392243","DOIUrl":"https://doi.org/10.1145/3387940.3392243","url":null,"abstract":"Crowdsourcing in software development uses a large pool of developers on-demand to outsource parts or the entire software project to a crowd. To succeed, this requires a continuous influx of developers, or simply crowdworkers. However, crowdworkers face many barriers when attempting to participate in software crowdsourcing. Often, these barriers lead to a low number and poor quality of submitted solutions. In our previous work, we identified several barriers faced by crowdworkers including finding a task according to his/her abilities, setting up the environment to perform the task, and managing one's personal time. We also proposed six strategies to overcome or minimize these barriers. In this paper, these six strategies are evaluated questioning Software Crowdsourcing (SW CS) experts. The results show that software crowdsourcing needs to: (i) provide a system that helps matching tasks requirements and crowdworker's profile; (ii) adopt containers or virtual machines to help crowdworkers set up their environment to perform the task, (iii) plan and control crowdworkers' personal time, and (iv) adopt communication channels to allow crowdworkers to clarify questions about the requirements and, as a consequence, finish the tasks.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115070443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Characterizing outdateness with technical lag: an exploratory study","authors":"Jesus M. Gonzalez-Barahona","doi":"10.1145/3387940.3392202","DOIUrl":"https://doi.org/10.1145/3387940.3392202","url":null,"abstract":"Background: Nowadays, many applications are built reusing a large number of components, retrieved from software collections such as npm (JavaScript) or PyPi (Python). Those components are built in their corresponding upstream repositories, where they are being developed. This architecture of reusing causes some constraints on how much outdated is an application when it is deployed in production environments. Goal: To understand how outdateness of applications, and the components on which they depend, can be computed, so that different situations can be measured and assessed with the help of metrics. Based on this understanding, we also want to produce a model to characterize ecosystems (collections of reusable components). Method: Use the technical lag framework to analyze the flows from upstream repositories, to collection of components, to application building and later deployment. Using this framework, analyze lag in version availability in each of these stages, and constraints that set limits on how much outdated can be deployed applications. Results: We define a model which allows us to better understand the factors that influence outdateness of an application produced with reusable components from repositories of components. The model allows us to find the factors for defining metrics for measuring outdateness, and to explore the factors that influence outdateness for components in applications. We propose some of those factors as the basis to characterize ecosystems or collections of components with respect to their impact on the outdateness of applications built with them. Conclusions: Technical lag is an appropriate framework for studying lags in version propagation from upstream development to deployment.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117312861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}