Juan Pablo Sandoval Alcocer, Harold Camacho-Jaimes, Geraldine Galindo-Gutierrez, Andrés Neyem, Alexandre Bergel, Stéphane Ducasse
{"title":"On the use of statistical machine translation for suggesting variable names for decompiled code: The Pharo case","authors":"Juan Pablo Sandoval Alcocer , Harold Camacho-Jaimes , Geraldine Galindo-Gutierrez , Andrés Neyem , Alexandre Bergel , Stéphane Ducasse","doi":"10.1016/j.cola.2024.101271","DOIUrl":"https://doi.org/10.1016/j.cola.2024.101271","url":null,"abstract":"<div><p>Adequately selecting variable names is a difficult activity for practitioners. In 2018, Jaffe et al. proposed the use of statistical machine translation (SMT) to suggest descriptive variable names for decompiled code. A large corpus of decompiled C code was used to train the SMT model. Our paper presents the results of a partial replication of Jaffe’s experiment. We apply the same technique and methodology to a dataset made of code written in the Pharo programming language. We selected Pharo since its syntax is simple – it fits on half of a postcard – and because the optimizations performed by the compiler are limited to method scope. Our results indicate that SMT may recover between 8.9% and 69.88% of the variable names depending on the training set. Our replication concludes that: (i) the accuracy depends on the code similarity between the training and testing sets; (ii) the simplicity of the Pharo syntax and the satisfactory decompiled code alignment have a positive impact on predicting variable names; and (iii) a relatively small code corpus is sufficient to train the SMT model, which shows the applicability of the approach to less popular programming languages. Additionally, to assess SMT’s potential in improving original variable names, ten Pharo developers reviewed 400 SMT name suggestions, with four reviews per variable. 
Only 15 suggestions (3.75%) were unanimously viewed as improvements, while 45 (11.25%) were perceived as improvements by at least two reviewers, highlighting SMT’s limitations in providing suitable alternatives.</p></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"79","pages":"Article 101271"},"PeriodicalIF":2.2,"publicationDate":"2024-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140643845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An empirical approach to understand the role of emotions in code comprehension","authors":"Divjot Singh, Ashutosh Mishra, Ashutosh Aggarwal","doi":"10.1016/j.cola.2024.101269","DOIUrl":"https://doi.org/10.1016/j.cola.2024.101269","url":null,"abstract":"<div><p>Programming and cognitive skills are two pivotal abilities that programmers need to maintain software products. First, this study includes a systematic literature review on code comprehension, emotions, cognitive psychology, and belief-desire-intention domains to analyse various code comprehension monitoring techniques, performance metrics, and computational methodologies. Second, a case study is conducted to examine the influence of various emotional stages on programmers’ programming and cognitive skills while comprehending software code. The participants are categorized empirically based on their level of expertise, and the results are verified using various machine learning models and performance metrics.</p></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"79","pages":"Article 101269"},"PeriodicalIF":2.2,"publicationDate":"2024-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140052825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tom Lauwaerts, Robbert Gurdeep Singh, Christophe Scholliers
{"title":"WARDuino: An embedded WebAssembly virtual machine","authors":"Tom Lauwaerts, Robbert Gurdeep Singh , Christophe Scholliers","doi":"10.1016/j.cola.2024.101268","DOIUrl":"10.1016/j.cola.2024.101268","url":null,"abstract":"<div><p>Creating IoT programs for resource-constrained microcontrollers differs significantly from conventional computer programming. Microcontrollers are traditionally programmed using low-level programming languages with poor debugging facilities. By contrast, general-purpose systems can be programmed with high-level languages, which make programming easier by providing many useful tools such as advanced debuggers, strong type systems, and/or automatic memory management. Most existing solutions for programming microcontrollers with high-level languages are strongly tied to a specific microcontroller architecture, which makes porting code difficult or impossible. In addition, compiling and flashing software onto a microcontroller is time-consuming, slowing down development.</p><p>To solve these problems we present WARDuino, a WebAssembly virtual machine that runs on microcontrollers and provides WebAssembly primitives to control embedded hardware and IoT functionality. WARDuino runs programs written in a plethora of high-level languages that compile to WebAssembly. We give a general approach for language integration libraries to expose the peripherals and networking capabilities of the device following the idioms of the host language.</p><p>To ease development, we extend WebAssembly with support for remote debugging and over-the-air reprogramming. WARDuino can remotely instruct a microcontroller to pause, to step, or to dump its state, and to replace local variables, functions or even the entire running program. We use the remote debugger of the virtual machine to create a visual debugging environment in VS Code for WARDuino, that can debug WebAssembly and AssemblyScript. 
Aside from these important tools, we provide a novel mechanism to handle asynchronous interrupts in WebAssembly, a fundamental building block for responsive embedded applications. Our extensions are implemented in the WARDuino virtual machine and presented as formal extensions to the WebAssembly operational semantics. We use the formalization to prove observational equivalence for the core debugger semantics.</p><p>We compared the computational performance and memory size with native C code, Espruino, and WASM3, which compiles WebAssembly ahead-of-time. The comparison shows that WARDuino’s performance is acceptable. Although WARDuino is on average 425.93 times slower than native code and 37.96 times slower than WASM3, it outperforms the popular Espruino runtime by a factor of 11.66. Additionally, we show that WARDuino is fast enough to program traditional IoT applications that handle network and device interrupts with a classic smart lamp application written in AssemblyScript.</p></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"79","pages":"Article 101268"},"PeriodicalIF":2.2,"publicationDate":"2024-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139891428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abim Sedhain, Vaishvi Diwanji, Helen Solomon, Shahnewaz Leon, Sandeep Kaur Kuttal
{"title":"Developers’ information seeking in Question & Answer websites through a gender lens","authors":"Abim Sedhain, Vaishvi Diwanji, Helen Solomon, Shahnewaz Leon, Sandeep Kaur Kuttal","doi":"10.1016/j.cola.2024.101267","DOIUrl":"10.1016/j.cola.2024.101267","url":null,"abstract":"<div><p>Question & Answer websites for developers, such as Stack Overflow, contain an enormous amount of programming knowledge, which can be redundant and can cost developers substantial time and cognitive effort. We investigated the information seeking behavior of developers on Stack Overflow using Information Foraging Theory. To understand the influence of gender on foraging patterns, we conducted a gender-balanced think-aloud lab study with 12 participants, followed by retrospective interviews. The participants performed two debugging tasks: (1) understand foraging between question variants and (2) understand foraging between answer variants, on Stack Overflow. The participants used various cues and strategies to find relevant questions and optimal answers on Stack Overflow. Gender affected foraging patterns: male participants used 19.7% more cues and spent 55% more time than female participants. We also categorized various cues in terms of cost-value proposition and reported a debugging foraging model for Stack Overflow. 
Our study has implications for Question and Answer websites as well as Information Foraging Theory.</p></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"79","pages":"Article 101267"},"PeriodicalIF":2.2,"publicationDate":"2024-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139688544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"τJUpdate: An update language for time-varying JSON data","authors":"Zouhaier Brahmia, Fabio Grandi, Safa Brahmia, Rafik Bouaziz","doi":"10.1016/j.cola.2024.101258","DOIUrl":"10.1016/j.cola.2024.101258","url":null,"abstract":"<div><p>Time-varying JSON data are being used and exchanged in many of today’s application frameworks, such as IoT platforms, Web services, cloud computing, online social networks, and mobile systems. However, in the state of the art of JSON data management, there is neither a consensual nor a standard language for updating (i.e., inserting, modifying, and deleting) temporal JSON data, like the TSQL2 or SQL:2016 languages for temporal relational data. Moreover, existing JSON-based NoSQL DBMSs (e.g., MongoDB, Couchbase, CouchDB, OrientDB, and Riak) and both commercial relational DBMSs (e.g., IBM DB2 12, Oracle 19c, and MS SQL Server 2019) and open-source ones (e.g., PostgreSQL 15 and MySQL 8.0) supporting JSON documents do not provide any facility for maintaining temporal JSON data. Also, in our previously proposed temporal JSON framework, called <span><math><mi>τ</mi></math></span>JSchema, there was no feature for temporal JSON instance updates. For these reasons, we propose in this article a temporal update language, named <span><math><mi>τ</mi></math></span>JUpdate (Temporal JUpdate), for JSON data in the <span><math><mi>τ</mi></math></span>JSchema environment. We define it as a temporal extension of our previously introduced non-temporal JSON update language, named JUpdate (JSON Update). Both the syntax and the operational semantics of the data modification operations of JUpdate have been extended to support temporal aspects. 
<span><math><mi>τ</mi></math></span>JUpdate allows users to specify temporal JSON updates in an expressive and user-friendly manner, and to efficiently execute them in the <span><math><mi>τ</mi></math></span>JSchema environment.</p></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"79","pages":"Article 101258"},"PeriodicalIF":2.2,"publicationDate":"2024-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139460031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AI-based clustering of similar issues in GitHub’s repositories","authors":"Hamzeh Eyal Salman","doi":"10.1016/j.cola.2023.101257","DOIUrl":"10.1016/j.cola.2023.101257","url":null,"abstract":"<div><p>Issues are highly prevalent on GitHub due to the increasing scale of its software repositories. These issues are submitted to the issue tracking system for several reasons: reporting a bug, asking a question, or other maintenance activities. Popular repositories on GitHub receive a large number of issues daily. Assigning similar issues individually to different developers for validation and fixing introduces inconsistencies when developers working independently and asynchronously fix them, and it slows the fixing process. In contrast, grouping similar issues into clusters and assigning each cluster to the same appropriate developer or team speeds up the fixing process. In this paper, a machine-learning-based approach is proposed to support issue management on GitHub by grouping similar issues together. For validation, the proposed approach was applied to 13 software components from different large repositories. Findings reveal that the proposed approach identifies clusters of similar issues with promising results according to evaluation measures widely used in this subject: Precision, Recall, and F-measure.</p></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"78","pages":"Article 101257"},"PeriodicalIF":2.2,"publicationDate":"2024-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139095633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Claudio Di Sipio, Juri Di Rocco, Davide Di Ruscio, Phuong T. Nguyen
{"title":"LEV4REC: A feature-based approach to engineering RSSEs","authors":"Claudio Di Sipio, Juri Di Rocco, Davide Di Ruscio, Phuong T. Nguyen","doi":"10.1016/j.cola.2023.101256","DOIUrl":"10.1016/j.cola.2023.101256","url":null,"abstract":"<div><p>To facilitate the development of recommender systems for software engineering (RSSEs), this paper introduces LEV4REC, a model-driven approach supporting all RSSE development stages, from design to deployment. It enables parameter fine-tuning, enhancing the developer and user experience by using a dedicated feature model for early configuration. We evaluated LEV4REC by applying it to two existing RSSEs based on different algorithms.</p><p>Results demonstrate its ability to recreate suitable recommendations and outperform a state-of-the-art approach. Qualitative findings from a focus group study further validate LEV4REC’s effectiveness, while indicating the need for extension points to support additional systems.</p></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"78","pages":"Article 101256"},"PeriodicalIF":2.2,"publicationDate":"2023-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138741160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A test amplification bot for Pharo/Smalltalk","authors":"Mehrdad Abdi , Henrique Rocha , Alexandre Bergel , Serge Demeyer","doi":"10.1016/j.cola.2023.101255","DOIUrl":"10.1016/j.cola.2023.101255","url":null,"abstract":"<div><p>Test amplification exploits the knowledge embedded in an existing test suite to strengthen it. A typical test amplification technique transforms the initial tests into additional test methods that increase the mutation coverage. Although past research demonstrated the benefits, additional steps need to be taken to incorporate test amplifiers in the everyday workflow of developers. This paper describes a proof-of-concept bot integrating <span>Small-Amp</span> with <span>GitHub-Actions</span>. The bot decides for itself which tests to amplify and does so within a limited time budget. To integrate the bot into the <span>GitHub-Actions</span> workflow, we incorporate three special-purpose features: (i) prioritization (to fit the process within a given time budget), (ii) sharding (to split lengthy tests into smaller chunks), and (iii) sandboxing (to make the amplifier crash-resilient). We evaluate our approach by installing the proof-of-concept extension of <span>Small-Amp</span> on five open-source projects deployed on GitHub. Our results show that a test amplification bot is feasible at a project level by integrating it into the build system. Moreover, we quantify the impact of prioritization, sharding, and sandboxing so that other test amplifiers may benefit from these special-purpose features. 
Our proof-of-concept demonstrates that the entry barrier for adopting test amplification can be significantly lowered.</p></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"78","pages":"Article 101255"},"PeriodicalIF":2.2,"publicationDate":"2023-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138565824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved software fault prediction using new code metrics and machine learning algorithms","authors":"Manpreet Singh, Jitender Kumar Chhabra","doi":"10.1016/j.cola.2023.101253","DOIUrl":"10.1016/j.cola.2023.101253","url":null,"abstract":"<div><p>Many code metrics exist for bug prediction. However, these metrics are based on trivial counts of code properties and are not sufficient. This research article proposes three new code metrics based on class complexity, coupling, and cohesion to fill the gap. The Promise repository metrics suite's complexity, coupling, and cohesion metrics are replaced by the proposed metrics, and a new metric suite is generated. Experiments show that the proposed metrics suite gives more than 2% improvement in AUC and precision and approximately 1.5% in F1-score and recall with fewer code metrics than the existing metrics suite.</p></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"78","pages":"Article 101253"},"PeriodicalIF":2.2,"publicationDate":"2023-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138508597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Felicien Ihirwe, Davide Di Ruscio, Simone Gianfranceschi, Alfonso Pierantonio
{"title":"CHESSIoT: A model-driven approach for engineering multi-layered IoT systems","authors":"Felicien Ihirwe, Davide Di Ruscio, Simone Gianfranceschi, Alfonso Pierantonio","doi":"10.1016/j.cola.2023.101254","DOIUrl":"10.1016/j.cola.2023.101254","url":null,"abstract":"<div><h3>Context:</h3><p>The current technology revolution, which places the highest value on people’s welfare, is frequently seen as being mainly supported by Internet of Things (IoT) technologies. IoT is regarded as a powerful multi-layered network of systems that integrates several heterogeneous, independently networked (sub-)systems working together to achieve a shared purpose.</p></div><div><h3>Objective:</h3><p>In this article, we present CHESSIoT, a model-driven engineering environment that integrates high-level visual design languages, software development, safety analysis, and deployment approaches for engineering multi-layered IoT systems. With CHESSIoT, users may conduct different engineering tasks on system and software models under development to enable earlier decision-making and take prospective measures, all supported by a unique environment.</p></div><div><h3>Methodology:</h3><p>This is achieved through multi-staged designs, most notably the physical, functional, and deployment architectures. The physical model specification is used to perform both qualitative and quantitative safety analysis by employing logical Fault-Tree models (FTs). The functional model specifies the system’s functional behavior and is later used to generate platform-specific code that can be deployed on low-level IoT device nodes. 
Additionally, the framework supports modeling the system’s deployment plan and run-time service provisioning, which are ultimately transformed into deployment configuration artifacts ready for execution on remote servers.</p></div><div><h3>Results:</h3><p>To showcase the effectiveness of our proposed approach, as well as the capability of the supporting tool, a multi-layered Home Automation System (HAS) scenario has been developed covering all its design, development, analysis, and deployment aspects. Furthermore, we present the results from different evaluation mechanisms, which include a comparative analysis and a qualitative assessment. The evaluation mechanisms mainly target the completeness of CHESSIoT by addressing specific research questions.</p></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"78","pages":"Article 101254"},"PeriodicalIF":2.2,"publicationDate":"2023-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138508567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}