{"title":"The role of data transformation in modern analytics: A comprehensive survey","authors":"Sanae Borrohou, Rachida Fissoune, Hassan Badir","doi":"10.1016/j.cola.2025.101329","DOIUrl":"10.1016/j.cola.2025.101329","url":null,"abstract":"<div><div>Data transformation is a fundamental step in modern data analytics, enabling the conversion of raw data into structured, high-quality formats suitable for analysis. This process plays a crucial role in data cleaning, integration, and preprocessing, ensuring consistency across diverse data sources while addressing challenges such as missing values, inconsistencies, and redundancy. By applying techniques such as scaling, normalization, encoding, feature extraction, and aggregation, data transformation enhances the accuracy and efficiency of analytical and machine learning models. This study provides a comprehensive survey of data transformation techniques, categorizing them into key types: data cleaning and preprocessing, normalization and standardization, feature engineering, encoding categorical data, data augmentation, discretization and data aggregation. We analyze their impact on data quality and explore their interdependencies, presenting a structured framework that connects these transformations within the broader data preprocessing workflow. Additionally, we highlight the challenges of implementing transformation methods in large-scale, heterogeneous datasets, including data integration complexities, security concerns, and resource constraints. By synthesizing recent advancements in the field, this research offers a structured reference for data scientists and researchers, guiding them in selecting appropriate transformation strategies based on their specific analytical needs. 
Future work will focus on developing a complete data cleaning workflow that integrates transformation techniques for large-scale applications, emphasizing automation and scalability in modern analytics.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"84 ","pages":"Article 101329"},"PeriodicalIF":1.7,"publicationDate":"2025-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144123177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards democratisation of veterinary clinical protocols: Transferring their development from technical-coding experts to veterinary professionals for the case of Chronic Kidney Disease for Cats (CKD4Cats Domain-Specific Language)","authors":"Sofia Meacham , Hessa Alfraihi","doi":"10.1016/j.cola.2025.101328","DOIUrl":"10.1016/j.cola.2025.101328","url":null,"abstract":"<div><div>This paper presents CKD4Cats, a domain-specific language (DSL) for computerised Chronic Kidney Disease (CKD) clinical protocols for cats, in which CKD is a very common disease in veterinary practice. Building on DSLs used in human health, CKD4Cats meets veterinary-specific needs while addressing the shortcomings of those DSLs. Developed with JetBrains’ Meta-Programming System (MPS) and veterinary input, the DSL ensures ease of use and adoption. It employs advanced evaluation methods, creating a projectional editor that streamlines protocol creation, displays relevant options, and guarantees “correct-by-construction” clinical protocols. This innovative approach democratises software development, making advanced tools accessible to non-technical users and significantly improving veterinary practice management.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"84 ","pages":"Article 101328"},"PeriodicalIF":1.7,"publicationDate":"2025-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144177731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel framework for evaluating developers’ code comprehension proficiency through technical and non-technical skills","authors":"Divjot Singh, Ashutosh Mishra, Ashutosh Aggarwal","doi":"10.1016/j.cola.2025.101327","DOIUrl":"10.1016/j.cola.2025.101327","url":null,"abstract":"<div><h3>Context:</h3><div>Code comprehension is an essential software maintenance skill, where technical skills are often considered the primary benchmark for evaluating developers’ proficiency, overlooking the significant role of non-technical skills.</div></div><div><h3>Objective:</h3><div>Our work aims to propose a generalized framework for measuring developers’ code comprehension proficiency by integrating technical and non-technical skills, inspired by cognitive attraction networks, and to conduct an empirical study that evaluates code comprehension proficiency based on selected skills.</div></div><div><h3>Methods:</h3><div>The generalized framework evaluates developers’ technical and non-technical skills separately using collected data and computes their respective indices to derive an overall measure of code comprehension ability, represented as the comprehension measure index (CMI). Additionally, an empirical study with 158 participants assessed technical skills, including code understanding, debugging, and completion, alongside non-technical skills such as problem-solving, emotions, long-term memory, belief, desire, intention, and commitment to compute their overall code comprehension proficiency.</div></div><div><h3>Results:</h3><div>Based on the index values obtained for the technical and non-technical parameters, the study identifies multiple factors affecting participants’ performance, including lack of technical knowledge, reliance on guesswork, stress intolerance, lack of commitment and desire, difficulty understanding logic, inability to recall concepts, and other contributing factors. 
To refine these results, K-means clustering is applied to group the participants into three clusters according to their performance.</div></div><div><h3>Conclusion:</h3><div>Integrating technical and non-technical skills enables a more accurate assessment by addressing factors beyond technical expertise. The framework can help managers and tutors identify strengths and weaknesses, allowing task assignments that align with developers’ strengths while addressing areas for improvement.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"83 ","pages":"Article 101327"},"PeriodicalIF":1.7,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143895592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The evolution of Lua, continued","authors":"Roberto Ierusalimschy , Luiz Henrique de Figueiredo , Waldemar Celes","doi":"10.1016/j.cola.2025.101326","DOIUrl":"10.1016/j.cola.2025.101326","url":null,"abstract":"<div><div>Lua is a scripting language created in 1993 in Brazil. We have reported in detail on the birth of Lua and its evolution until 2007. Here, we chronicle the evolution of Lua since then. In particular, we discuss in detail the evolution of global variables, the introduction of integers, and the implementation of garbage collection and finalizers, including deterministic finalization. We also comment on some landmark social developments in the history of Lua.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"83 ","pages":"Article 101326"},"PeriodicalIF":1.7,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143834487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Debugging in the Domain-Specific Modeling Languages for multi-agent systems","authors":"Baris Tekin Tezel , Geylani Kardas","doi":"10.1016/j.cola.2025.101325","DOIUrl":"10.1016/j.cola.2025.101325","url":null,"abstract":"<div><div>In many cases, developers face challenges while implementing Multi-Agent Systems (MAS) due to the complexity of expanding software systems, despite the presence of numerous agent programming environments and platforms. To tackle this complexity, Model-driven Engineering (MDE) can be employed at a higher level of abstraction and component modeling before diving into MAS development, which helps alleviate the intricacies. Probably, the most effective method of incorporating MDE into Multi-Agent Systems (MAS) is to adapt Domain-Specific Modeling Languages (DSMLs) along with integrated development environments (IDEs). These tools make it easier to model the system and generate the necessary code for the development process. Although existing MAS DSML IDEs offer some control over systems modeled based on the language’s syntax and semantics, they lack built-in debugging support. This deficiency leads to uncertainty among agent developers about the accuracy of models prepared during the design phase. To address this issue, this study proposes a comprehensive debugging framework (MASDebugFW) that facilitates the design of agent components within modeling environments. The framework’s utilization commences with modeling MASs using a design language, and then converting these design model instances into a runtime model. Following that, the runtime model undergoes simulation using an integrated simulator specifically designed for debugging purposes. Additionally, the framework includes a simulation environment model and a control mechanism to manage the simulation process effectively. These features further enhance the debugging capabilities and overall functionality of MASDebugFW. 
Furthermore, we have qualitatively and quantitatively evaluated MASDebugFW, subjecting all obtained results to statistical analysis. The evaluation results show that, on average, the implemented framework reduces debugging time by around 45%, leading to more efficient debugging processes. Moreover, it significantly enhances bug detection and repair capabilities, as it increases the number of bugs fixed in the models by approximately 50%.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"83 ","pages":"Article 101325"},"PeriodicalIF":1.7,"publicationDate":"2025-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143480205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GPotion: Embedding GPU programming in Elixir","authors":"André Rauber Du Bois, Gerson Geraldo H. Cavalheiro","doi":"10.1016/j.cola.2025.101323","DOIUrl":"10.1016/j.cola.2025.101323","url":null,"abstract":"<div><div>This paper describes GPotion, a DSL for GPU programming embedded in the Elixir functional language. GPotion allows programmers to write low-level GPU kernels, similar to CUDA kernels, in Elixir but also provides high-level facilities, like garbage collection of host and device arrays allocated in the host, type inference and simplified data transfer. This paper describes the design and implementation of GPotion and also presents experiments that demonstrate that GPotion allows fast and efficient kernels with little overhead in comparison to pure CUDA. GPotion is implemented using metaprogramming features of Elixir, without having to modify Elixir’s compiler. The source code for GPotion and the benchmarks used in the experiments are available in a GitHub repository.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"83 ","pages":"Article 101323"},"PeriodicalIF":1.7,"publicationDate":"2025-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143420020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Near-Pruned single assignment transformation of programs","authors":"Akshay M. Fajge, Raju Halder","doi":"10.1016/j.cola.2025.101324","DOIUrl":"10.1016/j.cola.2025.101324","url":null,"abstract":"<div><div>This paper introduces <span>Near-Pruned</span> <span>SSA</span>, a novel variant of the <span>SSA</span> form that attains precision close to the <span>Pruned</span> version while prioritizing its efficient generation without the need for costly data flow analysis. This is realized by leveraging variables’ usage information within the program’s <em>augmented</em> <span>CFG</span>. Furthermore, we propose a direct method for generating <span>DSA</span> form of programs that bypasses the traditional process of <span><math><mi>ϕ</mi></math></span>-node destruction into its immediate predecessor-blocks, thereby streamlining the process. Experimental evaluation on a range of <em>Solidity</em> programs, including <em>real-world</em> smart contracts deployed on the <em>Ethereum mainnet</em>, demonstrates that our method outperforms existing <span>SSA</span> variants, except for the <span>Pruned</span> version, by minimizing the number of introduced <span><math><mi>ϕ</mi></math></span>-statements compared to <em>state-of-the-art</em> techniques. 
In particular, the proposed <span>Near-Pruned</span> variant demonstrates a computational cost that is approximately one-third of that of the <span>Pruned</span> variant while achieving a nearly 92% reduction in the introduction of additional statements compared to the <span>Semi-Pruned</span> variant.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"83 ","pages":"Article 101324"},"PeriodicalIF":1.7,"publicationDate":"2025-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143360843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MLAPW: A framework to assess the impact of feature selection and sampling techniques on anti-pattern prediction using WSDL metrics","authors":"Lov Kumar , Vikram Singh , Lalita Bhanu Murthy , Aneesh Krishna , Sanjay Misra","doi":"10.1016/j.cola.2025.101322","DOIUrl":"10.1016/j.cola.2025.101322","url":null,"abstract":"<div><h3>Context:</h3><div>Frequent changes may degrade the quality and design of Service-Based Systems, giving rise to design problems called <strong>Anti-patterns</strong> that negatively impact software design quality. The existence of these Anti-patterns highly impacts the overall maintainability of Service-Based Systems. Hence, early detection of these anti-patterns’ presence becomes mandatory with co-located modifications. However, it is not easy to find these anti-patterns manually.</div></div><div><h3>Objective:</h3><div>The objective of this work is to explore the role of WSDL (Web Services Description Language) metrics (MLAPW) for anti-pattern prediction using a Machine Learning (ML) based framework. This framework encompasses different variants of feature selection techniques, data sampling techniques, and a wide range of ML algorithms. This work empirically investigates the predictive ability of anti-pattern prediction models developed using different sets of WSDL metrics. Our major focus is to investigate ‘<em>how these metrics accurately predict different types of Anti-patterns present in the WSDL file</em>’.</div></div><div><h3>Methods:</h3><div>To achieve the objective, different sets of WSDL metrics, such as Structural Quality Metrics, Procedural Quality Metrics, Data Quality Metrics, Quality Metrics, and Complexity metrics, are used as input for Anti-patterns prediction models. Since these models use WSDL metrics as input, we have also used feature selection methods to find the best sets of WSDL metrics. These models are trained using various machine-learning techniques. 
This study also shows the performance of these models trained on balanced data using data sampling techniques. Finally, these techniques were empirically evaluated using accuracy and the area under the ROC (receiver operating characteristic) curve (AUC), with hypothesis testing.</div></div><div><h3>Results:</h3><div>The empirical study’s observation is based on 226 WSDL files from various domains such as finance, tourism, health, education, etc. The assessment asserts that the models trained using WSDL metrics have a mean AUC of 0.79 and a median AUC of 0.90. However, the models trained using the features selected with classifier feature subset selection (CFS) have a better mean AUC of 0.80 and median AUC of 0.97. The experimental results also confirm that the models trained on up-sampled data (UPSAM) have a better mean AUC of 0.79 and median AUC of 0.91, with a low Friedman rank of 2.40. Finally, the models trained using the least square support vector machine (LSSVM) achieved a median AUC of 1, a mean AUC of 0.99, and a low Friedman rank of 1.30.</div></div><div><h3>Conclusion:</h3><div>The experimental results show that the AUC values of the models trained using Data and Procedural Quality Metrics are high compared to the other sets of metrics. However, the models improved significantly in their prediction performance after employing feature selection techniques. 
The experimental result","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"83 ","pages":"Article 101322"},"PeriodicalIF":1.7,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143349974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Code histories: Documenting development by recording code influences and changes in code","authors":"Vo Thien Tri Pham, Caitlin Kelleher","doi":"10.1016/j.cola.2024.101313","DOIUrl":"10.1016/j.cola.2024.101313","url":null,"abstract":"<div><div>Developers frequently encounter challenges when working with large code bases found in modern software applications, from navigating through files to more complex tasks like understanding code histories, dependencies, and evolutions. While many applications use Version Control Systems (VCSs) to archive present-day programs and provide a historical perspective on code development, the level of detail they offer is often insufficient for in-depth analyses. As a result, it becomes difficult to fully explore the potential benefits of historical data in software development. We introduce an enhanced recording framework that integrates both the Visual Studio Code (VS Code) development environment and the Google Chrome web browser to capture more detailed development activities. Our framework is designed to offer additional recording options, thereby providing researchers with more opportunities to study how different historical resources can be utilized. 
Through an observational study, we demonstrate the utility of our framework in capturing the complex dynamics of code change activities, highlighting its potential value in both academic and practical contexts.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"82 ","pages":"Article 101313"},"PeriodicalIF":1.7,"publicationDate":"2024-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143101473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A comprehensive meta-analysis of efficiency and effectiveness in the detection community","authors":"Mohamed Amine Daoud , Sid Ahmed Mokhtar Mostefaoui , Abdelkader Ouared , Hadj Madani Meghazi , Bendaoud Mebarek , Abdelkader Bouguessa , Hasan Ahmed","doi":"10.1016/j.cola.2024.101314","DOIUrl":"10.1016/j.cola.2024.101314","url":null,"abstract":"<div><div>Creating an intrusion detection system (IDS) is a prominent area of research that continuously draws attention from both scholars and practitioners who tirelessly innovate new solutions. The complexity of IDS naturally escalates alongside technological advancements, whether they are manually implemented within security infrastructures or elaborated upon in academic literature. However, accessing and comparing these IDS solutions requires sifting through a multitude of hypotheses presented in research papers, which is a laborious and error-prone endeavor. Consequently, many researchers encounter difficulties in replicating results or reanalyzing published IDSs. This challenge primarily arises due to the absence of a standardized process for elucidating IDS methodologies. In response, this paper advocates for a framework aimed at enhancing the reproducibility of IDS outcomes, thereby enabling their seamless reuse across diverse cybersecurity contexts, benefiting both end-users and experts alike. The proposed framework introduces a descriptive language for the precise specification of IDS descriptions. Additionally, a model repository facilitates the sharing and reusability of IDS configurations. Lastly, through a case study, we showcase the effectiveness of our framework in addressing challenges associated with data acquisition and knowledge organization and sharing. 
Our results demonstrate satisfactory prediction accuracy for configuration reuse and precise identification of reusable components.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"82 ","pages":"Article 101314"},"PeriodicalIF":1.7,"publicationDate":"2024-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143101472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}