{"title":"TestLoter: A logic-driven framework for automated unit test generation and error repair using large language models","authors":"Ruofan Yang, Xianghua Xu, Ran Wang","doi":"10.1016/j.cola.2025.101348","DOIUrl":"10.1016/j.cola.2025.101348","url":null,"abstract":"<div><div>Automated unit test generation is a critical technique for improving software quality and development efficiency. However, traditional methods often produce test cases with poor business consistency, while large language model based approaches face two major challenges: a high error rate in generated tests and insufficient code coverage. To address these issues, this paper proposes TestLoter, a logic-driven test generation framework. The core contributions of TestLoter are twofold. First, by integrating the structured analysis capabilities of white-box testing with the functional validation characteristics of black-box testing, we design a logic-driven test generation chain-of-thought that enables deep semantic analysis of code. Second, we establish a hierarchical repair mechanism to systematically correct errors in generated test cases, significantly enhancing the correctness of the test code. Experimental results on nine open-source projects covering various domains, such as data processing and utility libraries, demonstrate that TestLoter achieves 83.6% line coverage and 78% branch coverage. 
Our approach outperforms both LLM-based methods and traditional search-based software testing techniques in terms of coverage, while also reducing the number of errors in the generated unit test code.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"84 ","pages":"Article 101348"},"PeriodicalIF":1.8,"publicationDate":"2025-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144721370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A methodology for empirical complexity analysis based on Newton’s polynomial interpolation","authors":"Rafael Fontes Sumitani, Lucas Victor da Silva Costa, Frederico F. Campos, Fernando Magno Quintão Pereira","doi":"10.1016/j.cola.2025.101347","DOIUrl":"10.1016/j.cola.2025.101347","url":null,"abstract":"<div><div>A cost model is a function that relates how often each part of a program runs to the program’s inputs. Cost models can be derived automatically via the observation of counters: instrumentation that tracks execution of program operations. This paper defines Newton Counters: counters that can be described via a polynomial over a single program input variable whose value can be read in constant time. Additionally, it shows that Newton Counters are prevalent in real-world code. Motivated by this observation, the paper introduces a methodology to derive cost models automatically. This methodology combines static code analyses with interpolation via Newton’s divided difference method. This approach is currently available as a tool, <span>Merlin</span>. The effectiveness of this tool is demonstrated on 949 executable C programs taken from the <span>Jotai</span> collection, and on <span>genann</span>, a neural network library.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"84 ","pages":"Article 101347"},"PeriodicalIF":1.7,"publicationDate":"2025-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144703220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mind the gap: The missing features of the tools to support user studies in software engineering","authors":"Lázaro Costa, Susana Barbosa, Jácome Cunha","doi":"10.1016/j.cola.2025.101345","DOIUrl":"10.1016/j.cola.2025.101345","url":null,"abstract":"<div><div>User studies are paramount for advancing research in software engineering, particularly when evaluating tools and techniques involving programmers. However, researchers face several barriers when performing them despite the existence of supporting tools. We base our study on a set of tools and researcher-reported barriers identified in prior work on user studies in software engineering. In this work, we study how existing tools and their features cope with previously identified barriers. Moreover, we propose new features for the barriers that lack support. We validated our proposal with 102 researchers, achieving statistically significant positive support for all but one feature. We study the current gap between tools and barriers, using features as the bridge. We show there is a significant lack of support for several barriers, as some have no single tool to support them.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"84 ","pages":"Article 101345"},"PeriodicalIF":1.7,"publicationDate":"2025-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144679654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Advanced LPeg techniques: A dual case study approach","authors":"Zixuan Zhu","doi":"10.1016/j.cola.2025.101343","DOIUrl":"10.1016/j.cola.2025.101343","url":null,"abstract":"<div><div>This paper presents advanced optimization techniques for Lua Parsing Expression Grammars (LPeg) through two complementary case studies: a high-performance JSON parser and a sophisticated Glob-to-LPeg pattern converter. We demonstrate how strategic grammar construction can dramatically improve parsing performance without modifying the underlying LPeg library. For the JSON parser, we implement substitution capture and table construction optimization to reduce memory allocation overhead and improve object processing. For the Glob converter, we introduce segment-boundary separation, implement Cox’s flattened search strategy, and develop optimized braced condition handling to prevent exponential backtracking. Comprehensive benchmarks demonstrate that our JSON parser achieves processing speeds up to 125 MB/s on complex documents, consistently outperforming dkjson and showing competitive results against rxi_json across most test cases. Our Glob-to-LPeg converter exhibits 14%–92% better performance than Bun.Glob and runs 3–14 times faster than Minimatch across diverse pattern matching scenarios. 
This research provides practical optimization techniques for LPeg-based parsers, contributing valuable strategies to the text processing ecosystem.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"84 ","pages":"Article 101343"},"PeriodicalIF":1.7,"publicationDate":"2025-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144501341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Supporting learners in the transition from block-based to text-based programming, a systematic review","authors":"Glenn Strong, Nina Bresnihan, Brendan Tangney","doi":"10.1016/j.cola.2025.101342","DOIUrl":"10.1016/j.cola.2025.101342","url":null,"abstract":"<div><div>This paper describes a systematic review of the approaches being taken to providing support to learners as they transition from block-based programming environments to text-based ones. It identifies and analyses the literature in the area, identifies the themes which are common across the different approaches being used, and determines gaps in the literature. With the widespread use of block-based programming environments in introductory programming education, the question of how to support learners in the transition to text-based environments has received much attention. The contribution of this paper is to analyse and characterise the approaches being taken to support learners by considering the question: what approaches have been developed to facilitate the transition from block-based programming to text-based programming for learners? To answer this, a systematic literature review was undertaken, combining manual and automatic searches to identify work in the field. 
A thematic analysis of the literature found eight themes covering technical and non-technical approaches to supporting transition, prompting a set of recommendations for gaps to be addressed in future development in the field.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"84 ","pages":"Article 101342"},"PeriodicalIF":1.7,"publicationDate":"2025-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144492003","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Investigating the energy consumption of C++ and Java solutions mined from a programming contest site","authors":"Sérgio Queiroz de Medeiros, Marcelo Borges Nogueira, Gustavo Quezado","doi":"10.1016/j.cola.2025.101341","DOIUrl":"10.1016/j.cola.2025.101341","url":null,"abstract":"<div><div>The current concern about global warming has led to an increasing interest in the energy efficiency of computer applications. Assuming power is constant, the general trend is that faster programs consume less energy, thus optimizing a program for speed would also improve its energy efficiency.</div><div>We investigate this tendency in a set of C++ and Java solutions mined from Code Submission Evaluation System (CSES), a popular programming competition site, where each solution must give the correct answer under a given time limit. In such context, we can consider that all correct solutions for a problem were written with a speed concern, but not with energy efficiency in mind.</div><div>We selected 15 problems from CSES and for each of them we mined at least 30 C++ and Java solutions, evaluating time and energy efficiency of each solution in at least two different machines. In our scenario, where there is a great diversity of programming styles, execution speed, and memory usage, we could confirm the general trend: faster programs consume less energy. Moreover, we were able to use ordinary least squares to fit a linear function, with good precision, that relates energy consumption of a program to its execution time, as well as to automatically identify programs with abnormal energy consumption. 
A manual analysis of these programs revealed that often they perform a different amount of allocation and deallocation operations when compared to programs with similar execution times.</div><div>We also calculated the energy consumption profile of sets of random C++ solutions for these 15 CSES problems, and we tried to associate each set with its corresponding CSES problem by using the energy consumption profiles previously computed for each one of them. By using this approach, we could restrict, for each set of random C++ solutions, the classification task to a subset of 7 CSES problems, a reduction of more than 50% in the search space.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"84 ","pages":"Article 101341"},"PeriodicalIF":1.7,"publicationDate":"2025-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144308119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Python’s evolution on Stack Overflow: An empirical analysis of topic trends","authors":"Fengqi Hu, Weihao Xue, Siyuan Zhou, Ye Wang, Bo Jiang, Qiao Huang, Hua Zhang","doi":"10.1016/j.cola.2025.101340","DOIUrl":"10.1016/j.cola.2025.101340","url":null,"abstract":"<div><div>With the rapid development of information technology and changing programming practices, the demand for programming discussions on online Q&A platforms is growing. This study analyzes over two million Python-related posts on Stack Overflow to identify core topics and challenges over fifteen years. By using a Gradient Boosting Decision Tree (GBDT) model to quantify post popularity, we objectively identify the most popular and the most troublesome Python-related topics for users at different times. We find that the domains most closely associated with Python are data processing and machine learning, while development environments and automation and testing are gradually increasing in popularity. Machine learning is the area that troubles users the most. Moreover, we find that some questions that confuse users can increase the popularity of related topics. These findings can help developers grasp the direction of the Python language so that they can better plan their personal learning and project development. 
Enterprises and organizations can also optimize resource allocation based on trends in hot topics for training, tool development, and technical support.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"84 ","pages":"Article 101340"},"PeriodicalIF":1.7,"publicationDate":"2025-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144297313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The role of data transformation in modern analytics: A comprehensive survey","authors":"Sanae Borrohou, Rachida Fissoune, Hassan Badir","doi":"10.1016/j.cola.2025.101329","DOIUrl":"10.1016/j.cola.2025.101329","url":null,"abstract":"<div><div>Data transformation is a fundamental step in modern data analytics, enabling the conversion of raw data into structured, high-quality formats suitable for analysis. This process plays a crucial role in data cleaning, integration, and preprocessing, ensuring consistency across diverse data sources while addressing challenges such as missing values, inconsistencies, and redundancy. By applying techniques such as scaling, normalization, encoding, feature extraction, and aggregation, data transformation enhances the accuracy and efficiency of analytical and machine learning models. This study provides a comprehensive survey of data transformation techniques, categorizing them into key types: data cleaning and preprocessing, normalization and standardization, feature engineering, encoding categorical data, data augmentation, discretization and data aggregation. We analyze their impact on data quality and explore their interdependencies, presenting a structured framework that connects these transformations within the broader data preprocessing workflow. Additionally, we highlight the challenges of implementing transformation methods in large-scale, heterogeneous datasets, including data integration complexities, security concerns, and resource constraints. By synthesizing recent advancements in the field, this research offers a structured reference for data scientists and researchers, guiding them in selecting appropriate transformation strategies based on their specific analytical needs. 
Future work will focus on developing a complete data cleaning workflow that integrates transformation techniques for large-scale applications, emphasizing automation and scalability in modern analytics.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"84 ","pages":"Article 101329"},"PeriodicalIF":1.7,"publicationDate":"2025-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144123177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards democratisation of veterinary clinical protocols: Transferring their development from technical-coding experts to veterinary professionals for the case of Chronic Kidney Disease for Cats (CKD4Cats Domain-Specific Language)","authors":"Sofia Meacham, Hessa Alfraihi","doi":"10.1016/j.cola.2025.101328","DOIUrl":"10.1016/j.cola.2025.101328","url":null,"abstract":"<div><div>This paper presents CKD4Cats, a domain-specific language (DSL) for computerised Chronic Kidney Disease (CKD) clinical protocols in cats, a very common disease in veterinary practice. Building on DSLs used in human health, CKD4Cats addresses veterinary-specific needs while remedying the shortcomings of those DSLs. Developed with JetBrains’ Meta-Programming System (MPS) and veterinary input, the DSL ensures ease of use and adoption. It employs advanced evaluation methods, creating a projectional editor that streamlines protocol creation, displays relevant options, and guarantees “correct-by-construction” clinical protocols. This innovative approach democratises software development, making advanced tools accessible to non-technical users and significantly improving veterinary practice management.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"84 ","pages":"Article 101328"},"PeriodicalIF":1.7,"publicationDate":"2025-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144177731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel framework for evaluating developers’ code comprehension proficiency through technical and non-technical skills","authors":"Divjot Singh, Ashutosh Mishra, Ashutosh Aggarwal","doi":"10.1016/j.cola.2025.101327","DOIUrl":"10.1016/j.cola.2025.101327","url":null,"abstract":"<div><h3>Context:</h3><div>Code comprehension is an essential software maintenance skill, where technical skills are often considered the primary benchmark for evaluating developers’ proficiency, overlooking the significant role of non-technical skills.</div></div><div><h3>Objective:</h3><div>Our work aims to propose a generalized framework for measuring developers’ code comprehension proficiency by integrating technical and non-technical skills, inspired by cognitive attraction networks, and conducting an empirical study to evaluate code comprehension proficiency based on selective skills.</div></div><div><h3>Methods:</h3><div>The generalized framework evaluates developers’ technical and non-technical skills separately using collected data and computes their respective indices to derive an overall measure of code comprehension ability, represented as the comprehension measure index (CMI). Additionally, an empirical study with 158 participants assessed technical skills, including code understanding, debugging, and completion, alongside non-technical skills such as problem-solving, emotions, long-term memory, belief, desire, intention, and commitment to compute their overall code comprehension proficiency.</div></div><div><h3>Results:</h3><div>Based on the index values obtained for the technical and non-technical parameters, the study identifies multiple factors affecting participants’ performance, including lack of technical knowledge, reliance on guesswork, stress intolerance, lack of commitment and desire, difficulty understanding logic, inability to recall concepts, and other contributing factors. 
To refine the results, K-means clustering groups the participants into three clusters according to their performance.</div></div><div><h3>Conclusion:</h3><div>Integrating technical and non-technical skills enables a more accurate assessment by addressing factors beyond technical expertise. The framework can help managers and tutors identify strengths and weaknesses, allowing task assignments that align with the strengths of developers while addressing areas for improvement.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"83 ","pages":"Article 101327"},"PeriodicalIF":1.7,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143895592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}