{"title":"An efficient instance selection algorithm for fast training of support vector machine for cross-project software defect prediction pairs","authors":"Manpreet Singh, Jitender Kumar Chhabra","doi":"10.1016/j.cola.2024.101301","DOIUrl":"10.1016/j.cola.2024.101301","url":null,"abstract":"<div><div>SVM is limited in its use for cross-project software defect prediction because of its very slow training process. So, this research article proposes a new instance selection (IS) algorithm called boundary detection among classes (BDAC) to reduce the training dataset size for faster training of SVM without degrading the prediction performance. The proposed algorithm is evaluated against six existing IS algorithms based on accuracy, running time, data reduction rate, etc. using 23 general datasets, 18 software defect prediction datasets, and two shape-based datasets, and results prove that BDAC is better than the selected algorithm based on collective comparison.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"81 ","pages":"Article 101301"},"PeriodicalIF":1.7,"publicationDate":"2024-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142533922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alex Holmquist , Vitor Emanuel , Fernando C. Alves , Fernando Magno Quintão Pereira
{"title":"Detection and treatment of string events in the limit","authors":"Alex Holmquist , Vitor Emanuel , Fernando C. Alves , Fernando Magno Quintão Pereira","doi":"10.1016/j.cola.2024.101299","DOIUrl":"10.1016/j.cola.2024.101299","url":null,"abstract":"<div><div>A string event is a pattern that occurs in a stream of characters. The need to detect and handle string events in infinite texts emerges in many scenarios, including online treatment of logs, web crawling, and syntax highlighting. This paper describes a technique to specify and treat string events. Users determine patterns of interest via a markup language. From such examples, tokens are generalized via a semi-lattice of regular expressions. Such tokens are combined into a context-free language that recognizes patterns in the text stream. These techniques are implemented in a text processing system called <span>Lushu</span>, which runs on the Java Virtual Machine (JVM). <span>Lushu</span> intercepts strings emitted by the JVM. Once patterns are detected, it invokes a user-specified action handler. As a proof of concept, this paper shows that <span>Lushu</span> outperforms state-of-the-art parsers and parser generators, such as <span>Comby</span>, <span>BeautifulSoup4</span> and <span>ZheFuscator</span>, in terms of memory consumption and running time.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"81 ","pages":"Article 101299"},"PeriodicalIF":1.7,"publicationDate":"2024-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142533921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ClangOz: Parallel constant evaluation of C++ map and reduce operations","authors":"Paul Keir , Andrew Gozillon","doi":"10.1016/j.cola.2024.101298","DOIUrl":"10.1016/j.cola.2024.101298","url":null,"abstract":"<div><div>Interest in metaprogramming, reflection, and compile-time evaluation continues to inspire and foster innovation among the users and designers of the C++ programming language. Regrettably, the impact on compile-times of such features can be significant; and outside of build systems, multi-core parallelism is unable to bring down compilation times of individual translation units. We present ClangOz, a novel Clang-based research compiler that addresses this issue by evaluating annotated constant expressions in parallel, thereby reducing compilation times. Prior benchmarks analyzed parallel map operations, but were unable to consider reduction operations. Thus we also introduce parallel reduction functionality, alongside two additional benchmark programs.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"81 ","pages":"Article 101298"},"PeriodicalIF":1.7,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142440881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MoTion: A new declarative object matching approach in Pharo","authors":"Aless Hosry , Vincent Aranega , Nicolas Anquetil","doi":"10.1016/j.cola.2024.101290","DOIUrl":"10.1016/j.cola.2024.101290","url":null,"abstract":"<div><p>Pattern matching is an expressive way of matching data and extracting pieces of information from it. The recent inclusion of pattern matching in the Java and Python languages highlights that such a facility is more and more adopted by developers for everyday development. Other main stream programming languages also offer pattern matching capabilities as part of the language (Rust, Scala, Haskell, and OCaml), with different degrees of expressivity in what can be matched. In the meantime, in graphs, pattern matching takes a slightly different turn; it enhances the expressivity of the patterns that can be defined. Smalltalk currently offers little pattern matching capability to find specific objects inside a large graph of objects using a declarative pattern. In Pharo, the closest library to classical pattern matching that exists is the <span>RBParseTreeSearcher</span>, which allows to express specialized patterns over a Pharo Abstract Syntax Tree to find some inner node. The question arises of what features a flexible pattern matching language should have. In this paper, we review the features found in different existing pattern matching languages, both in General Purpose Languages (like Java) and in declarative graph pattern matching languages. We then describe MoTion, a new pattern matching engine for Pharo smalltalk, combining all these features. We discuss some aspects of MoTion’s implementation and illustrate its use with real case examples.</p></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"81 ","pages":"Article 101290"},"PeriodicalIF":1.7,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142129955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhenzhou Tian , Yuchen Gong , Chenhao Chang , Jiaze Sun , Yanping Chen , Lingwei Chen
{"title":"An empirical study on divergence of differently-sourced LLVM IRs","authors":"Zhenzhou Tian , Yuchen Gong , Chenhao Chang , Jiaze Sun , Yanping Chen , Lingwei Chen","doi":"10.1016/j.cola.2024.101289","DOIUrl":"10.1016/j.cola.2024.101289","url":null,"abstract":"<div><p>In solving binary code similarity detection, many approaches choose to operate on certain unified intermediate representations (IRs), such as Low Level Virtual Machine (LLVM) IR, to overcome the cross-architecture analysis challenge induced by the significant morphological and syntactic gaps across the diverse instruction set architectures (ISAs). However, the LLVM IRs of the same program can be affected by diverse factors, such as the acquisition source, i.e., compiled from source code or disassembled and lifted from binary code. While the impact of compilation settings on binary code has been explored, the specific differences between LLVM IRs from varied sources remain underexamined. To this end, we pioneer an in-depth empirical study to assess the discrepancies in LLVM IRs derived from different sources. Correspondingly, an extensive dataset containing nearly 98 million LLVM IR instructions distributed in 808,431 functions is curated with respect to these potential IR-influential factors. On this basis, three types of code metrics detailing the syntactic, structural, and semantic aspects of the IR samples are devised and leveraged to assess the divergence of the IRs across different origins. The findings offer insights into how and to what extent the various factors affect the IRs, providing valuable guidance for assembling a training corpus aimed at developing robust LLVM IR-oriented pre-training models, as well as facilitating relevant program analysis studies that operate on the LLVM IRs.</p></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"81 ","pages":"Article 101289"},"PeriodicalIF":1.7,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141937759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fault localization by abstract interpretation and its applications","authors":"Aleksandar S. Dimovski","doi":"10.1016/j.cola.2024.101288","DOIUrl":"10.1016/j.cola.2024.101288","url":null,"abstract":"<div><p><em>Fault localization</em> aims to automatically identify the cause of an error in a program by localizing the error to a relatively small part of the program. In this paper, we present a novel technique for automated fault localization via <em>error invariants</em> inferred by abstract interpretation. An error invariant for a location in an error program over-approximates the reachable states at the given location that may produce the error, if the execution of the program is continued from that location. Error invariants can be used for <em>statement-wise semantic slicing</em> of error programs and for obtaining concise error explanations. We use an iterative refinement sequence of backward–forward static analyses by abstract interpretation to compute error invariants, which are designed to explain why an error program violates a particular assertion.</p><p>Furthermore, we present a practical application of the fault localization technique for automatic repair of programs. Given an erroneous program, we first use the fault localization to automatically identify statements relevant for the error, and then repeatedly mutate the expressions in those relevant statements until a correct program that satisfies all assertions is found. All other statements classified by the fault localization as irrelevant for the error are not mutated in the program repair process. This way, we significantly reduce the search space of mutated programs without losing any potentially correct program, and so locate a repaired program much faster than a program repair without fault localization.</p><p>We have developed a prototype tool for automatic fault localization and repair of C programs. We demonstrate the effectiveness of our approach to localize errors in realistic C programs, and to subsequently repair them. Moreover, we show that our approach based on combining fault localization and code mutations is significantly faster that the previous program repair approach without fault localization.</p></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"80 ","pages":"Article 101288"},"PeriodicalIF":1.7,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141845094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Screening articles for systematic reviews with ChatGPT","authors":"Eugene Syriani , Istvan David , Gauransh Kumar","doi":"10.1016/j.cola.2024.101287","DOIUrl":"https://doi.org/10.1016/j.cola.2024.101287","url":null,"abstract":"<div><p>Systematic reviews (SRs) provide valuable evidence for guiding new research directions. However, the manual effort involved in selecting articles for inclusion in an SR is error-prone and time-consuming. While screening articles has traditionally been considered challenging to automate, the advent of large language models offers new possibilities. In this paper, we discuss the effect of using ChatGPT on the SR process. In particular, we investigate the effectiveness of different prompt strategies for automating the article screening process using five real SR datasets. Our results show that ChatGPT can reach up to 82% accuracy. The best performing prompts specify exclusion criteria and avoid negative shots. However, prompts should be adapted to different corpus characteristics.</p></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"80 ","pages":"Article 101287"},"PeriodicalIF":1.7,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2590118424000303/pdfft?md5=88fb1aa235050a4011046d39a856044b&pid=1-s2.0-S2590118424000303-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141606804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ask or tell: An empirical study on modeling challenges from LabVIEW community","authors":"Xin Zhao , Gurshan Rai , Saheed Popoola","doi":"10.1016/j.cola.2024.101284","DOIUrl":"https://doi.org/10.1016/j.cola.2024.101284","url":null,"abstract":"<div><p>Systems engineering is fundamental to the performance, functionality, and value of modern systems and products. Systems model development and the usage of modeling tools in systems engineering can be complex. Systems engineering is concerned not only with the modeling process but many other aspects of development, such as hardware, data, and modeling tools. Because systems modeling practitioners are versatile and have expertise in specifying, building, maintaining, and supporting technical infrastructure, the modeling practices and challenges vary. We conducted an empirical study from a systems modeler’s perspective to better understand the challenges posed to systems engineers in the LabVIEW community. Our inspection consists of two investigations. First, with the help of machine learning techniques, we mined online discussion forum posts to identify what questions engineers <em>ask</em> regarding modeling challenges. The result shows that the most challenging part is related to coding practice in the development. Inspired by this discovery, we conducted another empirical study. We surveyed systems engineers in the LabVIEW community using an online questionnaire to recognize what system engineers <em>tell</em> regarding coding challenges in their development practice. Our paper also provided some observations and concrete suggestions to both systems engineers and tool developers on how to improve system model quality and facilitate the modeling process when using LabVIEW.</p></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"80 ","pages":"Article 101284"},"PeriodicalIF":1.7,"publicationDate":"2024-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141543638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pierre Laborde , Yann Le Goff , Éric Le Pors , Alain Plantec , Steven Costiou
{"title":"Live application programming in the defense industry with the Molecule component framework","authors":"Pierre Laborde , Yann Le Goff , Éric Le Pors , Alain Plantec , Steven Costiou","doi":"10.1016/j.cola.2024.101286","DOIUrl":"https://doi.org/10.1016/j.cola.2024.101286","url":null,"abstract":"<div><p>At Thales Defense Mission Systems (DMS), software products first go through an industrial prototyping phase. Prototypes are serious applications that we evaluate with our end-users during demonstrations. End-users have a central role in the design process of our products. They often ask for software modifications during demonstrations to experiment new ideas or to focus the existing design on their needs.</p><p>In this paper, we present how we combined Smalltalk’s live-programming capabilities with software component models to obtain flexible and modular software designs in our context of live prototyping. We present Molecule, an open-source implementation of the Lightweight CORBA Component Model in Pharo. We use Molecule to build HMI systems prototypes, and we benefit from the dynamic run-time modification capabilities of Pharo during demonstrations with our end-users where we explore software designs in a lively way.</p><p>Molecule is an industrial contribution to Smalltalk, as it capitalizes 20 years of usage and maturation in our prototyping activity. The Molecule framework and tools are now mature, and we started building end-user software used in production at Thales DMS. We present two such end-user software and analyze their component architecture, that are representative of how we (learnt to) build HMI prototypes. Finally, we analyze our technological decisions with regards to the benefits we sought for our industrial activity.</p></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"80 ","pages":"Article 101286"},"PeriodicalIF":1.7,"publicationDate":"2024-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141480844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Valentin Bourcier , Steven Costiou , Maximilian Ignacio Willembrinck Santander , Adrien Vanègue , Anne Etien
{"title":"Time-traveling object-centric breakpoints","authors":"Valentin Bourcier , Steven Costiou , Maximilian Ignacio Willembrinck Santander , Adrien Vanègue , Anne Etien","doi":"10.1016/j.cola.2024.101285","DOIUrl":"https://doi.org/10.1016/j.cola.2024.101285","url":null,"abstract":"<div><p>Object-centric breakpoints aim to facilitate the debugging of object-oriented programs by focusing on specific objects. However, their practical application faces limitations. They often produce false positives and require developers to identify objects to debug in a running program, which is sometimes not possible due to non-determinism. Additionally, object-centric breakpoints are difficult to build because, to the best of our knowledge, their implementations have never been abstracted from low-level concerns. The literature describes complex reflective architectures necessary for implementing these breakpoints, and their rare available implementations are language-specific.</p><p>In this paper, we introduce <em>Time-Traveling Object-Centric Breakpoints (TTOCBs)</em>, a new definition and implementation of object-centric breakpoints based on <em>Time-Traveling Queries (TTQs)</em>. TTQs are an extensible time-traveling debugging system that allows developers to explore their program executions back and forth by executing debugging queries. We argue that our query-based implementation helps to overcome the limitations of traditional object-centric breakpoints. We describe how TTOCBs assist developers in searching for objects to debug within their program executions, even in the presence of non-determinism. We illustrate how existing object-centric breakpoints from the literature can be implemented and how new ones can be created in a few steps using the TTQ abstractions and scripting API. To build breakpoints, developers need to familiarize themselves with a short API instead of learning language reflection techniques and libraries. This makes our TTOCBs independent of the underlying TTQs and debugger implementations.</p><p>To evaluate our solution, we conducted an initial anecdotal user study on four example scenarios, providing evidence that debugging with TTOCBs requires fewer actions than with traditional object-centric breakpoints. We then discuss the comparison between object-centric breakpoints and TTOCBs in terms of applicability and performance.</p></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"80 ","pages":"Article 101285"},"PeriodicalIF":1.7,"publicationDate":"2024-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141543639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}